From colleen at gazlene.net Sat Jun 1 00:17:56 2019
From: colleen at gazlene.net (Colleen Murphy)
Date: Fri, 31 May 2019 17:17:56 -0700
Subject: [keystone] Keystone Team Update - Week of 27 May 2019
Message-ID: <462238d5-1fb1-4e71-a7fe-6073fe58e2c7@www.fastmail.com>

# Keystone Team Update - Week of 27 May 2019

## News

### Admin Endpoint in Keystonemiddleware

Currently, keystonemiddleware is hardcoded to use the admin endpoint to communicate with keystone. With the removal of the v2 API, having an admin endpoint shouldn't be necessary, so Jens is working on making this configurable[1]. There has been a fair amount of debate over how to do this transition and what the new default should be. Please respond on the patch with your thoughts.

[1] https://review.opendev.org/651790

### Unit Test Refactor

Lance is working on refactoring the protection unit tests to avoid calling setUp() repetitively. There was discussion about the best way to do this[2], given that we make a lot of use of instance methods in the unit tests, especially with regard to fixtures.

[2] http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2019-05-30.log.html#t2019-05-30T14:19:25

### M-1 Team Check-in

As discussed at the PTG, we'll be holding milestone-ly check-ins and retrospectives to try to keep up momentum throughout the cycle. The first one is scheduled for June 11, 15:00-17:00 UTC[3].

[3] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006783.html

## Open Specs

Train specs: https://bit.ly/2uZ2tRl

Ongoing specs: https://bit.ly/2OyDLTh

Spec proposals for Train are due next week! If you are planning a feature for Train, please propose the spec ASAP or it will not be accepted for Train.

## Recently Merged Changes

Search query: https://bit.ly/2pquOwT

We merged 7 changes this week.

## Changes that need Attention

Search query: https://bit.ly/2tymTje

There are 38 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. This includes important changes like Train specs, the removal of PKI support from keystonemiddleware[4], an update to our vision reflection[5], and a change to make keystoneauth's error handling conform to the API-SIG's guidelines[6].

[4] https://review.opendev.org/613675
[5] https://review.opendev.org/662106
[6] https://review.opendev.org/662281

## Bugs

This week we opened 1 new bug and closed 5.

Bugs opened (1)
Bug #1831100 (keystone:Undecided) opened by Kris Watson https://bugs.launchpad.net/keystone/+bug/1831100

Bugs closed (1)
Bug #1807697 (keystone:Wishlist) https://bugs.launchpad.net/keystone/+bug/1807697

Bugs fixed (4)
Bug #1815771 (keystone:Medium) fixed by Jose Castro Leon https://bugs.launchpad.net/keystone/+bug/1815771
Bug #1804700 (keystone:Low) fixed by Gage Hugo https://bugs.launchpad.net/keystone/+bug/1804700
Bug #1801101 (keystoneauth:Undecided) fixed by Chinmay Naik https://bugs.launchpad.net/keystoneauth/+bug/1801101
Bug #1827008 (keystoneauth:Undecided) fixed by jacky06 https://bugs.launchpad.net/keystoneauth/+bug/1827008

## Milestone Outlook

https://releases.openstack.org/train/schedule.html

Next week is spec proposal freeze. Please ensure that the specs you are planning are proposed ASAP. Reviews of proposed specs are also welcome.
## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter From gagehugo at gmail.com Sat Jun 1 00:34:17 2019 From: gagehugo at gmail.com (Gage Hugo) Date: Fri, 31 May 2019 19:34:17 -0500 Subject: [security] Security SIG Newsletter Message-ID: #Week of: 30 May 2019 - Security SIG Meeting Info: http://eavesdrop.openstack.org/#Security_SIG_meeting - Weekly on Thursday at 1500 UTC in #openstack-meeting - Agenda: https://etherpad.openstack.org/p/security-agenda - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG #Meeting Notes - Summary: http://eavesdrop.openstack.org/meetings/security/2019/security.2019-05-30-15.00.html - nickthetait offered to start helping with cleaning up & updating the security guide docs ## News - Interesting article: https://duo.com/decipher/docker-bug-allows-root-access-to-host-file-system # VMT Reports - A full list of publicly marked security issues can be found here: https://bugs.launchpad.net/ossa/ - No new public security bugs this week -------------- next part -------------- An HTML attachment was scrubbed... URL: From aj at suse.com Sat Jun 1 07:26:45 2019 From: aj at suse.com (Andreas Jaeger) Date: Sat, 1 Jun 2019 09:26:45 +0200 Subject: [tc][all] Github mirroring (or lack thereof) for unofficial projects In-Reply-To: References: <20190503190538.GB3377@localhost.localdomain> <20190515175110.26i2xuclkksgx744@arabian.linksys.moosehall> <8d81b9a7-b460-43e1-a774-9bd65ee42143@www.fastmail.com> <20190530180658.xgpcy35au72ccmzt@yuggoth.org> Message-ID: On 01/06/2019 01.50, Clark Boylan wrote: > Close, I think we can archive all repos in openstack-dev and openstack-infra. Part of the repo renames we did today were to get the repos that were left behind in those two orgs into their longer term homes. Then any project in https://github.com/openstack that is not in https://opendev.org/openstack can be archived in Github too. > Once https://review.opendev.org/661803 merged, we can archive openstack-infra. openstack-dev is already unused. We have then only retired repos in openstack-infra, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From mnaser at vexxhost.com Sat Jun 1 12:35:14 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 1 Jun 2019 08:35:14 -0400 Subject: [qa][openstack-ansible] redefining devstack Message-ID: Hi everyone, This is something that I've discussed with a few people over time and I think I'd probably want to bring it up by now. I'd like to propose and ask if it makes sense to perhaps replace devstack entirely with openstack-ansible. I think I have quite a few compelling reasons to do this that I'd like to outline, as well as why I *feel* (and I could be biased here, so call me out!) that OSA is the best option in terms of a 'replacement' # Why not another deployment project? I actually thought about this part too and considered this mainly for ease of use for a *developer*. At this point, Puppet-OpenStack pretty much only deploys packages (which means that it has no build infrastructure, a developer can't just get $commit checked out and deployed). 
TripleO uses Kolla containers AFAIK and those have to be pre-built beforehand; also, I feel they are much harder to use as a developer because if you want to make quick edits and restart services, you have to enter a container and make the edit there and somehow restart the service without the container going back to its original state. Kolla-Ansible and the other combinations also suffer from the same "issue".

OpenStack Ansible is unique in the way that it pretty much just builds a virtualenv and installs packages inside of it. The services are deployed as systemd units. This is very much similar to the current state of devstack at the moment (minus the virtualenv part, afaik). It makes it pretty straightforward to go and edit code if you need/have to. We also have support for Debian, CentOS, Ubuntu and SUSE. This allows "devstack 2.0" to have far more coverage and makes it much easier to deploy on a wider variety of operating systems. It also has the ability to use commits checked out from Zuul so all the fancy Depends-On stuff we use works.

# Why do we care about this, I like my bash scripts!

As someone who's been around for a *really* long time in OpenStack, I've seen a whole lot of really weird issues surface from the usage of DevStack to do CI gating. For example, one of the recent things is the fact that it relies on installing package-shipped noVNC, whereas the 'master' noVNC has actually changed behavior a few months back and it is completely incompatible at this point (it's just a ticking thing until we realize we're entirely broken).

To this day, I still see people who want to POC something up with OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter how many warnings we'll put up, they'll always try to do it. This way, at least they'll have something that has the shape of an actual real deployment. In addition, it would be *good* in the overall scheme of things for a deployment system to test against, because this would make sure things don't break in both ways.

Also: we run Zuul for our CI which supports Ansible natively; this can remove one layer of indirection (Zuul to run Bash) and have Zuul run the playbooks directly from the executor.

# So how could we do this?

The OpenStack Ansible project is made of many roles that are all composable; therefore, you can think of it as a combination of both Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained the base modules (i.e. puppet-nova, etc) and TripleO was the integration of all of it in a distribution. OSA is currently both, but it also includes both Ansible roles and playbooks.

In order to make sure we maintain as much backwards compatibility as possible, we can simply run a small script which does a mapping of devstack => OSA variables to make sure that the service is shipped with all the necessary features as per local.conf.

So the new process could be:

1) parse local.conf and generate Ansible variables files
2) install Ansible (if not running in gate)
3) run playbooks using variables generated in #1

The neat thing is that after all of this, devstack just becomes a thin wrapper around Ansible roles. I also think it brings a lot of hands together, involving both the QA team and the OSA team; I believe that pooling our resources will greatly help in being able to get more done and avoid duplicating our efforts.
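To sketch what step 1 above could look like in practice, here is a minimal, illustrative Python snippet. It is not an existing script: the OSA variable names in the mapping table and the output file name are hypothetical placeholders, and a real implementation would also have to handle the [[post-config|...]] sections of local.conf.

```python
#!/usr/bin/env python3
# Illustrative sketch only: translate a few devstack local.conf settings into
# OSA-style variables. The names on the right-hand side of DEVSTACK_TO_OSA are
# hypothetical placeholders, not the actual openstack-ansible role defaults.
import yaml  # PyYAML

DEVSTACK_TO_OSA = {
    "ADMIN_PASSWORD": "keystone_auth_admin_password",  # hypothetical
    "DATABASE_PASSWORD": "galera_root_password",        # hypothetical
    "HOST_IP": "management_address",                    # hypothetical
}


def parse_localrc(path):
    """Return KEY=value pairs found in the [[local|localrc]] section."""
    settings = {}
    in_localrc = False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("[["):
                # Meta-section headers look like [[local|localrc]] or
                # [[post-config|$NOVA_CONF]]; only localrc is handled here.
                in_localrc = (line == "[[local|localrc]]")
            elif in_localrc and "=" in line and not line.startswith("#"):
                key, _, value = line.partition("=")
                settings[key.strip()] = value.strip()
    return settings


def to_osa_variables(settings):
    """Keep only the settings we know how to map, renamed to OSA variables."""
    return {DEVSTACK_TO_OSA[key]: value
            for key, value in settings.items() if key in DEVSTACK_TO_OSA}


if __name__ == "__main__":
    variables = to_osa_variables(parse_localrc("local.conf"))
    with open("user_variables.yml", "w") as out:
        yaml.safe_dump(variables, out, default_flow_style=False)
```

Steps 2 and 3 would then reduce to installing Ansible and running the existing playbooks with the generated variables file passed in (e.g. via ansible-playbook -e @user_variables.yml).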
# Conclusion

This is the start of a very open-ended discussion. I'm sure there are a lot of details involved here in the implementation that will surface, but I think it could be a good step overall in simplifying our CI and adding more coverage for real potential deployers. It will help two teams unite and have more resources for something (that essentially is somewhat of a duplicated effort at the moment).

I will try to find some time to POC a simple service being deployed by an OSA role instead of Bash (placement seems like a very simple one) and share that eventually.

Thoughts? :)

--
Mohammed Naser — vexxhost
-----------------------------------------------------
D. 514-316-8872
D. 800-910-1726 ext. 200
E. mnaser at vexxhost.com
W. http://vexxhost.com

From skaplons at redhat.com Sat Jun 1 17:35:23 2019
From: skaplons at redhat.com (Slawomir Kaplonski)
Date: Sat, 1 Jun 2019 19:35:23 +0200
Subject: [neutron][networking-ovn] Core team updates
In-Reply-To: <2e3ac83e-63bd-2107-2d41-943d483b0687@redhat.com>
References: <2e3ac83e-63bd-2107-2d41-943d483b0687@redhat.com>
Message-ID: 

Congrats Kuba and good luck Miguel in Your new role :)

> On 31 May 2019, at 10:53, Jakub Libosvar wrote:
>
> Thanks for your trust! I'll try to do my best! Looking forward to our
> future collaboration.
>
> Jakub
>
> On 31/05/2019 10:38, Lucas Alvares Gomes wrote:
>> Hi all,
>>
>> I'd like to welcome Jakub Libosvar to the networking-ovn core team.
>> The team was in need of more reviewers with +2/+A power and Jakub's
>> reviews have been super high quality [0][1]. He's also helping the
>> project out in many other different efforts such as bringing in the
>> full stack test suite and bug fixes.
>>
>> Also, Miguel Ajo has changed focus from OVN/networking-ovn and has been
>> dropped from the core team. Of course, we will welcome him back when
>> his activity picks back up again.
>>
>> Thank you Jakub and Miguel!
>>
>> [0] https://www.stackalytics.com/report/contribution/networking-ovn/30
>> [1] https://www.stackalytics.com/report/contribution/networking-ovn/90
>>
>> Cheers,
>> Lucas
>>
>
>

—
Slawek Kaplonski
Senior software engineer
Red Hat

From skaplons at redhat.com Sat Jun 1 17:46:11 2019
From: skaplons at redhat.com (Slawomir Kaplonski)
Date: Sat, 1 Jun 2019 19:46:11 +0200
Subject: [qa][openstack-ansible] redefining devstack
In-Reply-To: 
References: 
Message-ID: 

Hi,

I don’t know OSA at all, so sorry if my question is dumb, but in devstack we can easily write plugins, keep them in a separate repo, and such a plugin can be easily used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA, or will every change always need to be contributed to the OSA repository?

Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which use only some parts of devstack. That kind of job will probably have to be rewritten after such a change. I don’t know if the neutron jobs are the only ones which can be affected in that way, but IMHO it’s something worth keeping in mind.

> On 1 Jun 2019, at 14:35, Mohammed Naser wrote:
>
> Hi everyone,
>
> This is something that I've discussed with a few people over time and
> I think I'd probably want to bring it up by now. I'd like to propose
> and ask if it makes sense to perhaps replace devstack entirely with
> openstack-ansible. I think I have quite a few compelling reasons to
> do this that I'd like to outline, as well as why I *feel* (and I could
> be biased here, so call me out!)
that OSA is the best option in terms > of a 'replacement' > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. > > # So how could we do this? > The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. > > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. 
> > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > — Slawek Kaplonski Senior software engineer Red Hat From mnaser at vexxhost.com Sat Jun 1 18:49:10 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 1 Jun 2019 14:49:10 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > > Hi, > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA or will it be needed to contribute always every change to OSA repository? Not a dumb question at all. So, we do have this concept of 'roles' which you _could_ kinda technically identify similar to plugins. However, I think one of the things that would maybe come out of this is the inability for projects to maintain their own plugins (because now you can host neutron/devstack/plugins and you maintain that repo yourself), under this structure, you would indeed have to make those changes to the OpenStack Ansible Neutron role i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron However, I think from an OSA perspective, we would be more than happy to add project maintainers for specific projects to their appropriate roles. It would make sense that there is someone from the Neutron team that could be a core on os_neutron from example. > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in mind. Indeed, with our current CI infrastructure with OSA, we have the ability to create these dynamic scenarios (which can actually be defined by a simple Zuul variable). https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 We do some really neat introspection of the project name being tested in order to run specific scenarios. 
Therefore, that is something that should be quite easy to accomplish simply by overriding a scenario name within Zuul. It also is worth mentioning we now support full metal deploys for a while now, so not having to worry about containers is something to keep in mind as well (with simplifying the developer experience again). > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > > > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. 
> > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. http://vexxhost.com > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From clemens.hardewig at crandale.de Sun Jun 2 19:09:23 2019 From: clemens.hardewig at crandale.de (Clemens Hardewig) Date: Sun, 2 Jun 2019 21:09:23 +0200 Subject: [nova] Bug #1755266: How to proceed with test failures Message-ID: Hi there, Since Pike I am struggling with instance migration/instance resize for flavors whose swap space is provided via an lvm volume; according to my understanding the default behavior if cinder uses lvm as a backend driver (at least I could not convince cinder to behave different …). I am somewhat surprised that I seem to be the only one who has some problems with that behavior - according to my understanding you are coming into this constellation automatically when simply following the manual installation procedure as being described in the official Openstack docs... Anyway I opened the bug above, however it did not find some interest and I tried then as a python newbie to get it fixed by my own. 
After a lengthy live test phase of my changes in the driver.py across Pike, Queens, Rocky, and now also Stein, I made then my first commit (yeee - got it done) to the master branch, had a good short conversation with Eric in Berlin on it, fixed some code format issues Zuul was rightfully complaining about but then unfortunately failed some further tests in Zuul and other test areas. (see https://review.opendev.org/#/c/618621/ ). Related to my changes, tox gets me an error: nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_finish_migration_power_on --------------------------------------------------------------------------------------------- Captured traceback: ~~~~~~~~~~~~~~~~~~~ Traceback (most recent call last): File "nova/tests/unit/virt/libvirt/test_driver.py", line 18707, in test_finish_migration_power_on self._test_finish_migration() File "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched return func(*args, **keywargs) File "nova/tests/unit/virt/libvirt/test_driver.py", line 18662, in _test_finish_migration mock_raw_to_qcow2.assert_has_calls(convert_calls, any_order=True) File "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", line 983, in assert_has_calls ), cause) File "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/six.py", line 737, in raise_from raise value AssertionError: (call('/tmp/tmpC60sPk/tmpOHikWL/8ea1de33-64d7-4d1d-af02-88e6f7ec91c1/disk.swap'),) not all found in call list but I failed yet to develop any idea what needs to be done to get the test failure fixed. I am missing context here how the logic of the test code works; therefore I would like to ask whether somebody could point me in the right direction what needs to be done to get the failed unit/Zuul/other Tests passed. Are there any docs or helps alongside testing in Nova where to learn what to do? Perhaps also someone could give me a hint whether there are some conceptually misleading ideas behind my fix proposal which need to be driven into another direction … I am looking forward to your reply Best regards Clemens Clemens Hardewig -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3898 bytes Desc: not available URL: From henry at thebonaths.com Mon Jun 3 03:45:35 2019 From: henry at thebonaths.com (Henry Bonath) Date: Sun, 2 Jun 2019 23:45:35 -0400 Subject: [openstack-ansible] Installing Third-Party drivers into the Cinder-Volume container during playbook execution In-Reply-To: References: Message-ID: I think the idea here, at least for me, would be to have it rolled into the deployment automatically - in a similar fashion to how horizon themes are deployed within Openstack-Ansible. Obviously having this specific driver in the tree would solve my specific issue, but I don't know how many more third party Cinder drivers which are not packaged into the tree people are deploying these days. My question for the community is simply finding out if this mechanism exists already. On Thu, May 30, 2019 at 11:11 AM Jean-Philippe Evrard wrote: > > > > On Tue, May 28, 2019, at 04:10, Henry Bonath wrote: > > Hello, I asked this into IRC but I thought this might be a more > > appropriate place to ask considering the IRC channel usage over the > > weekend. 
> > > > If I wanted to deploy a third party driver along with my Cinder-Volume > > container, is there a built-in mechanism for doing so? (I am > > specifically wanting to use: https://github.com/iXsystems/cinder) > > > > I am able to configure a cinder-backend in the > > "openstack_user_config.yml" file which works perfectly if I let it > > fail during the first run, then copy the driver into the containers > > and run "os-cinder-install.yml" a second time. > > > > I've found that you guys have built similar stuff into the system > > (e.g. Horizon custom Theme installation via .tgz) and was curious if > > there is a similar mechanism for Cinder Drivers that may be > > undocumented. > > > > http://paste.openstack.org/show/752132/ > > This is an example of my working config, which relies on the driver > > being copied into the > > /openstack/venvs/cinder-19.x.x.x/lib/python2.7/site-packages/cinder/volume/drivers/ixsystems/ > > folder. > > > > Thanks in advance! > > > > > > I suppose the community would be okay to have this in tree, so no need for a third party system here (and no need to maintain this on your own, separately). However... if it's just about copying the content of this repo, did you think of packaging this, and publish it to pypi ? This way you could just pip install the necessary package into your cinder venv... > > Regards, > Jean-Philippe Evrard (evrardjp) > From madhuri.kumari at intel.com Mon Jun 3 05:53:56 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Mon, 3 Jun 2019 05:53:56 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Hi Ironic, Nova Developers, I am currently working on implementing Intel Speed Select(ISS) feature[1] in Ironic and I have a use case where I want to change ISS configuration in BIOS after a node is provisioned. Such use case of changing the configuration post deployment is common and not specific to ISS. A real-life example for such a required post-deploy configuration change is the change of BIOS settings to disable hyper-threading in order to address a security vulnerability. Currently there is no way of changing any BIOS configuration after a node is provisioned in Ironic. One solution for it is to allow manual deploy steps in Ironic[2](not implemented yet) which can be trigged by changing traits in Nova. For this purpose, we would need to change a trait of the server's flavor in Nova. This trait is mapped to a deploy step in Ironic which does some operation(change BIOS config and reboot in this use case). In Nova, the only API to change trait in flavor is resize whereas resize does migration and a reboot as well. In short, I am looking for a Nova API that only changes the traits, and trigger the ironic deploy steps but no reboot and migration. Please suggest. Thanks in advance. Regards, Madhuri [1] https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/support-intel-speed-select.html [2] https://storyboard.openstack.org/#!/story/2005129 -------------- next part -------------- An HTML attachment was scrubbed... URL: From manu.km at idrive.com Mon Jun 3 07:31:20 2019 From: manu.km at idrive.com (Manu K M) Date: Mon, 3 Jun 2019 13:01:20 +0530 Subject: [swift] How to track the rest api call count Message-ID: Hi there I have to keep track of the no of rest call made by a specific account/tenant to my swift cluster. Ceilometer provides only the number of incoming and outgoing bytes. 
--
Regards
Manu K M

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mark at stackhpc.com Mon Jun 3 08:46:11 2019
From: mark at stackhpc.com (Mark Goddard)
Date: Mon, 3 Jun 2019 09:46:11 +0100
Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning
In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com>
References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com>
Message-ID: 

On Mon, 3 Jun 2019 at 06:57, Kumari, Madhuri wrote:

> Hi Ironic, Nova Developers,
>
> I am currently working on implementing Intel Speed Select(ISS) feature[1]
> in Ironic and I have a use case where I want to change ISS configuration in
> BIOS after a node is provisioned.
>
> Such use case of changing the configuration post deployment is common and
> not specific to ISS. A real-life example for such a required post-deploy
> configuration change is the change of BIOS settings to disable
> hyper-threading in order to address a security vulnerability.
>
> Currently there is no way of changing any BIOS configuration after a node
> is provisioned in Ironic. One solution for it is to allow manual deploy
> steps in Ironic[2](not implemented yet) which can be trigged by changing
> traits in Nova.
>
> For this purpose, we would need to change a trait of the server’s flavor
> in Nova. This trait is mapped to a deploy step in Ironic which does some
> operation(change BIOS config and reboot in this use case).
>
> In Nova, the only API to change trait in flavor is resize whereas resize
> does migration and a reboot as well.
>
> In short, I am looking for a Nova API that only changes the traits, and
> trigger the ironic deploy steps but no reboot and migration. Please suggest.
>

Hi, it is possible to modify a flavor (openstack flavor set --property <key>=<value> <flavor>). However, changes to a flavor are not reflected in instances that were previously created from that flavor. Internally, nova stores an 'embedded flavor' in the instance state. I'm not aware of any API that would allow modifying the embedded flavor, nor any process that would synchronise those changes to ironic.

> Thanks in advance.
>
> Regards,
>
> Madhuri
>
> [1]
> https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/support-intel-speed-select.html
>
> [2] https://storyboard.openstack.org/#!/story/2005129
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From noonedeadpunk at ya.ru Mon Jun 3 11:04:37 2019
From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov)
Date: Mon, 03 Jun 2019 14:04:37 +0300
Subject: [openstack-ansible] Installing Third-Party drivers into the Cinder-Volume container during playbook execution
In-Reply-To: 
References: 
Message-ID: <1112501559559877@sas2-0106f63be698.qloud-c.yandex.net>

The quick answer - no, currently such option is not present for the cinder role. And by far the only way to install some custom things with cinder now is to provide a list of cinder_user_pip_packages[1], so that's why packaging driver into pypi might be an option for distribution of custom drivers.

[1] https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/defaults/main.yml#L302

03.06.2019, 06:59, "Henry Bonath" :
> I think the idea here, at least for me, would be to have it rolled
> into the deployment automatically - in a similar fashion to how
> horizon themes are deployed within Openstack-Ansible.
> Obviously having this specific driver in the tree would solve my > specific issue, but I don't know how many more third party Cinder > drivers which are not packaged into the tree people are deploying > these days. > > My question for the community is simply finding out if this mechanism > exists already. > > On Thu, May 30, 2019 at 11:11 AM Jean-Philippe Evrard > wrote: >>  On Tue, May 28, 2019, at 04:10, Henry Bonath wrote: >>  > Hello, I asked this into IRC but I thought this might be a more >>  > appropriate place to ask considering the IRC channel usage over the >>  > weekend. >>  > >>  > If I wanted to deploy a third party driver along with my Cinder-Volume >>  > container, is there a built-in mechanism for doing so? (I am >>  > specifically wanting to use: https://github.com/iXsystems/cinder) >>  > >>  > I am able to configure a cinder-backend in the >>  > "openstack_user_config.yml" file which works perfectly if I let it >>  > fail during the first run, then copy the driver into the containers >>  > and run "os-cinder-install.yml" a second time. >>  > >>  > I've found that you guys have built similar stuff into the system >>  > (e.g. Horizon custom Theme installation via .tgz) and was curious if >>  > there is a similar mechanism for Cinder Drivers that may be >>  > undocumented. >>  > >>  > http://paste.openstack.org/show/752132/ >>  > This is an example of my working config, which relies on the driver >>  > being copied into the >>  > /openstack/venvs/cinder-19.x.x.x/lib/python2.7/site-packages/cinder/volume/drivers/ixsystems/ >>  > folder. >>  > >>  > Thanks in advance! >>  > >>  > >> >>  I suppose the community would be okay to have this in tree, so no need for a third party system here (and no need to maintain this on your own, separately). However... if it's just about copying the content of this repo, did you think of packaging this, and publish it to pypi ? This way you could just pip install the necessary package into your cinder venv... >> >>  Regards, >>  Jean-Philippe Evrard (evrardjp) --  Kind Regards, Dmitriy Rabotyagov From cdent+os at anticdent.org Mon Jun 3 11:24:31 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 3 Jun 2019 12:24:31 +0100 (BST) Subject: [placement] Office Hours In-Reply-To: References: <923884A4-E427-439E-AD76-9DDBB45550D9@leafe.com> <1559227066.23481.3@smtp.office365.com> Message-ID: On Thu, 30 May 2019, Eric Fried wrote: > +1 for 1500 UTC Wednesdays. wfm, as well Note, that since we've declared this office hours, it means it's ad hoc and nobody is required to be there. It's merely a time that we've designated as a reasonable point for the placement to check in with each other and for other people to check in with the placement team. That is: let's make sure this doesn't turn into "we moved the meeting to wednesday". -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From dangtrinhnt at gmail.com Mon Jun 3 11:34:31 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Mon, 3 Jun 2019 20:34:31 +0900 Subject: [searchlight] Team meeting today cancelled Message-ID: Hi team, I'm in a middle of something right now and will not expect to finish at 13:30 UTC today so we have to cancel the team meeting today. Ping me on the #openstack-searchlight channel. Sorry for this late notice. dangtrinhnt -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jim at jimrollenhagen.com Mon Jun 3 11:55:05 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Mon, 3 Jun 2019 07:55:05 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: I don't think I have enough coffee in me to fully digest this, but wanted to point out a couple of things. FWIW, this is something I've thought we should do for a while now. On Sat, Jun 1, 2019 at 8:43 AM Mohammed Naser wrote: > Hi everyone, > > This is something that I've discussed with a few people over time and > I think I'd probably want to bring it up by now. I'd like to propose > and ask if it makes sense to perhaps replace devstack entirely with > openstack-ansible. I think I have quite a few compelling reasons to > do this that I'd like to outline, as well as why I *feel* (and I could > be biased here, so call me out!) that OSA is the best option in terms > of a 'replacement' > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > FWIW, kolla-ansible (and maybe tripleo?) has a "development" mode which mounts the code as a volume, so you can make edits and just run "docker restart $service". Though systemd does make that a bit nicer due to globs (e.g. systemctl restart nova-*). That said, I do agree moving to something where systemd is running the services would make for a smoother transition for developers. > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. 
With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > ++ > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. > > # So how could we do this? > The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. > > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. > ++ > > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) > The reason this hasn't been pushed on in the past is to avoid the perception that the TC or QA team is choosing a "winner" in the deployment space. I don't think that's a good reason not to do something like this (especially with the drop in contributors since I've had that discussion). However, we do need to message this carefully at a minimum. > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Mon Jun 3 12:01:06 2019 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 3 Jun 2019 14:01:06 +0200 Subject: [uc][tc][ops] reviving osops- repos In-Reply-To: <20190531164102.5lwt2jyxk24u3vdz@yuggoth.org> References: <20190530205552.falsvxcegehtyuge@yuggoth.org> <20190531123501.tawgvqgsw6yle2nu@csail.mit.edu> <20190531164102.5lwt2jyxk24u3vdz@yuggoth.org> Message-ID: Jeremy Stanley wrote: > On 2019-05-31 10:24:36 -0400 (-0400), Erik McCormick wrote: > [...] >> there's a project [1]. 
>> >> So either: >> A) Make a SIG out of that and assign the repos to the sig, or >> B) Maybe add it under / rename the Ops Docs SIG [2] as it might bring >> more eyes to both things which serve the same folks. > [...] > > I'd also be perfectly fine with C) say that it's being vouched for > by the UC through its Osops project, stick these repos in a list > *somewhere* as a durable record of that, and let decisions about > project vs. SIG decision be independent of the repository naming > decision. +2 to keep it under the openstack/ namespace one way or another. As to what construct should "own" it, the closest thing we have that would match history would be a UC "team"[1] or "working group"[2], both of which have repositories defined in [3]. Alternatively, I feel like a SIG (be it the Ops Docs SIG or a new "Operational tooling" SIG) would totally be a good idea to revive this. In that case we'd define the repository in [4]. My personal preference would be for a new SIG, but whoever is signing up to work on this should definitely have the final say. [1] https://opendev.org/openstack/governance-uc/src/branch/master/reference/teams.yaml [2] https://opendev.org/openstack/governance-uc/src/branch/master/reference/working-groups.yaml [3] https://opendev.org/openstack/governance/src/branch/master/reference/user-committee-repos.yaml [4] https://opendev.org/openstack/governance/src/branch/master/reference/sigs-repos.yaml -- Thierry Carrez (ttx) From strigazi at gmail.com Mon Jun 3 12:02:32 2019 From: strigazi at gmail.com (Spyros Trigazis) Date: Mon, 3 Jun 2019 14:02:32 +0200 Subject: [magnum] Meeting at 2019-06-04 2100 UTC Message-ID: Hello all, I would like to discuss moving the drivers out-of-tree, as we briefly discussed it in the PTG. Can you all make it for the next meeting [1]? This is not super urgent, but it will accelerate development and bug fixes at the driver level. Cheers, Spyros [0] https://etherpad.openstack.org/p/magnum-train-ptg [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Mon Jun 3 12:27:40 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Mon, 3 Jun 2019 13:27:40 +0100 Subject: [magnum] Meeting at 2019-06-04 2100 UTC In-Reply-To: References: Message-ID: <2EF51CC9-4CF6-4C94-87AF-E93158842D45@stackhpc.com> Sounds good to me! > On 3 Jun 2019, at 13:02, Spyros Trigazis wrote: > > Hello all, > > I would like to discuss moving the drivers out-of-tree, as > we briefly discussed it in the PTG. Can you all make it for the > next meeting [1]? > > This is not super urgent, but it will accelerate development and bug > fixes at the driver level. > > Cheers, > Spyros > > [0] https://etherpad.openstack.org/p/magnum-train-ptg > [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skaplons at redhat.com Mon Jun 3 12:27:53 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 3 Jun 2019 14:27:53 +0200 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Hi, > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: >> >> Hi, >> >> I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > Not a dumb question at all. So, we do have this concept of 'roles' > which you _could_ kinda technically identify similar to plugins. > However, I think one of the things that would maybe come out of this > is the inability for projects to maintain their own plugins (because > now you can host neutron/devstack/plugins and you maintain that repo > yourself), under this structure, you would indeed have to make those > changes to the OpenStack Ansible Neutron role > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > However, I think from an OSA perspective, we would be more than happy > to add project maintainers for specific projects to their appropriate > roles. It would make sense that there is someone from the Neutron > team that could be a core on os_neutron from example. Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and install it together with everything else by simply adding one line (usually) in local.conf file. I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or driver which isn’t official OpenStack project. > >> Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in mind. > > Indeed, with our current CI infrastructure with OSA, we have the > ability to create these dynamic scenarios (which can actually be > defined by a simple Zuul variable). > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > We do some really neat introspection of the project name being tested > in order to run specific scenarios. Therefore, that is something that > should be quite easy to accomplish simply by overriding a scenario > name within Zuul. It also is worth mentioning we now support full > metal deploys for a while now, so not having to worry about containers > is something to keep in mind as well (with simplifying the developer > experience again). > >>> On 1 Jun 2019, at 14:35, Mohammed Naser wrote: >>> >>> Hi everyone, >>> >>> This is something that I've discussed with a few people over time and >>> I think I'd probably want to bring it up by now. I'd like to propose >>> and ask if it makes sense to perhaps replace devstack entirely with >>> openstack-ansible. 
I think I have quite a few compelling reasons to >>> do this that I'd like to outline, as well as why I *feel* (and I could >>> be biased here, so call me out!) that OSA is the best option in terms >>> of a 'replacement' >>> >>> # Why not another deployment project? >>> I actually thought about this part too and considered this mainly for >>> ease of use for a *developer*. >>> >>> At this point, Puppet-OpenStack pretty much only deploys packages >>> (which means that it has no build infrastructure, a developer can't >>> just get $commit checked out and deployed). >>> >>> TripleO uses Kolla containers AFAIK and those have to be pre-built >>> beforehand, also, I feel they are much harder to use as a developer >>> because if you want to make quick edits and restart services, you have >>> to enter a container and make the edit there and somehow restart the >>> service without the container going back to it's original state. >>> Kolla-Ansible and the other combinations also suffer from the same >>> "issue". >>> >>> OpenStack Ansible is unique in the way that it pretty much just builds >>> a virtualenv and installs packages inside of it. The services are >>> deployed as systemd units. This is very much similar to the current >>> state of devstack at the moment (minus the virtualenv part, afaik). >>> It makes it pretty straight forward to go and edit code if you >>> need/have to. We also have support for Debian, CentOS, Ubuntu and >>> SUSE. This allows "devstack 2.0" to have far more coverage and make >>> it much more easy to deploy on a wider variety of operating systems. >>> It also has the ability to use commits checked out from Zuul so all >>> the fancy Depends-On stuff we use works. >>> >>> # Why do we care about this, I like my bash scripts! >>> As someone who's been around for a *really* long time in OpenStack, >>> I've seen a whole lot of really weird issues surface from the usage of >>> DevStack to do CI gating. For example, one of the recent things is >>> the fact it relies on installing package-shipped noVNC, where as the >>> 'master' noVNC has actually changed behavior a few months back and it >>> is completely incompatible at this point (it's just a ticking thing >>> until we realize we're entirely broken). >>> >>> To this day, I still see people who want to POC something up with >>> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter >>> how many warnings we'll put up, they'll always try to do it. With >>> this way, at least they'll have something that has the shape of an >>> actual real deployment. In addition, it would be *good* in the >>> overall scheme of things for a deployment system to test against, >>> because this would make sure things don't break in both ways. >>> >>> Also: we run Zuul for our CI which supports Ansible natively, this can >>> remove one layer of indirection (Zuul to run Bash) and have Zuul run >>> the playbooks directly from the executor. >>> >>> # So how could we do this? >>> The OpenStack Ansible project is made of many roles that are all >>> composable, therefore, you can think of it as a combination of both >>> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >>> the base modules (i.e. puppet-nova, etc) and TripleO was the >>> integration of all of it in a distribution. OSA is currently both, >>> but it also includes both Ansible roles and playbooks. 
>>> >>> In order to make sure we maintain as much of backwards compatibility >>> as possible, we can simply run a small script which does a mapping of >>> devstack => OSA variables to make sure that the service is shipped >>> with all the necessary features as per local.conf. >>> >>> So the new process could be: >>> >>> 1) parse local.conf and generate Ansible variables files >>> 2) install Ansible (if not running in gate) >>> 3) run playbooks using variable generated in #1 >>> >>> The neat thing is after all of this, devstack just becomes a thin >>> wrapper around Ansible roles. I also think it brings a lot of hands >>> together, involving both the QA team and OSA team together, which I >>> believe that pooling our resources will greatly help in being able to >>> get more done and avoiding duplicating our efforts. >>> >>> # Conclusion >>> This is a start of a very open ended discussion, I'm sure there is a >>> lot of details involved here in the implementation that will surface, >>> but I think it could be a good step overall in simplifying our CI and >>> adding more coverage for real potential deployers. It will help two >>> teams unite together and have more resources for something (that >>> essentially is somewhat of duplicated effort at the moment). >>> >>> I will try to pick up sometime to POC a simple service being deployed >>> by an OSA role instead of Bash, placement which seems like a very >>> simple one and share that eventually. >>> >>> Thoughts? :) >>> >>> -- >>> Mohammed Naser — vexxhost >>> ----------------------------------------------------- >>> D. 514-316-8872 >>> D. 800-910-1726 ext. 200 >>> E. mnaser at vexxhost.com >>> W. http://vexxhost.com >>> >> >> — >> Slawek Kaplonski >> Senior software engineer >> Red Hat >> > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com — Slawek Kaplonski Senior software engineer Red Hat From mnaser at vexxhost.com Mon Jun 3 12:37:54 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 08:37:54 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 8:02 AM Jim Rollenhagen wrote: > > I don't think I have enough coffee in me to fully digest this, but wanted to > point out a couple of things. FWIW, this is something I've thought we should do > for a while now. > > On Sat, Jun 1, 2019 at 8:43 AM Mohammed Naser wrote: >> >> Hi everyone, >> >> This is something that I've discussed with a few people over time and >> I think I'd probably want to bring it up by now. I'd like to propose >> and ask if it makes sense to perhaps replace devstack entirely with >> openstack-ansible. I think I have quite a few compelling reasons to >> do this that I'd like to outline, as well as why I *feel* (and I could >> be biased here, so call me out!) that OSA is the best option in terms >> of a 'replacement' >> >> # Why not another deployment project? >> I actually thought about this part too and considered this mainly for >> ease of use for a *developer*. >> >> At this point, Puppet-OpenStack pretty much only deploys packages >> (which means that it has no build infrastructure, a developer can't >> just get $commit checked out and deployed). 
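As a concrete illustration of step 1 in the process quoted earlier in this thread ("parse local.conf and generate Ansible variables files"), a minimal sketch of such a mapping script is below. It only handles plain KEY=VALUE lines from the [[local|localrc]] section, and the mapping table uses assumed option and variable names rather than a verified DevStack-to-OSA mapping:

# Hypothetical sketch: translate a few localrc settings from local.conf into
# an Ansible variables file that the OSA playbooks could consume. The keys on
# both sides are placeholders, not a verified DevStack/OSA mapping.
import yaml

LOCALRC_TO_OSA = {
    "ADMIN_PASSWORD": "keystone_auth_admin_password",   # assumed OSA variable
    "DATABASE_PASSWORD": "galera_root_password",         # assumed OSA variable
    "ENABLED_SERVICES": "devstack_enabled_services",      # made-up passthrough
}

def parse_localrc(path):
    """Return KEY=VALUE pairs found in the [[local|localrc]] section."""
    values, in_localrc = {}, False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("[["):
                in_localrc = (line == "[[local|localrc]]")
                continue
            if in_localrc and "=" in line and not line.startswith("#"):
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values

def to_osa_variables(localrc):
    """Translate the localrc keys we know about into Ansible variables."""
    return {LOCALRC_TO_OSA[key]: value
            for key, value in localrc.items() if key in LOCALRC_TO_OSA}

if __name__ == "__main__":
    variables = to_osa_variables(parse_localrc("local.conf"))
    with open("user_variables.yml", "w") as output:
        yaml.safe_dump(variables, output, default_flow_style=False)

Anything not covered by the mapping table would simply be ignored (or warned about), which is also where most of the real backwards-compatibility work would live.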
>> >> TripleO uses Kolla containers AFAIK and those have to be pre-built >> beforehand, also, I feel they are much harder to use as a developer >> because if you want to make quick edits and restart services, you have >> to enter a container and make the edit there and somehow restart the >> service without the container going back to it's original state. >> Kolla-Ansible and the other combinations also suffer from the same >> "issue". > > > FWIW, kolla-ansible (and maybe tripleo?) has a "development" mode which mounts > the code as a volume, so you can make edits and just run "docker restart > $service". Though systemd does make that a bit nicer due to globs (e.g. > systemctl restart nova-*). > > That said, I do agree moving to something where systemd is running the services > would make for a smoother transition for developers. I didn't know about this (and this wasn't around for the time that I was trying and experimenting with Kolla). This does seem like a possible solution if we're okay with adding the Docker dependency into DevStack and the workflow changing from restarting services to restarting containers. >> >> >> OpenStack Ansible is unique in the way that it pretty much just builds >> a virtualenv and installs packages inside of it. The services are >> deployed as systemd units. This is very much similar to the current >> state of devstack at the moment (minus the virtualenv part, afaik). >> It makes it pretty straight forward to go and edit code if you >> need/have to. We also have support for Debian, CentOS, Ubuntu and >> SUSE. This allows "devstack 2.0" to have far more coverage and make >> it much more easy to deploy on a wider variety of operating systems. >> It also has the ability to use commits checked out from Zuul so all >> the fancy Depends-On stuff we use works. >> >> # Why do we care about this, I like my bash scripts! >> As someone who's been around for a *really* long time in OpenStack, >> I've seen a whole lot of really weird issues surface from the usage of >> DevStack to do CI gating. For example, one of the recent things is >> the fact it relies on installing package-shipped noVNC, where as the >> 'master' noVNC has actually changed behavior a few months back and it >> is completely incompatible at this point (it's just a ticking thing >> until we realize we're entirely broken). >> >> To this day, I still see people who want to POC something up with >> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter >> how many warnings we'll put up, they'll always try to do it. With >> this way, at least they'll have something that has the shape of an >> actual real deployment. In addition, it would be *good* in the >> overall scheme of things for a deployment system to test against, >> because this would make sure things don't break in both ways. > > > ++ > >> >> >> Also: we run Zuul for our CI which supports Ansible natively, this can >> remove one layer of indirection (Zuul to run Bash) and have Zuul run >> the playbooks directly from the executor. >> >> # So how could we do this? >> The OpenStack Ansible project is made of many roles that are all >> composable, therefore, you can think of it as a combination of both >> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >> the base modules (i.e. puppet-nova, etc) and TripleO was the >> integration of all of it in a distribution. OSA is currently both, >> but it also includes both Ansible roles and playbooks. 
>> >> In order to make sure we maintain as much of backwards compatibility >> as possible, we can simply run a small script which does a mapping of >> devstack => OSA variables to make sure that the service is shipped >> with all the necessary features as per local.conf. > > > ++ > >> >> >> So the new process could be: >> >> 1) parse local.conf and generate Ansible variables files >> 2) install Ansible (if not running in gate) >> 3) run playbooks using variable generated in #1 >> >> The neat thing is after all of this, devstack just becomes a thin >> wrapper around Ansible roles. I also think it brings a lot of hands >> together, involving both the QA team and OSA team together, which I >> believe that pooling our resources will greatly help in being able to >> get more done and avoiding duplicating our efforts. >> >> # Conclusion >> This is a start of a very open ended discussion, I'm sure there is a >> lot of details involved here in the implementation that will surface, >> but I think it could be a good step overall in simplifying our CI and >> adding more coverage for real potential deployers. It will help two >> teams unite together and have more resources for something (that >> essentially is somewhat of duplicated effort at the moment). >> >> I will try to pick up sometime to POC a simple service being deployed >> by an OSA role instead of Bash, placement which seems like a very >> simple one and share that eventually. >> >> Thoughts? :) > > > The reason this hasn't been pushed on in the past is to avoid the perception > that the TC or QA team is choosing a "winner" in the deployment space. I don't > think that's a good reason not to do something like this (especially with the > drop in contributors since I've had that discussion). However, we do need to > message this carefully at a minimum. Right. I think that's because in OpenStack-Ansible world, we have two things - OSA roles: nothing but basic roles to deploy OpenStack services, with external consumers - Integrated: contains all the playbooks In a way, our roles is "Puppet OpenStack" and our integrated repo is "TripleO", back when TripleO deployed via Puppet anyways... I have to be honest, I wish that our roles lived under a different name so we can collaborate all on them (because an Ansible role to deploy something generically is needed regardless). We've actually done a lot of work with the TripleO team and they are consuming one of our roles (os_tempest) to do all their tempest testing, we gate TripleO and they gate us for the role. >> >> >> -- >> Mohammed Naser — vexxhost >> ----------------------------------------------------- >> D. 514-316-8872 >> D. 800-910-1726 ext. 200 >> E. mnaser at vexxhost.com >> W. http://vexxhost.com >> -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
http://vexxhost.com From mnaser at vexxhost.com Mon Jun 3 12:39:22 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 08:39:22 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: > > Hi, > > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > >> > >> Hi, > >> > >> I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > > > Not a dumb question at all. So, we do have this concept of 'roles' > > which you _could_ kinda technically identify similar to plugins. > > However, I think one of the things that would maybe come out of this > > is the inability for projects to maintain their own plugins (because > > now you can host neutron/devstack/plugins and you maintain that repo > > yourself), under this structure, you would indeed have to make those > > changes to the OpenStack Ansible Neutron role > > > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > > > However, I think from an OSA perspective, we would be more than happy > > to add project maintainers for specific projects to their appropriate > > roles. It would make sense that there is someone from the Neutron > > team that could be a core on os_neutron from example. > > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and install it together with everything else by simply adding one line (usually) in local.conf file. > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or driver which isn’t official OpenStack project. You raise a really good concern. Indeed, we might have to change the workflow from "write a plugin" to "write an Ansible role" to be able to test your project with DevStack at that page (or maintain both a "legacy" solution) with a new one. > > > >> Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in mind. > > > > Indeed, with our current CI infrastructure with OSA, we have the > > ability to create these dynamic scenarios (which can actually be > > defined by a simple Zuul variable). > > > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > > > We do some really neat introspection of the project name being tested > > in order to run specific scenarios. Therefore, that is something that > > should be quite easy to accomplish simply by overriding a scenario > > name within Zuul. It also is worth mentioning we now support full > > metal deploys for a while now, so not having to worry about containers > > is something to keep in mind as well (with simplifying the developer > > experience again). 
> > > >>> On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > >>> > >>> Hi everyone, > >>> > >>> This is something that I've discussed with a few people over time and > >>> I think I'd probably want to bring it up by now. I'd like to propose > >>> and ask if it makes sense to perhaps replace devstack entirely with > >>> openstack-ansible. I think I have quite a few compelling reasons to > >>> do this that I'd like to outline, as well as why I *feel* (and I could > >>> be biased here, so call me out!) that OSA is the best option in terms > >>> of a 'replacement' > >>> > >>> # Why not another deployment project? > >>> I actually thought about this part too and considered this mainly for > >>> ease of use for a *developer*. > >>> > >>> At this point, Puppet-OpenStack pretty much only deploys packages > >>> (which means that it has no build infrastructure, a developer can't > >>> just get $commit checked out and deployed). > >>> > >>> TripleO uses Kolla containers AFAIK and those have to be pre-built > >>> beforehand, also, I feel they are much harder to use as a developer > >>> because if you want to make quick edits and restart services, you have > >>> to enter a container and make the edit there and somehow restart the > >>> service without the container going back to it's original state. > >>> Kolla-Ansible and the other combinations also suffer from the same > >>> "issue". > >>> > >>> OpenStack Ansible is unique in the way that it pretty much just builds > >>> a virtualenv and installs packages inside of it. The services are > >>> deployed as systemd units. This is very much similar to the current > >>> state of devstack at the moment (minus the virtualenv part, afaik). > >>> It makes it pretty straight forward to go and edit code if you > >>> need/have to. We also have support for Debian, CentOS, Ubuntu and > >>> SUSE. This allows "devstack 2.0" to have far more coverage and make > >>> it much more easy to deploy on a wider variety of operating systems. > >>> It also has the ability to use commits checked out from Zuul so all > >>> the fancy Depends-On stuff we use works. > >>> > >>> # Why do we care about this, I like my bash scripts! > >>> As someone who's been around for a *really* long time in OpenStack, > >>> I've seen a whole lot of really weird issues surface from the usage of > >>> DevStack to do CI gating. For example, one of the recent things is > >>> the fact it relies on installing package-shipped noVNC, where as the > >>> 'master' noVNC has actually changed behavior a few months back and it > >>> is completely incompatible at this point (it's just a ticking thing > >>> until we realize we're entirely broken). > >>> > >>> To this day, I still see people who want to POC something up with > >>> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > >>> how many warnings we'll put up, they'll always try to do it. With > >>> this way, at least they'll have something that has the shape of an > >>> actual real deployment. In addition, it would be *good* in the > >>> overall scheme of things for a deployment system to test against, > >>> because this would make sure things don't break in both ways. > >>> > >>> Also: we run Zuul for our CI which supports Ansible natively, this can > >>> remove one layer of indirection (Zuul to run Bash) and have Zuul run > >>> the playbooks directly from the executor. > >>> > >>> # So how could we do this? 
> >>> The OpenStack Ansible project is made of many roles that are all > >>> composable, therefore, you can think of it as a combination of both > >>> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > >>> the base modules (i.e. puppet-nova, etc) and TripleO was the > >>> integration of all of it in a distribution. OSA is currently both, > >>> but it also includes both Ansible roles and playbooks. > >>> > >>> In order to make sure we maintain as much of backwards compatibility > >>> as possible, we can simply run a small script which does a mapping of > >>> devstack => OSA variables to make sure that the service is shipped > >>> with all the necessary features as per local.conf. > >>> > >>> So the new process could be: > >>> > >>> 1) parse local.conf and generate Ansible variables files > >>> 2) install Ansible (if not running in gate) > >>> 3) run playbooks using variable generated in #1 > >>> > >>> The neat thing is after all of this, devstack just becomes a thin > >>> wrapper around Ansible roles. I also think it brings a lot of hands > >>> together, involving both the QA team and OSA team together, which I > >>> believe that pooling our resources will greatly help in being able to > >>> get more done and avoiding duplicating our efforts. > >>> > >>> # Conclusion > >>> This is a start of a very open ended discussion, I'm sure there is a > >>> lot of details involved here in the implementation that will surface, > >>> but I think it could be a good step overall in simplifying our CI and > >>> adding more coverage for real potential deployers. It will help two > >>> teams unite together and have more resources for something (that > >>> essentially is somewhat of duplicated effort at the moment). > >>> > >>> I will try to pick up sometime to POC a simple service being deployed > >>> by an OSA role instead of Bash, placement which seems like a very > >>> simple one and share that eventually. > >>> > >>> Thoughts? :) > >>> > >>> -- > >>> Mohammed Naser — vexxhost > >>> ----------------------------------------------------- > >>> D. 514-316-8872 > >>> D. 800-910-1726 ext. 200 > >>> E. mnaser at vexxhost.com > >>> W. http://vexxhost.com > >>> > >> > >> — > >> Slawek Kaplonski > >> Senior software engineer > >> Red Hat > >> > > > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. http://vexxhost.com > > — > Slawek Kaplonski > Senior software engineer > Red Hat > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From ed at leafe.com Mon Jun 3 13:34:42 2019 From: ed at leafe.com (Ed Leafe) Date: Mon, 3 Jun 2019 08:34:42 -0500 Subject: [placement] Office Hours In-Reply-To: References: <923884A4-E427-439E-AD76-9DDBB45550D9@leafe.com> <1559227066.23481.3@smtp.office365.com> Message-ID: <2829F729-B385-4B06-9C2E-2E8A0A21F7BF@leafe.com> On Jun 3, 2019, at 6:24 AM, Chris Dent wrote: > > Note, that since we've declared this office hours, it means it's ad > hoc and nobody is required to be there. It's merely a time that > we've designated as a reasonable point for the placement to check in > with each other and for other people to check in with the placement > team. That is: let's make sure this doesn't turn into "we moved the > meeting to wednesday". Agreed, but it should also be emphasized that if you *can* make it, you should. 
It would be nice to know that if there is something to be discussed, that there is a good chance that the discussion might be fruitful. -- Ed Leafe From kennelson11 at gmail.com Mon Jun 3 13:46:53 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 3 Jun 2019 06:46:53 -0700 Subject: Elections for Airship In-Reply-To: <20190530223625.7ao2hmxlrrj3ny4b@yuggoth.org> References: <7C64A75C21BB8D43BD75BB18635E4D89709A2256@MOSTLS1MSGUSRFF.ITServices.sbc.com> <20190530223625.7ao2hmxlrrj3ny4b@yuggoth.org> Message-ID: Might also be helpful to look at our document that outlines the process we go through[1]. If you have any questions, let us know! -Kendall (diablo_rojo) [1] https://opendev.org/openstack/election/src/branch/master/README.rst On Thu, May 30, 2019 at 3:37 PM Jeremy Stanley wrote: > On 2019-05-30 19:04:56 +0000 (+0000), MCEUEN, MATT wrote: > > OpenStack Infra team, > > The OpenStack Infrastructure team hasn't been officially involved in > running technical elections for OpenStack for several years now > (subject tag removed accordingly). With the advent of Gerrit's REST > API, contributor data can be queried and assembled anonymously by > anyone. While I happen to be involved in these activities for longer > than that's been the case, I'll be answering while wearing my > OpenStack Technical Election Official hat throughout the remainder > of this reply. > > > As the Airship project works to finalize our governance and > > elected positions [1], we need to be ready to hold our first > > elections. I wanted to reach out and ask for any experience, > > guidance, materials, or tooling you can share that would help this > > run correctly and smoothly? This is an area where the Airship team > > doesn't have much experience so we may not know the right > > questions to ask. > > > > Aside from a member of the Airship community creating a poll in > > CIVS [2], is there anything else you would recommend? Is there any > > additional tooling in place in the OpenStack world? Any potential > > pitfalls, or other hard-won advice for us? > [...] > > As Sean mentioned in his reply, the OpenStack community has been > building and improving tooling in the openstack/election Git > repository on OpenDev over the past few years. The important bits > (in my opinion) center around querying Gerrit for a list of > contributors whose changes have merged to sets of official project > repositories within a qualifying date range. I've recently been > assisting StarlingX's election officials with a similar request, and > do have some recommendations. > > Probably the best place to start is adding an official structured > dataset with your team/project information following the same schema > used by OpenStack[0] and now StarlingX[1], then applying a couple of > feature patches[2][3] (if they haven't merged by the time you read > this) to the openstack/election master branch. After that, you ought > to be able to run something along the lines of: > > tox -e venv -- owners --after 2018-05-30 --before 2019-05-31 > --nonmember --outdir airship-electorate > --projects ../../airship/governance/projects.yaml > --ref master > > (Note that the --after and --before dates work like in Gerrit's > query language and carry with them an implied midnight UTC, so one > is the actual start date but the other is the day after the end > date; "on or after" and "before but not on" is how I refer to them > in prose.) > > You'll see the resulting airship-electorate directory includes a lot > of individual files. 
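For anyone curious what "querying Gerrit for a list of contributors" looks like at the REST level, here is a rough sketch. It is not the openstack/election owners tool itself, just an illustration of the underlying query; the real tooling also handles pagination, multiple repositories, and the account deduplication described below:

# Rough illustration of the kind of Gerrit REST query the owners tooling
# builds on: list merged changes for one project in a date range and collect
# the change owners' preferred email addresses.
import json
import requests

GERRIT = "https://review.opendev.org"

def merged_change_owners(project, after, before):
    query = "project:%s status:merged after:%s before:%s" % (project, after, before)
    response = requests.get(
        GERRIT + "/changes/",
        params={"q": query, "o": "DETAILED_ACCOUNTS", "n": 500},
    )
    response.raise_for_status()
    # Gerrit prefixes its JSON responses with a ")]}'" line; strip it first.
    changes = json.loads(response.text.split("\n", 1)[1])
    return {change["owner"]["email"]
            for change in changes
            if change.get("owner", {}).get("email")}

if __name__ == "__main__":
    owners = merged_change_owners("openstack/election", "2018-05-30", "2019-05-31")
    print("\n".join(sorted(owners)))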
There are two basic types: .yaml files which > are structured data meant for human auditing as well as scripted > analysis, and .txt files which are a strict list of one Gerrit > preferred E-mail address per line for each voter (the format > expected by the https://civs.cs.cornell.edu/ voting service). It's > probably also obvious that there are sets of these named for each > team in your governance, as well as a set which start with > underscore (_). The former represent contributions to the > deliverable repositories of each team, while the latter are produced > from an aggregate of all deliverable repositories for all teams > (this is what you might use for electing an Airship-wide governing > body). > > There are a couple of extra underscore files... > _duplicate_owners.yaml includes information on deduplicated entries > for contributors where the script was able to detect more than one > Gerrit account for the same individual, while the _invites.csv file > isn't really election-related at all and is what the OSF normally > feeds into the automation which sends event discounts to > contributors. In case you're curious about the _invites.csv file, > the first column is the OSF member ID (if known) or 0 (if no > matching membership was found), the second column is the display > name from Gerrit, the third column is the preferred E-mail address > from Gerrit (this corresponds to the address used for the > _electorate.txt file), and any subsequent columns are the extra > non-preferred addresses configured in Gerrit for that account. > > Please don't hesitate to follow up with any additional questions you > might have! > > [0] > https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml > [1] > https://opendev.org/starlingx/governance/src/branch/master/reference/tsc/projects.yaml > [2] https://review.opendev.org/661647 > [3] https://review.opendev.org/661648 > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From henry at thebonaths.com Mon Jun 3 13:52:01 2019 From: henry at thebonaths.com (Henry Bonath) Date: Mon, 3 Jun 2019 09:52:01 -0400 Subject: [openstack-ansible] Installing Third-Party drivers into the Cinder-Volume container during playbook execution In-Reply-To: <1112501559559877@sas2-0106f63be698.qloud-c.yandex.net> References: <1112501559559877@sas2-0106f63be698.qloud-c.yandex.net> Message-ID: Dmitriy, Thank you for answering my question. That's good to know that we can deploy additional pip packages within the container, I'll look into what it takes to package the driver in pypi and start moving in this direction. On Mon, Jun 3, 2019 at 7:25 AM Dmitriy Rabotyagov wrote: > > The quick answer - no, currently such option is not present for the cinder role. And by far the only way to install some custom things with cinder now is to provide a list of cinder_user_pip_packages[1], so that's why packaging driver into pypi might be an option for distribution of custom drivers. > > [1] https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/defaults/main.yml#L302 > > 03.06.2019, 06:59, "Henry Bonath" : > > I think the idea here, at least for me, would be to have it rolled > > into the deployment automatically - in a similar fashion to how > > horizon themes are deployed within Openstack-Ansible. 
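Packaging the driver for pip, as Dmitriy suggests above, is fairly lightweight; a minimal setuptools sketch could look like this (the distribution name, version, and module path are hypothetical, not what iXsystems actually publishes):

# Hypothetical setup.py for an out-of-tree Cinder volume driver, so it can be
# published to PyPI (or an internal index) and pulled in through
# cinder_user_pip_packages. All names below are placeholders.
from setuptools import find_packages, setup

setup(
    name="cinder-ixsystems-driver",
    version="0.1.0",
    description="Out-of-tree iXsystems volume driver for Cinder",
    packages=find_packages(include=["cinder_ixsystems", "cinder_ixsystems.*"]),
    install_requires=["requests"],  # whatever the driver actually imports
    python_requires=">=2.7",
)

With that installed into the cinder venv, the volume_driver option in the backend section of cinder.conf should be able to point at the driver's import path instead of relying on files copied under cinder/volume/drivers/.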
> > Obviously having this specific driver in the tree would solve my > > specific issue, but I don't know how many more third party Cinder > > drivers which are not packaged into the tree people are deploying > > these days. > > > > My question for the community is simply finding out if this mechanism > > exists already. > > > > On Thu, May 30, 2019 at 11:11 AM Jean-Philippe Evrard > > wrote: > >> On Tue, May 28, 2019, at 04:10, Henry Bonath wrote: > >> > Hello, I asked this into IRC but I thought this might be a more > >> > appropriate place to ask considering the IRC channel usage over the > >> > weekend. > >> > > >> > If I wanted to deploy a third party driver along with my Cinder-Volume > >> > container, is there a built-in mechanism for doing so? (I am > >> > specifically wanting to use: https://github.com/iXsystems/cinder) > >> > > >> > I am able to configure a cinder-backend in the > >> > "openstack_user_config.yml" file which works perfectly if I let it > >> > fail during the first run, then copy the driver into the containers > >> > and run "os-cinder-install.yml" a second time. > >> > > >> > I've found that you guys have built similar stuff into the system > >> > (e.g. Horizon custom Theme installation via .tgz) and was curious if > >> > there is a similar mechanism for Cinder Drivers that may be > >> > undocumented. > >> > > >> > http://paste.openstack.org/show/752132/ > >> > This is an example of my working config, which relies on the driver > >> > being copied into the > >> > /openstack/venvs/cinder-19.x.x.x/lib/python2.7/site-packages/cinder/volume/drivers/ixsystems/ > >> > folder. > >> > > >> > Thanks in advance! > >> > > >> > > >> > >> I suppose the community would be okay to have this in tree, so no need for a third party system here (and no need to maintain this on your own, separately). However... if it's just about copying the content of this repo, did you think of packaging this, and publish it to pypi ? This way you could just pip install the necessary package into your cinder venv... > >> > >> Regards, > >> Jean-Philippe Evrard (evrardjp) > > -- > Kind Regards, > Dmitriy Rabotyagov > From openstack at fried.cc Mon Jun 3 14:49:48 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 3 Jun 2019 09:49:48 -0500 Subject: [nova] Bug #1755266: How to proceed with test failures In-Reply-To: References: Message-ID: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> Hi Clemens. First of all, thank you for digging into the code and working to fix the issue you're seeing. > I am missing context here how the logic of the test > code works; Note that it is normal (in fact almost always required) to change test code with any change to the production side. The purpose of unit tests (you can tell this is a unit test from '/unit/' in the file path) is to exercise a small chunk ("unit" :) of code and make sure branching and method calls are all as expected. You've changed some logic, so the test is (rightly) failing, and you'll need to change what it's expecting accordingly. These tests are making use of mock [0] to hide the guts of some of the methods being called by the unit in question, just to make sure that those methods are being invoked the correct number of times, with the correct arguments. In this case... 
> Related to my changes, tox gets me an error: > > nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_finish_migration_power_on > --------------------------------------------------------------------------------------------- > > Captured traceback: > ~~~~~~~~~~~~~~~~~~~ >     Traceback (most recent call last): >       File "nova/tests/unit/virt/libvirt/test_driver.py", line 18707, in > test_finish_migration_power_on >         self._test_finish_migration() >       File > "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", > line 1305, in patched >         return func(*args, **keywargs) >       File "nova/tests/unit/virt/libvirt/test_driver.py", line 18662, in > _test_finish_migration >         mock_raw_to_qcow2.assert_has_calls(convert_calls, any_order=True) >       File > "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", > line 983, in assert_has_calls >         ), cause) >       File > "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/six.py", > line 737, in raise_from >         raise value >     AssertionError: > (call('/tmp/tmpC60sPk/tmpOHikWL/8ea1de33-64d7-4d1d-af02-88e6f7ec91c1/disk.swap'),) > not all found in call list The stack trace is pointing you at [1] for all three failures. I can see from your code change that you're skipping the qcow conversion for disk.swap here [2]. So when I remove 'disk.swap' from the list on L19438, the three tests pass. That will get you green in zuul, but I should warn you that one of the first things reviewers will notice is that you *only* had to change tests for the piece of your change at [2]. That means there's missing/incomplete test coverage for the other code paths you've touched, and you'll have to add some (or justify why it's unnecessary). > Are there any docs or helps alongside > testing in Nova where to learn what to do? I'm not sure if this is targeted at the right level for you, but here's the nova contributor guide [3]. If you're more of an interactive learner, feel free to jump on IRC in #openstack-nova and I'll be happy to walk you through some basics. > Perhaps also someone could give me a hint whether there are some > conceptually misleading ideas behind my fix proposal which need to be > driven into another direction … Yup, that's what code review is for. Nova has a very high "open changes" to "reviewer bandwidth" ratio, so it's unfortunately pretty normal for changes to go unreviewed while they're still failing zuul testing. Getting those fixed up, and bringing attention to the issue here on the mailing list and/or IRC, should all get your change some better attention. Thanks again for diving in. efried [0] https://docs.python.org/3/library/unittest.mock.html [1] https://opendev.org/openstack/nova/src/branch/master/nova/tests/unit/virt/libvirt/test_driver.py#L19437-L19439 [2] https://review.opendev.org/#/c/618621/4/nova/virt/libvirt/driver.py at 8410 [3] https://docs.openstack.org/nova/latest/contributor/index.html From cboylan at sapwetik.org Mon Jun 3 14:56:58 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 03 Jun 2019 07:56:58 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > Hi everyone, > > This is something that I've discussed with a few people over time and > I think I'd probably want to bring it up by now. I'd like to propose > and ask if it makes sense to perhaps replace devstack entirely with > openstack-ansible. 
I think I have quite a few compelling reasons to > do this that I'd like to outline, as well as why I *feel* (and I could > be biased here, so call me out!) that OSA is the best option in terms > of a 'replacement' > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. Everything from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. In many cases this is desirable to ensure that our software work with the other software out there in the wild that people will be deploying with. > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. I think if you have developers running a small wrapper locally to deploy this new development stack you should run that same wrapper in CI. This ensure the wrapper doesn't break. > > # So how could we do this? 
> The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. > > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. > > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). Clark From jean-philippe at evrard.me Mon Jun 3 15:08:12 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Mon, 03 Jun 2019 17:08:12 +0200 Subject: =?UTF-8?Q?Re:_[openstack-ansible]_Installing_Third-Party_drivers_into_th?= =?UTF-8?Q?e_Cinder-Volume_container_during_playbook_execution?= In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019, at 05:45, Henry Bonath wrote: > I think the idea here, at least for me, would be to have it rolled > into the deployment automatically - in a similar fashion to how > horizon themes are deployed within Openstack-Ansible. > Obviously having this specific driver in the tree would solve my > specific issue, but I don't know how many more third party Cinder > drivers which are not packaged into the tree people are deploying > these days. > > My question for the community is simply finding out if this mechanism > exists already. As you might have seen, we have documentation in the cinder role that points to different third party cinder drivers [1]. That's why i think it would be fine to have your specific code integrated into the cinder role. There is a precedent there. 
This way you would have it part of the deployment automatically. On the technical aspect of the matter, I believe it would be better to package that code into a python package though, so you can install and use it directly. It will reduce the maintainance burden in the long run, and would be easier to test in CI: The OpenStack infrastructure have a cache (or mirror?) of PyPI, and we don't have a mirror of this code. Regards, Jean-Philippe Evrard (evrardjp) [1]: https://docs.openstack.org/openstack-ansible-os_cinder/latest/ From mnaser at vexxhost.com Mon Jun 3 15:15:15 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 11:15:15 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 11:05 AM Clark Boylan wrote: > > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. 
Everything from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. In many cases this is desirable to ensure that our software work with the other software out there in the wild that people will be deploying with. Yeah. I guess that's fair, but there's still other things like lack of coverage for many other operating systems as well. > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > I think if you have developers running a small wrapper locally to deploy this new development stack you should run that same wrapper in CI. This ensure the wrapper doesn't break. That's fair enough, that's always been the odd thing of driving things directly via Zuul or with a small executor. > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. 
Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). The idea is *not* to use OpenStack Ansible to deploy DevStack, it's to use the roles to deploy the specific services. Therefore, the log collection stuff should all still be the same, as long as it pulls down the correct systemd unit (which should be matching). The idea that it should be 100% transparent to the user at the end of the day, there should be no functional changes in how DevStack runs or what it logs in the gate. > Clark > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From jim at jimrollenhagen.com Mon Jun 3 15:18:25 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Mon, 3 Jun 2019 11:18:25 -0400 Subject: [tc][all] Github mirroring (or lack thereof) for unofficial projects In-Reply-To: References: <20190503190538.GB3377@localhost.localdomain> <20190515175110.26i2xuclkksgx744@arabian.linksys.moosehall> <8d81b9a7-b460-43e1-a774-9bd65ee42143@www.fastmail.com> <20190530180658.xgpcy35au72ccmzt@yuggoth.org> Message-ID: On Fri, May 31, 2019 at 7:51 PM Clark Boylan wrote: > On Fri, May 31, 2019, at 11:09 AM, Jim Rollenhagen wrote: > > On Thu, May 30, 2019 at 3:15 PM Jim Rollenhagen > wrote: > > > On Thu, May 30, 2019 at 2:18 PM Jeremy Stanley > wrote: > > >> On 2019-05-30 09:00:20 -0700 (-0700), Clark Boylan wrote: > > >> [...] > > >> > If you provide us with the canonical list of things to archive I > > >> > think we can probably script that up or do lots of clicking > > >> > depending on the size of the list I guess. > > >> [...] > > >> > > >> Alternatively, I's like to believe we're at the point where we can > > >> add other interested parties to the curating group for the openstack > > >> org on GH, at which point any of them could volunteer to do the > > >> archiving. > > > > > > Thanks Clark/Jeremy. I'll make a list tomorrow, as we'll > > > need that in either case. :) > > > > I think what we want is to archive all Github repos in the > > openstack, openstack-dev, and openstack-infra orgs, > > which don't have something with the same name on > > Gitea in the openstack namespace. Is that right? > > Close, I think we can archive all repos in openstack-dev and > openstack-infra. Part of the repo renames we did today were to get the > repos that were left behind in those two orgs into their longer term homes. > Then any project in https://github.com/openstack that is not in > https://opendev.org/openstack can be archived in Github too. > Cool, that made me realize I wasn't outputting the org, and now I don't need to. :) New list (gathered only from the openstack org): http://paste.openstack.org/show/752443/ And new code: http://paste.openstack.org/show/752444/ And yes, I realize I pasted my token there, it's no longer valid :) // jim -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cboylan at sapwetik.org Mon Jun 3 15:18:36 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 03 Jun 2019 08:18:36 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: <45ca1872-187b-46cf-b23a-a9a7be7acb85@www.fastmail.com> On Mon, Jun 3, 2019, at 8:15 AM, Mohammed Naser wrote: > On Mon, Jun 3, 2019 at 11:05 AM Clark Boylan wrote: > > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: snip > > > I will try to pick up sometime to POC a simple service being deployed > > > by an OSA role instead of Bash, placement which seems like a very > > > simple one and share that eventually. > > > > > > Thoughts? :) > > > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). > > The idea is *not* to use OpenStack Ansible to deploy DevStack, it's to > use the roles > to deploy the specific services. Therefore, the log collection stuff > should all still > be the same, as long as it pulls down the correct systemd unit (which should > be matching). I know. I'm saying the logging that these other systems produce is typically lacking compared to devstack. So any change needs to address that. > > The idea that it should be 100% transparent to the user at the end of > the day, there > should be no functional changes in how DevStack runs or what it logs > in the gate. If this is the plan then the logging concerns should be addressed as part of the "don't make it noticeable change" work. Clark From clemens.hardewig at crandale.de Mon Jun 3 15:25:02 2019 From: clemens.hardewig at crandale.de (Clemens Hardewig) Date: Mon, 3 Jun 2019 17:25:02 +0200 Subject: [nova] Bug #1755266: How to proceed with test failures In-Reply-To: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> References: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> Message-ID: <2FE9B647-2484-4ADF-A0AD-12C0DE197394@crandale.de> Hi Eric, thank you that you have taken the time leading me through the process and answering extensively. Very much appreciated and insightful. Having digged now into the code in /nova/nova/tests/unit/virt/libvirt/test_driver.py, it is obvious that my proposal is not universal but fixes only my specific config (and make then other configs fail). However, it seems to me that a config that running cinder on each compute node with lvm backend creating root volume as lvm volume creating swap not as ephermal or swap (raw) disk but as lvm volume (as lvm/qemu does automatically) is not a supported model in nova yet to deal with instance resizing/migrations. Thanks again for your guidance, will go through it ... Br Clemens -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3898 bytes Desc: not available URL: From openstack at fried.cc Mon Jun 3 15:51:11 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 3 Jun 2019 10:51:11 -0500 Subject: [nova] Bug #1755266: How to proceed with test failures In-Reply-To: <2FE9B647-2484-4ADF-A0AD-12C0DE197394@crandale.de> References: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> <2FE9B647-2484-4ADF-A0AD-12C0DE197394@crandale.de> Message-ID: Clemens- > However, it seems to me that a config that > > * running cinder on each compute node with lvm backend > * creating root volume as lvm volume > * creating swap not as ephermal or swap (raw) disk but as lvm volume > (as lvm/qemu does automatically) > > > is not a supported model in nova yet to deal with instance > resizing/migrations. I've asked a subject matter expert (which I certainly am not) to have a look at your change. Hopefully he can answer the above, which is pretty much Greek to me :) Thanks, efried . From liujinxin at xiangcloud.com.cn Mon Jun 3 11:08:25 2019 From: liujinxin at xiangcloud.com.cn (liujinxin at xiangcloud.com.cn) Date: Mon, 3 Jun 2019 19:08:25 +0800 Subject: ovn L3 TCP protocol has a large number of retransmissions Message-ID: <2019060319082531569215@xiangcloud.com.cn> Hi: I have the following two questions. What shall I do? problem1:When the cloud host accesses the external network through L3 router. TCP protocol has a large number of retransmissions, leading to TCP link failure, TCP data transmission error problem2:TCP links data packets, duplicates ACK and TCP data transmission disorderly when the instances communicate across hosts through geneve, but the quality impact of TCP is relatively acceptable. openstack queens with ovn environment OS: CentOS Linux release 7.3.1611 (Core) kernel: 3.10.0-514.el7.x86_64 openstack: kolla-ansible queens networking-ovn:python-networking-ovn-4.0.3 ovs and ovn: openvswitch-ovn-central-2.10.90 openvswitch-2.10.90 openvswitch-ovn-host-2.10.90 openvswitch-ovn-common-2.10.90 topology: openstack controller 10.200.105.19 openstack compute 10.200.105.16,10.200.105.17,10.200.105.18 openstack gateway 10.200.105.20 openstack controller gateway compute 10.200.105.19 10.200.105.20 10.200.105.[16-18] neutron_server ovn-northd ---------bond0------------|------------------------------------------------------------------| | | | ovn-controller ovn-controller ovn-controller | | | ovs ovs ovs | | | | | | | |----------------------------------|--|------bond0-------------------------------------------------|--| |-------------------------------------|--------bond1--------------------------------------------------| Packet forwarding: | compute1 | compute2 | gateway | | 10.200.105.16 | 10.200.105.17 | 10.200.105.20 | | vm1 | vm2 | | | | | | | | | br-int <-> br-ex | br-int <-> br-ex | br-int <-> br-ex | | |_____bond1_vlan___|___________|____________|________| |__________bond0_____________|_______________________| 1、L3 data flow 10.200.100.16 | 10.200.105.20 vm1<--->br-int<-->geneve <->bond0 <―-> bond0<-->geneve<--->br-ex<-->bond1<-->vlan<---->internet 2、vm1<->vm2 10.200.100.16 | 10.200.105.17 vm1<--->br-int<-->geneve <->bond0 <―-> bond0<-->geneve<--->br-int<--->vm2 Configure: Openstack Configure 1、neutron.conf ... service_plugins = networking_ovn.l3.l3_ovn.OVNL3RouterPlugin,qos ... 
2、cat /etc/kolla/neutron-server/ml2_conf.ini [ml2] type_drivers = flat,vlan,local,geneve tenant_network_types = geneve mechanism_drivers = ovn extension_drivers = port_security,qos overlay_ip_version = 4 [ml2_type_vlan] network_vlan_ranges = physnet1 [securitygroup] enable_security_group = true [ml2_type_geneve] vni_ranges = 1:65536 max_header_size = 38 [ovn] ovn_nb_connection = tcp:10.200.105.19:6641 ovn_sb_connection = tcp:10.200.105.19:6642 ovn_l3_mode = True ovn_l3_scheduler = leastloaded ovn_native_dhcp = True neutron_sync_mode = repair enable_distributed_floating_ip = True ovsdb_log_level = DEBUG [qos] notification_drivers = ovn-qos Ovn Configure 10.200.105.19 ovs-vsctl get open . external_ids {hostname="10-200-105-19", ovn-bridge-mappings="physnet1:br-ex", ovn-encap-ip="10.200.105.19", ovn-encap-type="geneve,vxlan", ovn-remote="tcp:10.200.105.19:6642", rundir="/var/run/openvswitch", system-id="160e569c-a12f-41a3-8d2a-37bd9af0c7ed"} 10.200.105.20 ovs-vsctl get open . external_ids {hostname="10-200-105-20", ovn-bridge-mappings="physnet1:br-ex", ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="10.200.105.20", ovn-encap-type="geneve,vxlan", ovn-remote="tcp:10.200.105.19:6642", rundir="/var/run/openvswitch", system-id="96e89c3c-5c85-498d-b42f-5aea559bdd42"} 10.200.105.[16-18] ovs-vsctl get open . external_ids {hostname="10-200-105-17", ovn-bridge-mappings="physnet1:br-ex", ovn-encap-ip="10.200.105.17", ovn-encap-type="geneve,vxlan", ovn-remote="tcp:10.200.105.19:6642", rundir="/var/run/openvswitch", system-id="a768ca6e-905d-4aac-aa1e-d18b38dedadf"} ovn-nbctl show 2019-06-03T10:51:46Z|00001|ovsdb_idl|WARN|NB_Global table in OVN_Northbound database lacks ipsec column (database needs upgrade?) 2019-06-03T10:51:46Z|00002|ovsdb_idl|WARN|NB_Global table in OVN_Northbound database lacks options column (database needs upgrade?) switch eddff890-b515-41d3-ad49-edcae9a3197b (neutron-7489be65-074f-49f0-9cf3-c520dcd3b08d) (aka v) port 066c4c72-a1f7-4311-8d40-ed7ca0f942b3 addresses: ["fa:16:3e:a8:9d:05 192.168.2.212"] port edc6e2a9-47db-4a8a-8857-d8afa63d900d type: router router-port: lrp-edc6e2a9-47db-4a8a-8857-d8afa63d900d port provnet-7489be65-074f-49f0-9cf3-c520dcd3b08d type: localnet addresses: ["unknown"] switch 23d3676d-9d95-403e-947c-bcd4b298bde0 (neutron-7dd91bd0-10dd-4022-868c-6d17be7380f7) (aka bb) port a764f462-7897-475f-9ef0-04b7c83e44db addresses: ["fa:16:3e:cd:23:b2 10.0.0.11"] port 71247f19-21bd-4eac-b3db-94e770abb50c type: router router-port: lrp-71247f19-21bd-4eac-b3db-94e770abb50c port 659f304c-266f-4b3f-946a-b3cf4ea988c5 addresses: ["fa:16:3e:f8:5f:1b 10.0.0.9"] router 3c5d2c44-e3c4-46e9-9f43-64c1cbc7e065 (neutron-f8611590-42a1-4c6a-b433-db9ade3194a2) (aka v) port lrp-edc6e2a9-47db-4a8a-8857-d8afa63d900d mac: "fa:16:3e:06:f4:ca" networks: ["192.168.2.205/16"] gateway chassis: [311c4582-71d1-4886-baf0-1aefa5f2ceab d61a09c2-87e2-4dff-91be-82e705ab85f4] port lrp-71247f19-21bd-4eac-b3db-94e770abb50c mac: "fa:16:3e:ef:06:c6" networks: ["10.0.0.1/24"] nat 4bc0e7cf-3bdb-4725-94e4-a29b62f7d8e0 external ip: "192.168.2.205" logical ip: "10.0.0.0/24" type: "snat" liujinxin at xiangcloud.com.cn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jabach at blizzard.com Mon Jun 3 12:03:39 2019 From: jabach at blizzard.com (James Bach) Date: Mon, 3 Jun 2019 12:03:39 +0000 Subject: [magnum] Meeting at 2019-06-04 2100 UTC In-Reply-To: References: Message-ID: I'm OOO until next week but I'd be glad to join anytime after that Jim ________________________________ From: Spyros Trigazis Sent: Monday, June 3, 2019 8:02:32 AM To: openstack-discuss at lists.openstack.org Cc: Fei Long Wang; James Bach; Erik Olof Gunnar Andersson; Bharat Kunwar Subject: [magnum] Meeting at 2019-06-04 2100 UTC Hello all, I would like to discuss moving the drivers out-of-tree, as we briefly discussed it in the PTG. Can you all make it for the next meeting [1]? This is not super urgent, but it will accelerate development and bug fixes at the driver level. Cheers, Spyros [0] https://etherpad.openstack.org/p/magnum-train-ptg [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jayachander.it at gmail.com Mon Jun 3 15:42:13 2019 From: jayachander.it at gmail.com (Jay See) Date: Mon, 3 Jun 2019 17:42:13 +0200 Subject: [Floating IP][Networking issue] Not able to connect to VM using Floating IP Message-ID: Hi, I have followed the OpenStack installation guide for Queens [0][1]. In my setup I have 3 servers: 1 controller and 2 compute nodes, all running Ubuntu 16.04, behind my firewall (OpenBSD). *Issue 1:* All my servers have several NICs, and I wanted to use at least two of them, but I am able to connect to my servers through only one NIC. I could not figure out what is wrong with my settings. root at h018:~# cat /etc/network/interfaces # This file describes the network interfaces available on your system # and how to activate them. For more information, see interfaces(5). source /etc/network/interfaces.d/* # The loopback network interface auto lo iface lo inet loopback iface eth5 inet static iface eth4 inet static auto eth3 iface eth3 inet static address 10.4.15.118 netmask 255.255.255.0 network 10.4.15.0 broadcast 10.4.15.255 gateway 10.4.15.1 auto eth2 iface eth2 inet static address 10.3.15.118 netmask 255.255.255.0 network 10.3.15.0 broadcast 10.3.15.255 gateway 10.3.15.1 auto eth1 iface eth1 inet static address 10.2.14.118 netmask 255.255.255.0 network 10.2.14.0 broadcast 10.2.14.255 gateway 10.2.14.1 # The primary network interface auto eth0 iface eth0 inet static address 10.1.14.118 netmask 255.255.255.0 network 10.1.14.0 broadcast 10.1.14.255 gateway 10.1.14.1 # dns-* options are implemented by the resolvconf package, if installed dns-nameservers 10.1.14.1 8.8.8.8 8.8.4.4 *Issue 2:* I have completed my OpenStack installation by following [1], and after creating the VM and associating the floating IP everything looks fine, but I am not able to ping or SSH to the VM. I have added ICMP and SSH rules to my security group. I configured my L2 bridge to use eth1, which is not reachable from the firewall, or this might be a different problem altogether, as the VM creation succeeds without any errors.
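For context, here is roughly what the "L2 bridge to use eth1" part corresponds to in the install guide's linuxbridge setup. This is only a minimal sketch (assuming the linuxbridge mechanism driver, with eth1 as the provider interface), not my full agent configuration:

    # /etc/neutron/plugins/ml2/linuxbridge_agent.ini (sketch only)
    [linux_bridge]
    # the flat "provider" network is bridged onto this physical NIC
    physical_interface_mappings = provider:eth1

The flat provider network created below therefore ends up bridged onto eth1 (brq5e8f5ec9-9a carries eth1's MAC in the ifconfig output further down), so eth1's reachability towards the firewall matters for the floating IP traffic.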
root at h018:~# openstack network create --share --external --provider-physical-network provider --provider-network-type flat provider-network +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2019-06-03T09:45:20Z | | description | | | dns_domain | None | | id | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | | ipv4_address_scope | None | | ipv6_address_scope | None | | is_default | False | | is_vlan_transparent | None | | mtu | 1500 | | name | provider-network | | port_security_enabled | True | | project_id | bb0f22d6efd64b31be6c37edc796d53e | | provider:network_type | flat | | provider:physical_network | provider | | provider:segmentation_id | None | | qos_policy_id | None | | revision_number | 5 | | router:external | External | | segments | None | | shared | True | | status | ACTIVE | | subnets | | | tags | | | updated_at | 2019-06-03T09:45:20Z | +---------------------------+--------------------------------------+ root at h018:~# root at h018:~# openstack subnet create --network provider-network \ > --allocation-pool start=XX.XX.169.101,end=XX.XX.169.250 \ > --dns-nameserver 8.8.4.4 --gateway XX.XX.169.1 \ > --subnet-range XX.XX.169.0/24 provider +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | XX.XX.169.101-XX.XX.169.250 | | cidr | XX.XX.169.0/24 | | created_at | 2019-06-03T09:49:45Z | | description | | | dns_nameservers | 8.8.4.4 | | enable_dhcp | True | | gateway_ip | XX.XX.169.1 | | host_routes | | | id | 51fb740f-1f06-4f6c-93c5-3690488e3980 | | ip_version | 4 | | ipv6_address_mode | None | | ipv6_ra_mode | None | | name | provider | | network_id | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | | project_id | bb0f22d6efd64b31be6c37edc796d53e | | revision_number | 0 | | segment_id | None | | service_types | | | subnetpool_id | None | | tags | | | updated_at | 2019-06-03T09:49:45Z | +-------------------+--------------------------------------+ root at h018:~# neutron net-external-list neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead. 
+--------------------------------------+------------------+----------------------------------+------------------------------------------------------+ | id | name | tenant_id | subnets | +--------------------------------------+------------------+----------------------------------+------------------------------------------------------+ | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | provider-network | bb0f22d6efd64b31be6c37edc796d53e | 51fb740f-1f06-4f6c-93c5-3690488e3980 XX.XX.169.0/24 | +--------------------------------------+------------------+----------------------------------+------------------------------------------------------+ root at h018:~# openstack network list +--------------------------------------+------------------+--------------------------------------+ | ID | Name | Subnets | +--------------------------------------+------------------+--------------------------------------+ | 3ee95928-012f-4a55-a0b3-e277c2d45080 | demo-network | 3427b6ac-3bc0-4529-9035-33e1ab05cb64 | | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | provider-network | 51fb740f-1f06-4f6c-93c5-3690488e3980 | +--------------------------------------+------------------+--------------------------------------+ root at h018:~# nova list +--------------------------------------+--------+--------+------------+-------------+----------------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+--------+--------+------------+-------------+----------------------------------------+ | 3f8ab4c2-9047-47c4-8634-0c93cf7d7460 | test15 | ACTIVE | - | Running | demo-network=10.1.0.12, XX.XX.169.108 | +--------------------------------------+--------+--------+------------+-------------+----------------------------------------+ root at h018:~# openstack port list +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+ | ID | Name | MAC Address | Fixed IP Addresses | Status | +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+ | 037d801d-5cae-4d88-ae2d-a4289a542057 | | fa:16:3e:a6:68:7b | ip_address='10.1.0.2', subnet_id='3427b6ac-3bc0-4529-9035-33e1ab05cb64' | ACTIVE | | 327fe5fe-4288-4d80-850c-fa7d7e29d3aa | | fa:16:3e:2f:0f:dd | ip_address='XX.XX.169.101', subnet_id='51fb740f-1f06-4f6c-93c5-3690488e3980' | ACTIVE | | 4208ac23-42bf-44ed-8b0d-af1e615b2542 | | fa:16:3e:c5:cb:94 | ip_address='XX.XX.169.108', subnet_id='51fb740f-1f06-4f6c-93c5-3690488e3980' | N/A | (VM) | 642729e6-f84c-4742-89b2-e5924d8e188e | | fa:16:3e:37:97:eb | ip_address='XX.XX.169.107', subnet_id='51fb740f-1f06-4f6c-93c5-3690488e3980' | ACTIVE | | bf5c3061-0c40-41da-bebf-95650e055ce2 | | fa:16:3e:03:bd:f8 | ip_address='10.1.0.1', subnet_id='3427b6ac-3bc0-4529-9035-33e1ab05cb64' | ACTIVE | | fdf976c0-99c6-49e4-b3db-9f26a09da7a9 | | fa:16:3e:c0:be:e9 | ip_address='10.1.0.12', subnet_id='3427b6ac-3bc0-4529-9035-33e1ab05cb64' | ACTIVE | +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+ root at h018:~# ping -c4 XX.XX.169.101 PING XX.XX.169.101 (XX.XX.169.101) 56(84) bytes of data. --- XX.XX.169.101 ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 3024ms root at h018:~# ping -c4 XX.XX.169.107 PING XX.XX.169.107 (XX.XX.169.107) 56(84) bytes of data. 
--- XX.XX.169.107 ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 3023ms root at h018:~# ping -c4 XX.XX.169.108 PING XX.XX.169.108 (XX.XX.169.108) 56(84) bytes of data. --- XX.XX.169.108 ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 3001ms root at h018:~# openstack server list +--------------------------------------+--------+--------+----------------------------------------+-------------+----------+ | ID | Name | Status | Networks | Image | Flavor | +--------------------------------------+--------+--------+----------------------------------------+-------------+----------+ | 3f8ab4c2-9047-47c4-8634-0c93cf7d7460 | test15 | ACTIVE | demo-network=10.1.0.12, XX.XX.169.108 | Ubuntu16.04 | m1.small | +--------------------------------------+--------+--------+----------------------------------------+-------------+----------+ root at h018:~# ip route default via 10.1.14.1 dev eth0 10.1.14.0/24 dev eth0 proto kernel scope link src 10.1.14.118 10.2.14.0/24 dev brq5e8f5ec9-9a proto kernel scope link src 10.2.14.118 10.3.15.0/24 dev eth2 proto kernel scope link src 10.3.15.118 10.4.15.0/24 dev eth3 proto kernel scope link src 10.4.15.118 root at h018:~# ifconfig brq3ee95928-01 Link encap:Ethernet HWaddr 72:77:4f:54:6a:93 inet6 addr: fe80::4459:b6ff:feb0:3352/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:34 errors:0 dropped:0 overruns:0 frame:0 TX packets:10 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3144 (3.1 KB) TX bytes:828 (828.0 B) brq5e8f5ec9-9a Link encap:Ethernet HWaddr 24:6e:96:84:25:1a inet addr:10.2.14.118 Bcast:10.2.14.255 Mask:255.255.255.0 inet6 addr: fe80::286d:e0ff:fefa:15a4/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:118004 errors:0 dropped:0 overruns:0 frame:0 TX packets:10175 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:5834402 (5.8 MB) TX bytes:1430189 (1.4 MB) eth0 Link encap:Ethernet HWaddr 24:6e:96:84:25:18 inet addr:10.1.14.118 Bcast:10.1.14.255 Mask:255.255.255.0 inet6 addr: fe80::266e:96ff:fe84:2518/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1977142 errors:0 dropped:0 overruns:0 frame:0 TX packets:2514801 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1013827869 (1.0 GB) TX bytes:1529933345 (1.5 GB) eth1 Link encap:Ethernet HWaddr 24:6e:96:84:25:1a inet6 addr: fe80::266e:96ff:fe84:251a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:2622581 errors:0 dropped:14027 overruns:0 frame:0 TX packets:327841 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:166482697 (166.4 MB) TX bytes:28701550 (28.7 MB) eth2 Link encap:Ethernet HWaddr b4:96:91:0f:cd:28 inet addr:10.3.15.118 Bcast:10.3.15.255 Mask:255.255.255.0 inet6 addr: fe80::b696:91ff:fe0f:cd28/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:272 errors:0 dropped:0 overruns:0 frame:0 TX packets:45 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:16452 (16.4 KB) TX bytes:2370 (2.3 KB) eth3 Link encap:Ethernet HWaddr b4:96:91:0f:cd:2a inet addr:10.4.15.118 Bcast:10.4.15.255 Mask:255.255.255.0 inet6 addr: fe80::b696:91ff:fe0f:cd2a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:7546483 errors:0 dropped:0 overruns:0 frame:0 TX packets:43 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:452789254 (452.7 MB) TX bytes:2118 (2.1 KB) lo 
Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:42373349 errors:0 dropped:0 overruns:0 frame:0 TX packets:42373349 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1 RX bytes:12244256693 (12.2 GB) TX bytes:12244256693 (12.2 GB) tap037d801d-5c Link encap:Ethernet HWaddr ba:7a:4c:72:fb:05 UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:9 errors:0 dropped:0 overruns:0 frame:0 TX packets:40 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1950 (1.9 KB) TX bytes:4088 (4.0 KB) tap327fe5fe-42 Link encap:Ethernet HWaddr 6e:a2:fd:08:dc:bb UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:7 errors:0 dropped:0 overruns:0 frame:0 TX packets:107768 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:618 (618.0 B) TX bytes:6253098 (6.2 MB) tap642729e6-f8 Link encap:Ethernet HWaddr 5a:11:77:05:54:e0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:11858 errors:0 dropped:0 overruns:0 frame:0 TX packets:94601 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:498656 (498.6 KB) TX bytes:5676060 (5.6 MB) tapbf5c3061-0c Link encap:Ethernet HWaddr 72:77:4f:54:6a:93 UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:9122 errors:0 dropped:0 overruns:0 frame:0 TX packets:9186 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:928979 (928.9 KB) TX bytes:711090 (711.0 KB) vxlan-8 Link encap:Ethernet HWaddr a6:77:6e:2b:f7:1f UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:9186 errors:0 dropped:0 overruns:0 frame:0 TX packets:9113 errors:0 dropped:19 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:582486 (582.4 KB) TX bytes:801919 (801.9 KB) root at h018:~# If any other information is required , please let me know. I will share the info. I have seen many posts with similar issues, steps which worked for them are not working in my setup. May be I have done something wrong, not able to figure out that on my own. Thanks and regards, Jayachander. [0] https://docs.openstack.org/install-guide/. [1] https://docs.openstack.org/install-guide/openstack-services.html#minimal-deployment-for-queens -- P *SAVE PAPER – Please do not print this e-mail unless absolutely necessary.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From kecarter at redhat.com Mon Jun 3 16:27:42 2019 From: kecarter at redhat.com (Kevin Carter) Date: Mon, 3 Jun 2019 11:27:42 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible Message-ID: Hello Stackers, I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. 
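To make "simple to include" a little more concrete, here is a purely illustrative sketch of how a deployment step in a service template could consume one of these roles; the role and variable names are invented for the example and do not refer to an existing role:

      deploy_steps_tasks:
        - name: Configure the example service via a tripleo-ansible role
          include_role:
            name: tripleo_example_service
          vars:
            tripleo_example_service_debug: false

The heat templates keep their current shape and simply delegate the task content to a role that can also be exercised and tested on its own.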
While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. [0] - https://review.opendev.org/662763 [1] - https://opendev.org/openstack/tripleo-ansible -- Kevin Carter IRC: cloudnull -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Jun 3 17:07:23 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 13:07:23 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: > > Hello Stackers, > > I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. > > To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. +1 > We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. Most definitely in. We've had great success working with the TripleO team on integrating the Tempest role and on the OSA side, we'd be more than happy to help try and converge our roles to maintain them together. If there's any meetings or anything that will be scheduled, I'd be happy to attend. > [0] - https://review.opendev.org/662763 > [1] - https://opendev.org/openstack/tripleo-ansible > -- > > Kevin Carter > IRC: cloudnull -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. 
mnaser at vexxhost.com W. http://vexxhost.com From kecarter at redhat.com Mon Jun 3 17:42:19 2019 From: kecarter at redhat.com (Kevin Carter) Date: Mon, 3 Jun 2019 12:42:19 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser wrote: > On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: > > > > Hello Stackers, > > > > I wanted to follow up on this post from last year, pick up from where it > left off, and bring together a squad to get things moving. > > > > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > > > The effort to convert tripleo Puppet and heat templates with embedded > Ansible to a more consumable set of playbooks and roles is in full effect. > As we're working through this effort we believe co-locating all of the > Ansible tasks/roles/libraries/plugins throughout the code base into a > single purpose-built repository will assist us in streamlining and > simplifying. Structurally, at this time, most of tripleo will remain the > same. However, the inclusion of tripleo-Ansible will allow us to create > more focused solutions which are independently testable, much easier > understand, and simple to include into the current heat template deployment > methodologies. While a straight port of the existing Ansible tasks will not > be entirely possible, the goal of this ongoing effort will be zero impact > on our existing workflow and solutions. > > > > To reigniting this effort, I've put up a review to create a new > "transformation" squad[0] geared toward building the structure around > tripleo-ansible[1] and converting our current solutions into > roles/playbooks/libraries/plugins. Initially, we'll be focused on our > existing code base; however, long term, I believe it makes sense for this > squad to work across projects to breakdown deployment barriers for folks > using similar technologies. > > +1 > > > We're excited to get this effort rolling again and would love to work > with anyone and everyone throughout the community. If folks are interested > in this effort, please let us know. > > Most definitely in. We've had great success working with the TripleO team > on > integrating the Tempest role and on the OSA side, we'd be more than happy > to help try and converge our roles to maintain them together. > ++ > If there's any meetings or anything that will be scheduled, I'd be > happy to attend. > > its still very early but I expect to begin regular meetings (even if they're just impromptu IRC conversations to begin with) to work out what needs to be done and where we can begin collaborating with other folks. As soon as we have more I'll be sure to reach out here and on IRC. > > [0] - https://review.opendev.org/662763 > > [1] - https://opendev.org/openstack/tripleo-ansible > > -- > > > > Kevin Carter > > IRC: cloudnull > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ssbarnea at redhat.com Mon Jun 3 18:20:24 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Mon, 3 Jun 2019 19:20:24 +0100 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: <620D70C5-5EFD-42FF-A647-03164FA41A28@redhat.com> I am really happy to hear about that as this could be much more effective than having an uncontrollable number of roles scattered across lots of repositories which usually do not play very nice with each other. I hope that testing these roles using molecule (official ansible testing platform) is also part of this plan. Cheers Sorin > On 3 Jun 2019, at 18:42, Kevin Carter wrote: > > On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser > wrote: > On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter > wrote: > > > > Hello Stackers, > > > > I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. > > > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > > > The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. > > > > To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. > > +1 > > > We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. > > Most definitely in. We've had great success working with the TripleO team on > integrating the Tempest role and on the OSA side, we'd be more than happy > to help try and converge our roles to maintain them together. > > ++ > > If there's any meetings or anything that will be scheduled, I'd be > happy to attend. > > > its still very early but I expect to begin regular meetings (even if they're just impromptu IRC conversations to begin with) to work out what needs to be done and where we can begin collaborating with other folks. As soon as we have more I'll be sure to reach out here and on IRC. > > > [0] - https://review.opendev.org/662763 > > [1] - https://opendev.org/openstack/tripleo-ansible > > -- > > > > Kevin Carter > > IRC: cloudnull > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com -------------- next part -------------- An HTML attachment was scrubbed... 
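For what it's worth, a rough sketch of how molecule scenarios are commonly wired into tox so they can run in CI (the environment name, dependencies, and the --all invocation are assumptions for illustration, not an agreed convention for tripleo-ansible):

    [testenv:molecule]
    basepython = python3
    deps =
        ansible
        docker
        molecule
    commands =
        molecule test --all

Each role would then ship its scenarios under a molecule/<scenario>/ directory next to the role, and the gate only needs to invoke this tox environment (or an equivalent dedicated Zuul job).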
URL: From mark at stackhpc.com Mon Jun 3 18:21:30 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 3 Jun 2019 19:21:30 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, 3 Jun 2019, 12:57 Jim Rollenhagen, wrote: > I don't think I have enough coffee in me to fully digest this, but wanted > to > point out a couple of things. FWIW, this is something I've thought we > should do > for a while now. > > On Sat, Jun 1, 2019 at 8:43 AM Mohammed Naser wrote: > >> Hi everyone, >> >> This is something that I've discussed with a few people over time and >> I think I'd probably want to bring it up by now. I'd like to propose >> and ask if it makes sense to perhaps replace devstack entirely with >> openstack-ansible. I think I have quite a few compelling reasons to >> do this that I'd like to outline, as well as why I *feel* (and I could >> be biased here, so call me out!) that OSA is the best option in terms >> of a 'replacement' >> >> # Why not another deployment project? >> I actually thought about this part too and considered this mainly for >> ease of use for a *developer*. >> >> At this point, Puppet-OpenStack pretty much only deploys packages >> (which means that it has no build infrastructure, a developer can't >> just get $commit checked out and deployed). >> >> TripleO uses Kolla containers AFAIK and those have to be pre-built >> beforehand, also, I feel they are much harder to use as a developer >> because if you want to make quick edits and restart services, you have >> to enter a container and make the edit there and somehow restart the >> service without the container going back to it's original state. >> Kolla-Ansible and the other combinations also suffer from the same >> "issue". >> > > FWIW, kolla-ansible (and maybe tripleo?) has a "development" mode which > mounts > the code as a volume, so you can make edits and just run "docker restart > $service". Though systemd does make that a bit nicer due to globs (e.g. > systemctl restart nova-*). > > That said, I do agree moving to something where systemd is running the > services > would make for a smoother transition for developers. > > >> >> OpenStack Ansible is unique in the way that it pretty much just builds >> a virtualenv and installs packages inside of it. The services are >> deployed as systemd units. This is very much similar to the current >> state of devstack at the moment (minus the virtualenv part, afaik). >> It makes it pretty straight forward to go and edit code if you >> need/have to. We also have support for Debian, CentOS, Ubuntu and >> SUSE. This allows "devstack 2.0" to have far more coverage and make >> it much more easy to deploy on a wider variety of operating systems. >> It also has the ability to use commits checked out from Zuul so all >> the fancy Depends-On stuff we use works. >> >> # Why do we care about this, I like my bash scripts! >> As someone who's been around for a *really* long time in OpenStack, >> I've seen a whole lot of really weird issues surface from the usage of >> DevStack to do CI gating. For example, one of the recent things is >> the fact it relies on installing package-shipped noVNC, where as the >> 'master' noVNC has actually changed behavior a few months back and it >> is completely incompatible at this point (it's just a ticking thing >> until we realize we're entirely broken). >> >> To this day, I still see people who want to POC something up with >> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. 
No matter >> how many warnings we'll put up, they'll always try to do it. With >> this way, at least they'll have something that has the shape of an >> actual real deployment. In addition, it would be *good* in the >> overall scheme of things for a deployment system to test against, >> because this would make sure things don't break in both ways. >> > > ++ > > >> >> Also: we run Zuul for our CI which supports Ansible natively, this can >> remove one layer of indirection (Zuul to run Bash) and have Zuul run >> the playbooks directly from the executor. >> >> # So how could we do this? >> The OpenStack Ansible project is made of many roles that are all >> composable, therefore, you can think of it as a combination of both >> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >> the base modules (i.e. puppet-nova, etc) and TripleO was the >> integration of all of it in a distribution. OSA is currently both, >> but it also includes both Ansible roles and playbooks. >> >> In order to make sure we maintain as much of backwards compatibility >> as possible, we can simply run a small script which does a mapping of >> devstack => OSA variables to make sure that the service is shipped >> with all the necessary features as per local.conf. >> > > ++ > This strikes me as being a considerable undertaking, that would never get full compatibility due to the lack of a defined API. It might get close with a bit of effort. I expect there are scripts and plugins that don't have an analogue in OSA (ironic, I'm looking at you). > > >> >> So the new process could be: >> >> 1) parse local.conf and generate Ansible variables files >> 2) install Ansible (if not running in gate) >> 3) run playbooks using variable generated in #1 >> >> The neat thing is after all of this, devstack just becomes a thin >> wrapper around Ansible roles. I also think it brings a lot of hands >> together, involving both the QA team and OSA team together, which I >> believe that pooling our resources will greatly help in being able to >> get more done and avoiding duplicating our efforts. >> >> # Conclusion >> This is a start of a very open ended discussion, I'm sure there is a >> lot of details involved here in the implementation that will surface, >> but I think it could be a good step overall in simplifying our CI and >> adding more coverage for real potential deployers. It will help two >> teams unite together and have more resources for something (that >> essentially is somewhat of duplicated effort at the moment). >> >> I will try to pick up sometime to POC a simple service being deployed >> by an OSA role instead of Bash, placement which seems like a very >> simple one and share that eventually. >> >> Thoughts? :) >> > > The reason this hasn't been pushed on in the past is to avoid the > perception > that the TC or QA team is choosing a "winner" in the deployment space. I > don't > think that's a good reason not to do something like this (especially with > the > drop in contributors since I've had that discussion). However, we do need > to > message this carefully at a minimum. > > With my Kolla hat on, this does concern me. If you're trying out OpenStack and spend enough quality time with OSA to become familiar with it, you're going to be less inclined to do your homework on deployment tools. It would be nice if the deployment space wasn't so fragmented, but we all have our reasons. > >> -- >> Mohammed Naser — vexxhost >> ----------------------------------------------------- >> D. 514-316-8872 >> D. 
800-910-1726 ext. 200 >> E. mnaser at vexxhost.com >> W. http://vexxhost.com >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Mon Jun 3 18:28:02 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 3 Jun 2019 19:28:02 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, 3 Jun 2019, 15:59 Clark Boylan, wrote: > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > I'm not sure this is a great example case. We consume prebuilt software > for many of our dependencies. Everything from the kernel to the database to > rabbitmq to ovs (and so on) are consumed as prebuilt packages from our > distros. In many cases this is desirable to ensure that our software work > with the other software out there in the wild that people will be deploying > with. > > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. 
No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > I think if you have developers running a small wrapper locally to deploy > this new development stack you should run that same wrapper in CI. This > ensure the wrapper doesn't break. > > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > For me there are two major items to consider that haven't been brought up > yet. The first is devstack's (lack of) speed. Any replacement should be at > least as quick as the current tooling because the current tooling is slow > enough already. This is important. We would need to see benchmark comparisons between a devstack install and an OSA install. Shell may be slow but Ansible is generally slower. That's fine in production when reliability is king, but we need fast iteration for development. I haven't looked under the covers of devstack for some time, but it previously installed all python deps in one place, whereas OSA has virtualenvs for each service which could take a while to build. Perhaps this is configurable. The other is logging. 
I spend a lot of time helping people to debug CI job > runs and devstack has grown a fairly effective set of logging that just > about any time I have to help debug another deployment tool's CI jobs I > miss (because they tend to log only a tiny fraction of what devstack logs). > > Clark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Mon Jun 3 18:37:25 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 3 Jun 2019 13:37:25 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Message-ID: Hi Madhuri- > For this purpose, we would need to change a trait of the server’s > flavor in Nova. This trait is mapped to a deploy step in Ironic > which does some operation(change BIOS config and reboot in this use > case).____ If your trait was something that wasn't tracked in the flavor (or elsewhere in the instance's db record), you could just update it directly in placement. Then you'd have to figure out how to make ironic notice that and effect the change. (Or perhaps the other way around: tell ironic you want to make the change, and it updates the trait in placement as part of the process.) > In Nova, the only API to change trait in flavor is resize whereas > resize does migration and a reboot as well.____ > > In short, I am  looking for a Nova API that only changes the traits, > and trigger the ironic deploy steps but no reboot and migration. > Please suggest.____ It's inconvenient, but I'm afraid "resize" is the right way to get this done, because that's the only way to get the appropriate validation and changes effected in the general case. Now, there's a spec [1] we've been talking about for ~4.5 years that would let you do a resize without rebooting, when only a certain subset of properties are being changed. It is currently proposed for "upsizing" CPU, memory, and disk, and adding PCI devices, but clearly this ISS configuration would be a reasonable candidate to include. In fact, it's possible that leading the charge with something this unobtrusive would reduce some of the points of contention that have stalled the blueprint up to this point. Food for thought. Thanks, efried [1] https://review.opendev.org/#/c/141219/ From lshort at redhat.com Mon Jun 3 18:57:13 2019 From: lshort at redhat.com (Luke Short) Date: Mon, 3 Jun 2019 14:57:13 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: <620D70C5-5EFD-42FF-A647-03164FA41A28@redhat.com> References: <620D70C5-5EFD-42FF-A647-03164FA41A28@redhat.com> Message-ID: Hey Sorin, I'm glad to see you are excited as we are about this effort! Since you are one of the core developers of Molecule, I was hoping to get some of your insight on the work we have started in that regard. I have a few patches up for adding in Molecule tests to tripleo-common. At a later point in time, we can discuss the transition of moving all of the Ansible content into the tripleo-ansible repository. https://review.opendev.org/#/c/662803/ https://review.opendev.org/#/c/662577/ The first is to add the common files that will be used across most, if not all, of the Molecule tests in this repository. The second patch is where I actually implement Molecule tests and symlinks to those common files in the first patch. I wanted to get your thoughts on a few things. 1. How would we hook in the Molecule tests into tox so that it will be tested by CI? 
Do you have an example of this already being done? I believe from previous discussions you have already added a few Molecule tests to a TripleO repository before. Kevin also had a good ideal of creating an isolated Zuul job so that can be something we can investigate as well. 2. How should we handle the actual tests? In the second patch, I used the playbook.yaml to write the test in a playbook format (the actual test happens during the post_tasks phase). I have always done Molecule testing this way to keep things simple. However, would you recommend that we use the Python testinfra library instead to make sure that certain things exist? Thanks for any input you may have! Luke Short, RHCE Software Engineer, OpenStack Deployment Framework Red Hat, Inc. On Mon, Jun 3, 2019 at 2:29 PM Sorin Sbarnea wrote: > I am really happy to hear about that as this could be much more effective > than having an uncontrollable number of roles scattered across lots of > repositories which usually do not play very nice with each other. > > I hope that testing these roles using molecule (official ansible testing > platform) is also part of this plan. > > Cheers > Sorin > > On 3 Jun 2019, at 18:42, Kevin Carter wrote: > > On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser > wrote: > >> On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: >> > >> > Hello Stackers, >> > >> > I wanted to follow up on this post from last year, pick up from where >> it left off, and bring together a squad to get things moving. >> > >> > > >> http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html >> > >> > The effort to convert tripleo Puppet and heat templates with embedded >> Ansible to a more consumable set of playbooks and roles is in full effect. >> As we're working through this effort we believe co-locating all of the >> Ansible tasks/roles/libraries/plugins throughout the code base into a >> single purpose-built repository will assist us in streamlining and >> simplifying. Structurally, at this time, most of tripleo will remain the >> same. However, the inclusion of tripleo-Ansible will allow us to create >> more focused solutions which are independently testable, much easier >> understand, and simple to include into the current heat template deployment >> methodologies. While a straight port of the existing Ansible tasks will not >> be entirely possible, the goal of this ongoing effort will be zero impact >> on our existing workflow and solutions. >> > >> > To reigniting this effort, I've put up a review to create a new >> "transformation" squad[0] geared toward building the structure around >> tripleo-ansible[1] and converting our current solutions into >> roles/playbooks/libraries/plugins. Initially, we'll be focused on our >> existing code base; however, long term, I believe it makes sense for this >> squad to work across projects to breakdown deployment barriers for folks >> using similar technologies. >> >> +1 >> >> > We're excited to get this effort rolling again and would love to work >> with anyone and everyone throughout the community. If folks are interested >> in this effort, please let us know. >> >> Most definitely in. We've had great success working with the TripleO >> team on >> integrating the Tempest role and on the OSA side, we'd be more than happy >> to help try and converge our roles to maintain them together. >> > > ++ > > >> If there's any meetings or anything that will be scheduled, I'd be >> happy to attend. 
>> >> > its still very early but I expect to begin regular meetings (even if > they're just impromptu IRC conversations to begin with) to work out what > needs to be done and where we can begin collaborating with other folks. > As soon as we have more I'll be sure to reach out here and on IRC. > > >> > [0] - https://review.opendev.org/662763 >> > [1] - https://opendev.org/openstack/tripleo-ansible >> > -- >> > >> > Kevin Carter >> > IRC: cloudnull >> >> >> >> -- >> Mohammed Naser — vexxhost >> ----------------------------------------------------- >> D. 514-316-8872 >> D. 800-910-1726 ext. 200 >> E. mnaser at vexxhost.com >> W. http://vexxhost.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From MM9745 at att.com Mon Jun 3 19:32:02 2019 From: MM9745 at att.com (MCEUEN, MATT) Date: Mon, 3 Jun 2019 19:32:02 +0000 Subject: Elections for Airship In-Reply-To: References: <7C64A75C21BB8D43BD75BB18635E4D89709A2256@MOSTLS1MSGUSRFF.ITServices.sbc.com> <20190530223625.7ao2hmxlrrj3ny4b@yuggoth.org> Message-ID: <7C64A75C21BB8D43BD75BB18635E4D89709A8D00@MOSTLS1MSGUSRFF.ITServices.sbc.com> Thanks Kendall, Jeremy, Sean – this is very helpful! I think this gives us the tools we need to run a successful election; if we do have any questions I’ll let y’all know. Appreciate your help, Matt From: Kendall Nelson Sent: Monday, June 3, 2019 8:47 AM To: Jeremy Stanley Cc: OpenStack Discuss Subject: Re: Elections for Airship Might also be helpful to look at our document that outlines the process we go through[1]. If you have any questions, let us know! -Kendall (diablo_rojo) [1] https://opendev.org/openstack/election/src/branch/master/README.rst On Thu, May 30, 2019 at 3:37 PM Jeremy Stanley > wrote: On 2019-05-30 19:04:56 +0000 (+0000), MCEUEN, MATT wrote: > OpenStack Infra team, The OpenStack Infrastructure team hasn't been officially involved in running technical elections for OpenStack for several years now (subject tag removed accordingly). With the advent of Gerrit's REST API, contributor data can be queried and assembled anonymously by anyone. While I happen to be involved in these activities for longer than that's been the case, I'll be answering while wearing my OpenStack Technical Election Official hat throughout the remainder of this reply. > As the Airship project works to finalize our governance and > elected positions [1], we need to be ready to hold our first > elections. I wanted to reach out and ask for any experience, > guidance, materials, or tooling you can share that would help this > run correctly and smoothly? This is an area where the Airship team > doesn't have much experience so we may not know the right > questions to ask. > > Aside from a member of the Airship community creating a poll in > CIVS [2], is there anything else you would recommend? Is there any > additional tooling in place in the OpenStack world? Any potential > pitfalls, or other hard-won advice for us? [...] As Sean mentioned in his reply, the OpenStack community has been building and improving tooling in the openstack/election Git repository on OpenDev over the past few years. The important bits (in my opinion) center around querying Gerrit for a list of contributors whose changes have merged to sets of official project repositories within a qualifying date range. I've recently been assisting StarlingX's election officials with a similar request, and do have some recommendations. 
Probably the best place to start is adding an official structured dataset with your team/project information following the same schema used by OpenStack[0] and now StarlingX[1], then applying a couple of feature patches[2][3] (if they haven't merged by the time you read this) to the openstack/election master branch. After that, you ought to be able to run something along the lines of: tox -e venv -- owners --after 2018-05-30 --before 2019-05-31 --nonmember --outdir airship-electorate --projects ../../airship/governance/projects.yaml --ref master (Note that the --after and --before dates work like in Gerrit's query language and carry with them an implied midnight UTC, so one is the actual start date but the other is the day after the end date; "on or after" and "before but not on" is how I refer to them in prose.) You'll see the resulting airship-electorate directory includes a lot of individual files. There are two basic types: .yaml files which are structured data meant for human auditing as well as scripted analysis, and .txt files which are a strict list of one Gerrit preferred E-mail address per line for each voter (the format expected by the https://civs.cs.cornell.edu/ voting service). It's probably also obvious that there are sets of these named for each team in your governance, as well as a set which start with underscore (_). The former represent contributions to the deliverable repositories of each team, while the latter are produced from an aggregate of all deliverable repositories for all teams (this is what you might use for electing an Airship-wide governing body). There are a couple of extra underscore files... _duplicate_owners.yaml includes information on deduplicated entries for contributors where the script was able to detect more than one Gerrit account for the same individual, while the _invites.csv file isn't really election-related at all and is what the OSF normally feeds into the automation which sends event discounts to contributors. In case you're curious about the _invites.csv file, the first column is the OSF member ID (if known) or 0 (if no matching membership was found), the second column is the display name from Gerrit, the third column is the preferred E-mail address from Gerrit (this corresponds to the address used for the _electorate.txt file), and any subsequent columns are the extra non-preferred addresses configured in Gerrit for that account. Please don't hesitate to follow up with any additional questions you might have! [0] https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml [1] https://opendev.org/starlingx/governance/src/branch/master/reference/tsc/projects.yaml [2] https://review.opendev.org/661647 [3] https://review.opendev.org/661648 -- Jeremy Stanley -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Mon Jun 3 19:45:47 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 03 Jun 2019 12:45:47 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Sat, Jun 1, 2019, at 05:36, Mohammed Naser wrote: > Hi everyone, > > This is something that I've discussed with a few people over time and > I think I'd probably want to bring it up by now. I'd like to propose > and ask if it makes sense to perhaps replace devstack entirely with > openstack-ansible. I think I have quite a few compelling reasons to > do this that I'd like to outline, as well as why I *feel* (and I could > be biased here, so call me out!) 
that OSA is the best option in terms > of a 'replacement' You laid out three reasons below to switch, and to be frank, I don't find any of them compelling. This is tooling that hundreds of people and machines rely on and are familiar with, and to undertake a massive change like this deserves some *really* compelling, even *dire*, rationalization for it, and metrics showing it is better than the old thing. This thread reads as proposing change for the sake of change. Colleen > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. > > # So how could we do this? > The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. 
> > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. > > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > > From feilong at catalyst.net.nz Mon Jun 3 20:01:07 2019 From: feilong at catalyst.net.nz (feilong) Date: Tue, 4 Jun 2019 08:01:07 +1200 Subject: [magnum] Meeting at 2019-06-04 2100 UTC In-Reply-To: References: Message-ID: <738dca30-1719-7034-ead1-b4c906681184@catalyst.net.nz> Thanks bringing this topic. Yes, we can discuss it on next weekly meeting. I have added it in our agenda https://wiki.openstack.org/wiki/Meetings/Containers On 4/06/19 12:02 AM, Spyros Trigazis wrote: > Hello all, > > I would like to discuss moving the drivers out-of-tree, as > we briefly discussed it in the PTG. Can you all make it for the > next meeting [1]? > > This is not super urgent, but it will accelerate development and bug > fixes at the driver level. > > Cheers, > Spyros > > [0] https://etherpad.openstack.org/p/magnum-train-ptg > [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------ From mnaser at vexxhost.com Mon Jun 3 21:32:11 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 17:32:11 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 3:51 PM Colleen Murphy wrote: > > On Sat, Jun 1, 2019, at 05:36, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. 
I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > You laid out three reasons below to switch, and to be frank, I don't find any of them compelling. This is tooling that hundreds of people and machines rely on and are familiar with, and to undertake a massive change like this deserves some *really* compelling, even *dire*, rationalization for it, and metrics showing it is better than the old thing. This thread reads as proposing change for the sake of change. That's fair. My argument was that we have a QA team that is strapped for resources which is doing the same work as the OSA team as working on, so most probably deduplicating efforts can help us get more things done because work can split across more people now. I do totally get people might not want to do it. That's fine, it is after all a proposal and if the majority of the community feels like devstack is okay, and the amount of maintainers it has is fine, then I wouldn't want to change that either. > Colleen > > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. 
In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. http://vexxhost.com > > > > > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From egle.sigler at rackspace.com Mon Jun 3 21:34:03 2019 From: egle.sigler at rackspace.com (Egle Sigler) Date: Mon, 3 Jun 2019 21:34:03 +0000 Subject: [interop] [refstack] Interop WG Meeting this Wed. 11:00 AM CST/ 16:00 UTC Message-ID: Hello Everyone, We will be holding Interop WG meetings this Wednesday, at 1600 UTC in #openstack-meeting-3, everyone welcome. Etherpad for Wednesday’s meeting: https://etherpad.openstack.org/p/InteropWhistler.23 Please add items to the agenda. Web IRC link if you are not using IRC client: http://webchat.freenode.net/?channels=openstack-meeting-3 Meetbot quick reference guide: http://meetbot.debian.net/Manual.html#user-reference If you have general interop questions, please ask in #openstack-interopIRC channel. Thank you, Egle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dtroyer at gmail.com Tue Jun 4 03:36:24 2019 From: dtroyer at gmail.com (Dean Troyer) Date: Mon, 3 Jun 2019 22:36:24 -0500 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: [I have been trying to decide where to jump in here, this seems as good a place as any.] On Mon, Jun 3, 2019 at 2:48 PM Colleen Murphy wrote: > You laid out three reasons below to switch, and to be frank, I don't find any of them compelling. This is tooling that hundreds of people and machines rely on and are familiar with, and to undertake a massive change like this deserves some *really* compelling, even *dire*, rationalization for it, and metrics showing it is better than the old thing. This thread reads as proposing change for the sake of change. Colleen makes a great point here about the required scope of this proposal to actually be a replacement for DevStack... A few of us have wanted to replace DevStack with something better probably since a year after we introduced it in Boston (the first time). The primary problems with replacing it are both technical and business/political. There have been two serious attempts, the first was what became harlowja's Anvil project, which actually had different goals than DevStack, and the second was discussed at the first PTG in Atlanta as an OSA-based orchestrator that could replace parts incrementally and was going to (at least partially) leverage Zuul v3. That died with the rest of OSIC (RIP). The second proposal was very similar to mnaser's current one To actually _replace_ DevStack you have to meet a major fraction of its use cases, which are more than anyone imagined back in the day. Both prior attempts to replace it did not address all of the use cases and (I believe) that limited the number of people willing or able to get involved. Anything short of complete replacement fails to meet the 'deduplication of work' goal... (see https://xkcd.com/927/). IMHO the biggest problem here is finding anyone who is willing to fund this work. It is a huge project that will only count toward a sponsor company's stats in an area they usually do not pay much attention toward. I am not trying to throw cold water on this, I will gladly support from a moderate distance any effort to rid us of DevStack. I believe that knowing where things have been attempted in the past will either inform how to approach it differently now or identify what in our community has changed to make trying again worthwhile. Go for it! dt -- Dean Troyer dtroyer at gmail.com From amotoki at gmail.com Tue Jun 4 04:00:03 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 4 Jun 2019 13:00:03 +0900 Subject: [neutron] bug deputy report (week of May 27) Message-ID: Hi neutrinos, I was a neutron bug deputy last week (which covered from May 26 to Jun 2). Here is the bug deputy report from me. Last week was relatively quiet, but we got a couple of bugs on performance. Two of them seems related to RBAC mechanism and they are worth got attentions from the team. --- Get external networks too slowly because it would join subnet and rbac https://bugs.launchpad.net/neutron/+bug/1830630 Medium, New, loadimpact Security groups RBAC cause a major performance degradation https://bugs.launchpad.net/neutron/+bug/1830679 High, New, loadimpact Needs attentions amotoki is looking into it but more eyes would be appreciated. 
Improper close connection to database leading to mysql/mariadb block connection https://bugs.launchpad.net/neutron/+bug/1831009 Undecided, New This looks like a generic issue related to oslo.db, but it is better to get attentions in neutron side too as it happens in neutron. Debug neutron-tempest-plugin-dvr-multinode-scenario failures https://bugs.launchpad.net/neutron/+bug/1830763 High, Confirmed, assigned to mlavalle Best Regards Akihiro Motoki (irc: amotoki) From gael.therond at gmail.com Tue Jun 4 07:43:46 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 09:43:46 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia Message-ID: Hi guys, I’ve a weird situation here. I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. Cheers guys! -------------- next part -------------- An HTML attachment was scrubbed... URL: From madhuri.kumari at intel.com Tue Jun 4 07:47:25 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Tue, 4 Jun 2019 07:47:25 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC0142D@BGSMSX101.gar.corp.intel.com> Hi Mark, Replied inline. From: Mark Goddard [mailto:mark at stackhpc.com] Sent: Monday, June 3, 2019 2:16 PM To: Kumari, Madhuri Cc: openstack-discuss at lists.openstack.org Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning On Mon, 3 Jun 2019 at 06:57, Kumari, Madhuri > wrote: Hi Ironic, Nova Developers, I am currently working on implementing Intel Speed Select(ISS) feature[1] in Ironic and I have a use case where I want to change ISS configuration in BIOS after a node is provisioned. Such use case of changing the configuration post deployment is common and not specific to ISS. A real-life example for such a required post-deploy configuration change is the change of BIOS settings to disable hyper-threading in order to address a security vulnerability. Currently there is no way of changing any BIOS configuration after a node is provisioned in Ironic. 
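For context, a rough sketch of the deploy-time flow this would build on
(the trait name and BIOS setting below are only placeholders): a deploy
template whose name matches a required trait carries a BIOS deploy step,
and that step is applied while the node is being provisioned, e.g.

  openstack baremetal deploy template create CUSTOM_HYPERTHREADING_OFF \
    --steps '[{"interface": "bios", "step": "apply_configuration", "priority": 150, "args": {"settings": [{"name": "LogicalProc", "value": "Disabled"}]}}]'
  openstack baremetal node add trait <node-uuid> CUSTOM_HYPERTHREADING_OFF
  openstack flavor set --property trait:CUSTOM_HYPERTHREADING_OFF=required my-baremetal-flavor

What is missing is a way to trigger an equivalent step again on a node that
is already ACTIVE, which is what the rest of this mail is about.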
One solution for it is to allow manual deploy steps in Ironic[2](not implemented yet) which can be trigged by changing traits in Nova. For this purpose, we would need to change a trait of the server’s flavor in Nova. This trait is mapped to a deploy step in Ironic which does some operation(change BIOS config and reboot in this use case). In Nova, the only API to change trait in flavor is resize whereas resize does migration and a reboot as well. In short, I am looking for a Nova API that only changes the traits, and trigger the ironic deploy steps but no reboot and migration. Please suggest. Hi, it is possible to modify a flavor (openstack flavor set --property =). However, changes to a flavor are not reflected in instances that were previously created from that flavor. Internally, nova stores an 'embedded flavor' in the instance state. I'm not aware of any API that would allow modifying the embedded flavor, nor any process that would synchronise those changes to ironic. The resize API in Nova allows changing the flavor of an instance. It does migration and reboots. But the API is not implemented for IronicDriver. Though this doesn’t match our use case but seems to be the only available one that allows changing a flavor and ultimately a trait. Thanks in advance. Regards, Madhuri [1] https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/support-intel-speed-select.html [2] https://storyboard.openstack.org/#!/story/2005129 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ssbarnea at redhat.com Tue Jun 4 07:56:31 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Tue, 4 Jun 2019 08:56:31 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> I am in favour of ditching or at least refactoring devstack because during the last year I often found myself blocked from fixing some zuul/jobs issues because the buggy code was still required by legacy devstack jobs that nobody had time maintain or fix, so they were isolated and the default job configurations were forced to use dirty hack needed for keeping these working. One such example is that there is a task that does a "chmod -R 0777 -R" on the entire source tree, a total security threat. In order to make other jobs running correctly* I had to rely undoing the damage done by such chmod because I was not able to disable the historical hack. * ansible throws warning with unsafe file permissions * ssh refuses to load unsafe keys That is why I am in favor of dropping features that are slowing down the progress of others. I know that the reality is more complicated but I also think that sometimes less* is more. * deployment projects ;) > On 4 Jun 2019, at 04:36, Dean Troyer wrote: > > > > On Mon, 3 Jun 2019, 15:59 Clark Boylan, > wrote: > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? 
> > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. Everything from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. In many cases this is desirable to ensure that our software work with the other software out there in the wild that people will be deploying with. > > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > I think if you have developers running a small wrapper locally to deploy this new development stack you should run that same wrapper in CI. This ensure the wrapper doesn't break. > > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. 
puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. > > This is important. We would need to see benchmark comparisons between a devstack install and an OSA install. Shell may be slow but Ansible is generally slower. That's fine in production when reliability is king, but we need fast iteration for development. > > I haven't looked under the covers of devstack for some time, but it previously installed all python deps in one place, whereas OSA has virtualenvs for each service which could take a while to build. Perhaps this is configurable. > > The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). > > Clark -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcin.juszkiewicz at linaro.org Tue Jun 4 08:30:04 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Tue, 4 Jun 2019 10:30:04 +0200 Subject: [kolla] Python 3 status update Message-ID: <70478b1f-4de6-7e25-b115-01b74e2cec57@linaro.org> How we stand with Python 3 move in Kolla(-ansible) project? Quite good I would say but there are some issues still. # Kolla ## Debian/Ubuntu source Patch for Debian/Ubuntu source images [1] got 24th revision and depends on "make sure that there is /var/run/apache2 dir' patch [2]. CI jobs run fine except 'kolla-ansible-ubuntu-source-ceph' one where 'openstack image create' step fails in 'Run deploy.sh script' [3]. 
**Help needed to find out why it fails there as I am out of ideas.** On x86-64 I was able to deploy all-in-one setup using ubuntu/source images. Debian/source images require us to first do Ansible upgrade as 'kolla-toolbox' image contains 2.2 version which fails to run with Python 3.7 present in Debian 'buster'. We agreed to go for Ansible 2.7/2.8 version. On AArch64 we have issue with RabbitMQ container failing to run (restarts all over again). Possible fix on a way. 1. https://review.opendev.org/#/c/642375 2. https://review.opendev.org/#/c/661713 3. http://logs.openstack.org/75/642375/24/check/kolla-ansible-ubuntu-source-ceph/7650efd/ara-report/result/3f8beadd-8f66-472f-ab4e-12e1357851ac/ ## CentOS 7 binary RDO team decided to not provide binary Train packages for CentOS 7 [4]. This target needs to be replaced with CentOS 8 once it will be fully build and packages provided by RDO. 4. https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 ## CentOS 7 source This target will stay with Python 2.7 for now. Once CentOS 8 gets built we may move to it to get rid of py2. ## Debian binary Ignored for now. Would need to rebuild whole set of OpenStack packages from 'experimental' to 'buster'. ## Ubuntu binary Here we depend on UCA developers and will install whatever they use. # Kolla ansible Current version depends on Python 2. Typical "TypeError: cannot use a string pattern on a bytes-like object" issues need to be solved. From felix.huettner at mail.schwarz Tue Jun 4 08:38:36 2019 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Tue, 4 Jun 2019 08:38:36 +0000 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Gael, we had a similar issue in the past. You could check the octiava healthmanager log (should be on the same node where the worker is running). This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. But normally it should at least restart the LB with new Amphorae… Hope that helps Felix From: Gaël THEROND Sent: Tuesday, June 4, 2019 9:44 AM To: Openstack Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia Hi guys, I’ve a weird situation here. I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. 
I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. Cheers guys! Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at gmail.com Tue Jun 4 09:06:41 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 11:06:41 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Felix, « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ I well may have miss something in it, but I don't see something strange on from my point of view. Feel free to tell me if you spot something weird. Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : > Hi Gael, > > > > we had a similar issue in the past. > > You could check the octiava healthmanager log (should be on the same node > where the worker is running). > > This component monitors the status of the Amphorae and restarts them if > they don’t trigger a callback after a specific time. This might also happen > if there is some connection issue between the two components. > > > > But normally it should at least restart the LB with new Amphorae… > > > > Hope that helps > > > > Felix > > > > *From:* Gaël THEROND > *Sent:* Tuesday, June 4, 2019 9:44 AM > *To:* Openstack > *Subject:* [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly > deleted by octavia > > > > Hi guys, > > > > I’ve a weird situation here. > > > > I smoothly operate a large scale multi-region Octavia service using the > default amphora driver which imply the use of nova instances as > loadbalancers. > > > > Everything is running really well and our customers (K8s and traditional > users) are really happy with the solution so far. > > > > However, yesterday one of those customers using the loadbalancer in front > of their ElasticSearch cluster poked me because this loadbalancer suddenly > passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer > available but yet the anchor/member/pool and listeners settings were still > existing. > > > > So I investigated and found out that the loadbalancer amphoras have been > destroyed by the octavia user. > > > > The weird part is, both the master and the backup instance have been > destroyed at the same moment by the octavia service user. > > > > Is there specific circumstances where the octavia service could decide to > delete the instances but not the anchor/members/pool ? > > > > It’s worrying me a bit as there is no clear way to trace why does Octavia > did take this action. > > > > I digged within the nova and Octavia DB in order to correlate the action > but except than validating my investigation it doesn’t really help as there > are no clue of why the octavia service did trigger the deletion. 
> > > > If someone have any clue or tips to give me I’ll be more than happy to > discuss this situation. > > > > Cheers guys! > Hinweise zum Datenschutz finden Sie hier . > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Tue Jun 4 10:23:57 2019 From: openstack at fried.cc (Eric Fried) Date: Tue, 4 Jun 2019 05:23:57 -0500 Subject: [nova] Spec review sprint Tuesday June 04 In-Reply-To: References: <52df6449-5d49-ee77-5309-90f2cd90283c@fried.cc> Message-ID: Reminder: This is happening. On 5/30/19 4:53 PM, Eric Fried wrote: > Here's a slightly tighter dashboard, filtering out specs with -W. 23 > total as of right now. > > https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+NOT+label:Workflow-1 > > On 5/30/19 10:47 AM, Eric Fried wrote: >> Hi all. We would like to do a nova-specs review push next Tuesday, June 4th. >> >> If you own one or more specs, please try to polish them and address any >> outstanding downvotes before Tuesday; and on Tuesday, please try to be >> available in #openstack-nova (or paying close attention to gerrit) to >> discuss them if needed. >> >> If you are a nova reviewer, contributor, or stakeholder, please try to >> spend a good chunk of your upstream time on Tuesday reviewing open Train >> specs [1]. >> >> Thanks, >> efried >> >> [1] Approximately: >> https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.* >> > From james.slagle at gmail.com Tue Jun 4 10:54:14 2019 From: james.slagle at gmail.com (James Slagle) Date: Tue, 4 Jun 2019 06:54:14 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 1:51 PM Kevin Carter wrote: > > On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser wrote: >> >> On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: >> > >> > Hello Stackers, >> > >> > I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. >> > >> > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html >> > >> > The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. >> > >> > To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. 
>> >> +1 >> >> > We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. >> >> Most definitely in. We've had great success working with the TripleO team on >> integrating the Tempest role and on the OSA side, we'd be more than happy >> to help try and converge our roles to maintain them together. > > > ++ > >> >> If there's any meetings or anything that will be scheduled, I'd be >> happy to attend. >> > > its still very early but I expect to begin regular meetings (even if they're just impromptu IRC conversations to begin with) to work out what needs to be done and where we can begin collaborating with other folks. As soon as we have more I'll be sure to reach out here and on IRC. Organizing a squad and starting with IRC meetings sounds good to me, and I'll be participating in the work. Thanks for kicking off the conversation! -- -- James Slagle -- From smooney at redhat.com Tue Jun 4 10:59:40 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Jun 2019 11:59:40 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: > On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: > > > > Hi, > > > > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > > > > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > > > > > > > > Hi, > > > > > > > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in > > > > separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something > > > > similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > > > > > Not a dumb question at all. So, we do have this concept of 'roles' > > > which you _could_ kinda technically identify similar to plugins. > > > However, I think one of the things that would maybe come out of this > > > is the inability for projects to maintain their own plugins (because > > > now you can host neutron/devstack/plugins and you maintain that repo > > > yourself), under this structure, you would indeed have to make those > > > changes to the OpenStack Ansible Neutron role > > > > > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > > > > > However, I think from an OSA perspective, we would be more than happy > > > to add project maintainers for specific projects to their appropriate > > > roles. It would make sense that there is someone from the Neutron > > > team that could be a core on os_neutron from example. > > > > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in > > opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and > > install it together with everything else by simply adding one line (usually) in local.conf file. > > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or > > driver which isn’t official OpenStack project. > > You raise a really good concern. Indeed, we might have to change the workflow > from "write a plugin" to "write an Ansible role" to be able to test > your project with > DevStack at that page (or maintain both a "legacy" solution) with a new one. 
the real probalem with that is who is going to port all of the existing plugins. kolla-ansible has also tried to be a devstack replacement in the past via the introduction of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container. the problem is it still breaks peoles plugins and workflow. some devstack feature that osa would need to support in order to be a replacement for me are. 1 the ablity to install all openstack project form git if needed including gerrit reviews. abiltiy to eailly specify gerrit reiews or commits for each project # here i am declaring the os-vif should be installed from git not pypi LIBS_FROM_GIT=os-vif # and here i am specifying that gerrit should be used as the source and # i am provide a gerrit/git refs branch for a specific un merged patch OS_VIF_REPO=https://git.openstack.org/openstack/os-vif OS_VIF_BRANCH=refs/changes/25/629025/9 # *_REPO can obvioulsy take anythign that is valid in a git clone command so # i can use a local repo too NEUTRON_REPO=file:///opt/repos/neutron # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. NEUTRON_BRANCH=bug/1788009 the next thing that would be needed is a way to simply override any config value like this [[post-config|/etc/nova/nova.conf]] #[compute] #live_migration_wait_for_vif_plug=True [libvirt] live_migration_uri = qemu+ssh://root@%s/system #cpu_mode = host-passthrough virt_type = kvm cpu_mode = custom cpu_model = kvm64 im sure that osa can do that but i really can just provide any path to any file if needed. so no need to update a role or plugin to set values in files created by plugins which is the next thing. we enable plugins with a single line like this enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices and devstack will clone and execute the plugins based on the single line above. plugins however can also read any varable defiend in the local.conf as it will be set in the environment which means i can easily share an exact configuration with someone by shareing a local.conf. im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack for all our testing in the gate it is actually has become one of the best openstack installer out there. we do not recommend people run it in production but with the ansible automation of grenade and the move to systemd for services there are less mainatined installers out there that devstack is proably a better foundation for a cloud to build on. people should still not use it in production but i can see why some might. > > > > > > > > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which > > > > uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I > > > > don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in > > > > mind. > > > > > > Indeed, with our current CI infrastructure with OSA, we have the > > > ability to create these dynamic scenarios (which can actually be > > > defined by a simple Zuul variable). 
> > > > > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > > > > > We do some really neat introspection of the project name being tested > > > in order to run specific scenarios. Therefore, that is something that > > > should be quite easy to accomplish simply by overriding a scenario > > > name within Zuul. It also is worth mentioning we now support full > > > metal deploys for a while now, so not having to worry about containers > > > is something to keep in mind as well (with simplifying the developer > > > experience again). > > > > > > > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > > > > > > > > > > Hi everyone, > > > > > > > > > > This is something that I've discussed with a few people over time and > > > > > I think I'd probably want to bring it up by now. I'd like to propose > > > > > and ask if it makes sense to perhaps replace devstack entirely with > > > > > openstack-ansible. I think I have quite a few compelling reasons to > > > > > do this that I'd like to outline, as well as why I *feel* (and I could > > > > > be biased here, so call me out!) that OSA is the best option in terms > > > > > of a 'replacement' > > > > > > > > > > # Why not another deployment project? > > > > > I actually thought about this part too and considered this mainly for > > > > > ease of use for a *developer*. > > > > > > > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > > > > (which means that it has no build infrastructure, a developer can't > > > > > just get $commit checked out and deployed). > > > > > > > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > > > > beforehand, also, I feel they are much harder to use as a developer > > > > > because if you want to make quick edits and restart services, you have > > > > > to enter a container and make the edit there and somehow restart the > > > > > service without the container going back to it's original state. > > > > > Kolla-Ansible and the other combinations also suffer from the same > > > > > "issue". > > > > > > > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > > > > a virtualenv and installs packages inside of it. The services are > > > > > deployed as systemd units. This is very much similar to the current > > > > > state of devstack at the moment (minus the virtualenv part, afaik). > > > > > It makes it pretty straight forward to go and edit code if you > > > > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > > > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > > > > it much more easy to deploy on a wider variety of operating systems. > > > > > It also has the ability to use commits checked out from Zuul so all > > > > > the fancy Depends-On stuff we use works. > > > > > > > > > > # Why do we care about this, I like my bash scripts! > > > > > As someone who's been around for a *really* long time in OpenStack, > > > > > I've seen a whole lot of really weird issues surface from the usage of > > > > > DevStack to do CI gating. For example, one of the recent things is > > > > > the fact it relies on installing package-shipped noVNC, where as the > > > > > 'master' noVNC has actually changed behavior a few months back and it > > > > > is completely incompatible at this point (it's just a ticking thing > > > > > until we realize we're entirely broken). 
> > > > > > > > > > To this day, I still see people who want to POC something up with > > > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > > > > how many warnings we'll put up, they'll always try to do it. With > > > > > this way, at least they'll have something that has the shape of an > > > > > actual real deployment. In addition, it would be *good* in the > > > > > overall scheme of things for a deployment system to test against, > > > > > because this would make sure things don't break in both ways. > > > > > > > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > > > > the playbooks directly from the executor. > > > > > > > > > > # So how could we do this? > > > > > The OpenStack Ansible project is made of many roles that are all > > > > > composable, therefore, you can think of it as a combination of both > > > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > > > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > > > > integration of all of it in a distribution. OSA is currently both, > > > > > but it also includes both Ansible roles and playbooks. > > > > > > > > > > In order to make sure we maintain as much of backwards compatibility > > > > > as possible, we can simply run a small script which does a mapping of > > > > > devstack => OSA variables to make sure that the service is shipped > > > > > with all the necessary features as per local.conf. > > > > > > > > > > So the new process could be: > > > > > > > > > > 1) parse local.conf and generate Ansible variables files > > > > > 2) install Ansible (if not running in gate) > > > > > 3) run playbooks using variable generated in #1 > > > > > > > > > > The neat thing is after all of this, devstack just becomes a thin > > > > > wrapper around Ansible roles. I also think it brings a lot of hands > > > > > together, involving both the QA team and OSA team together, which I > > > > > believe that pooling our resources will greatly help in being able to > > > > > get more done and avoiding duplicating our efforts. > > > > > > > > > > # Conclusion > > > > > This is a start of a very open ended discussion, I'm sure there is a > > > > > lot of details involved here in the implementation that will surface, > > > > > but I think it could be a good step overall in simplifying our CI and > > > > > adding more coverage for real potential deployers. It will help two > > > > > teams unite together and have more resources for something (that > > > > > essentially is somewhat of duplicated effort at the moment). > > > > > > > > > > I will try to pick up sometime to POC a simple service being deployed > > > > > by an OSA role instead of Bash, placement which seems like a very > > > > > simple one and share that eventually. > > > > > > > > > > Thoughts? :) > > > > > > > > > > -- > > > > > Mohammed Naser — vexxhost > > > > > ----------------------------------------------------- > > > > > D. 514-316-8872 > > > > > D. 800-910-1726 ext. 200 > > > > > E. mnaser at vexxhost.com > > > > > W. http://vexxhost.com > > > > > > > > > > > > > — > > > > Slawek Kaplonski > > > > Senior software engineer > > > > Red Hat > > > > > > > > > > > > > -- > > > Mohammed Naser — vexxhost > > > ----------------------------------------------------- > > > D. 514-316-8872 > > > D. 800-910-1726 ext. 200 > > > E. mnaser at vexxhost.com > > > W. 
http://vexxhost.com > > > > — > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > From smooney at redhat.com Tue Jun 4 11:13:31 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Jun 2019 12:13:31 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> Message-ID: <85d1e5ef4070ad3b24a910adc12cf61308e85088.camel@redhat.com> On Tue, 2019-06-04 at 08:56 +0100, Sorin Sbarnea wrote: > I am in favour of ditching or at least refactoring devstack because during the last year I often found myself blocked > from fixing some zuul/jobs issues because the buggy code was still required by legacy devstack jobs that nobody had > time maintain or fix, so they were isolated and the default job configurations were forced to use dirty hack needed > for keeping these working. this sound like the issue is more realted to the fact that it is still useing a legacy job. why not move it over to the ansible native devstack jobs. > > One such example is that there is a task that does a "chmod -R 0777 -R" on the entire source tree, a total security > threat. in a ci env it is not. and in a development env if it was in devstack gate or in the ansible jobs it is not. i would not want this in a production system but it feels a little contived. > > In order to make other jobs running correctly* I had to rely undoing the damage done by such chmod because I was not > able to disable the historical hack. > > * ansible throws warning with unsafe file permissions > * ssh refuses to load unsafe keys > > That is why I am in favor of dropping features that are slowing down the progress of others. that is a self contracdicting statement. if i depend on a feature then droping it slows donw my progress. e.g. if you state that as a goal you will find you will almost always fail as to speed someone up you slow someone else down. what you want to aim for is a better solution that supports both usecase in a clean and defiend way. > > I know that the reality is more complicated but I also think that sometimes less* is more. > > > * deployment projects ;) > > > On 4 Jun 2019, at 04:36, Dean Troyer wrote: > > > > > > > > On Mon, 3 Jun 2019, 15:59 Clark Boylan, > wrote: > > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > > Hi everyone, > > > > > > This is something that I've discussed with a few people over time and > > > I think I'd probably want to bring it up by now. I'd like to propose > > > and ask if it makes sense to perhaps replace devstack entirely with > > > openstack-ansible. I think I have quite a few compelling reasons to > > > do this that I'd like to outline, as well as why I *feel* (and I could > > > be biased here, so call me out!) that OSA is the best option in terms > > > of a 'replacement' > > > > > > # Why not another deployment project? > > > I actually thought about this part too and considered this mainly for > > > ease of use for a *developer*. > > > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > > (which means that it has no build infrastructure, a developer can't > > > just get $commit checked out and deployed). 
> > > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > > beforehand, also, I feel they are much harder to use as a developer > > > because if you want to make quick edits and restart services, you have > > > to enter a container and make the edit there and somehow restart the > > > service without the container going back to it's original state. > > > Kolla-Ansible and the other combinations also suffer from the same > > > "issue". > > > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > > a virtualenv and installs packages inside of it. The services are > > > deployed as systemd units. This is very much similar to the current > > > state of devstack at the moment (minus the virtualenv part, afaik). > > > It makes it pretty straight forward to go and edit code if you > > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > > it much more easy to deploy on a wider variety of operating systems. > > > It also has the ability to use commits checked out from Zuul so all > > > the fancy Depends-On stuff we use works. > > > > > > # Why do we care about this, I like my bash scripts! > > > As someone who's been around for a *really* long time in OpenStack, > > > I've seen a whole lot of really weird issues surface from the usage of > > > DevStack to do CI gating. For example, one of the recent things is > > > the fact it relies on installing package-shipped noVNC, where as the > > > 'master' noVNC has actually changed behavior a few months back and it > > > is completely incompatible at this point (it's just a ticking thing > > > until we realize we're entirely broken). > > > > I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. Everything > > from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. > > In many cases this is desirable to ensure that our software work with the other software out there in the wild that > > people will be deploying with. > > > > > > > > To this day, I still see people who want to POC something up with > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > > how many warnings we'll put up, they'll always try to do it. With > > > this way, at least they'll have something that has the shape of an > > > actual real deployment. In addition, it would be *good* in the > > > overall scheme of things for a deployment system to test against, > > > because this would make sure things don't break in both ways. > > > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > > the playbooks directly from the executor. > > > > I think if you have developers running a small wrapper locally to deploy this new development stack you should run > > that same wrapper in CI. This ensure the wrapper doesn't break. > > > > > > > > # So how could we do this? > > > The OpenStack Ansible project is made of many roles that are all > > > composable, therefore, you can think of it as a combination of both > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > > integration of all of it in a distribution. OSA is currently both, > > > but it also includes both Ansible roles and playbooks. 
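For reference, the "combination of roles and playbooks" described above is what the OSA all-in-one (AIO) flow already drives today, and it is also the closest thing to "what a developer would run locally" in this proposal. A developer build looked roughly like the following at the time of writing; the script and playbook names follow the OSA quickstart documentation and may differ per branch:

git clone https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible
cd /opt/openstack-ansible
./scripts/bootstrap-ansible.sh   # installs ansible and the role requirements
./scripts/bootstrap-aio.sh       # prepares the host for an all-in-one build
cd playbooks
openstack-ansible setup-hosts.yml setup-infrastructure.yml setup-openstack.yml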
> > > > > > In order to make sure we maintain as much of backwards compatibility > > > as possible, we can simply run a small script which does a mapping of > > > devstack => OSA variables to make sure that the service is shipped > > > with all the necessary features as per local.conf. > > > > > > So the new process could be: > > > > > > 1) parse local.conf and generate Ansible variables files > > > 2) install Ansible (if not running in gate) > > > 3) run playbooks using variable generated in #1 > > > > > > The neat thing is after all of this, devstack just becomes a thin > > > wrapper around Ansible roles. I also think it brings a lot of hands > > > together, involving both the QA team and OSA team together, which I > > > believe that pooling our resources will greatly help in being able to > > > get more done and avoiding duplicating our efforts. > > > > > > # Conclusion > > > This is a start of a very open ended discussion, I'm sure there is a > > > lot of details involved here in the implementation that will surface, > > > but I think it could be a good step overall in simplifying our CI and > > > adding more coverage for real potential deployers. It will help two > > > teams unite together and have more resources for something (that > > > essentially is somewhat of duplicated effort at the moment). > > > > > > I will try to pick up sometime to POC a simple service being deployed > > > by an OSA role instead of Bash, placement which seems like a very > > > simple one and share that eventually. > > > > > > Thoughts? :) > > > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) > > speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough > > already. > > > > This is important. We would need to see benchmark comparisons between a devstack install and an OSA install. Shell > > may be slow but Ansible is generally slower. That's fine in production when reliability is king, but we need fast > > iteration for development. > > > > I haven't looked under the covers of devstack for some time, but it previously installed all python deps in one > > place, whereas OSA has virtualenvs for each service which could take a while to build. Perhaps this is configurable. > > > > The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly > > effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss > > (because they tend to log only a tiny fraction of what devstack logs). > > > > Clark > > From anlin.kong at gmail.com Tue Jun 4 11:38:27 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Tue, 4 Jun 2019 23:38:27 +1200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Gaël, We also met with the issue before which happened during the failover process, but I'm not sure your situation is the same with us. I just paste my previous investigation here, hope that will help. "With the Octavia version we have deployed in the production, the amphora record in the `amphora_health` table is deleted at the beginning of the failover process in order to disable the amphora health monitoring, while the amphora record in `amphora` table is marked as DELETED. 
On the other hand, the octavia-housekeeper service will delete the amphora record in `amphora` table if it doesn’t find its related record in `amphora_health` table which is always true during the current failover process. As a result, if the failover process fails, there will be no amphora records relating to the load balancer in the database." This patch is here https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701, unfortunately, it has not been backported to Rocky. Best regards, Lingxian Kong Catalyst Cloud On Tue, Jun 4, 2019 at 9:13 PM Gaël THEROND wrote: > Hi Felix, > > « Glad » you had the same issue before, and yes of course I looked at the > HM logs which is were I actually found out that this event was triggered > by octavia (Beside the DB data that validated that) here is my log trace > related to this event, It doesn't really shows major issue IMHO. > > Here is the stacktrace that our octavia service archived for our both > controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > > http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > > I well may have miss something in it, but I don't see something strange on > from my point of view. > Feel free to tell me if you spot something weird. > > > Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >> Hi Gael, >> >> >> >> we had a similar issue in the past. >> >> You could check the octiava healthmanager log (should be on the same node >> where the worker is running). >> >> This component monitors the status of the Amphorae and restarts them if >> they don’t trigger a callback after a specific time. This might also happen >> if there is some connection issue between the two components. >> >> >> >> But normally it should at least restart the LB with new Amphorae… >> >> >> >> Hope that helps >> >> >> >> Felix >> >> >> >> *From:* Gaël THEROND >> *Sent:* Tuesday, June 4, 2019 9:44 AM >> *To:* Openstack >> *Subject:* [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly >> deleted by octavia >> >> >> >> Hi guys, >> >> >> >> I’ve a weird situation here. >> >> >> >> I smoothly operate a large scale multi-region Octavia service using the >> default amphora driver which imply the use of nova instances as >> loadbalancers. >> >> >> >> Everything is running really well and our customers (K8s and traditional >> users) are really happy with the solution so far. >> >> >> >> However, yesterday one of those customers using the loadbalancer in front >> of their ElasticSearch cluster poked me because this loadbalancer suddenly >> passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer >> available but yet the anchor/member/pool and listeners settings were still >> existing. >> >> >> >> So I investigated and found out that the loadbalancer amphoras have been >> destroyed by the octavia user. >> >> >> >> The weird part is, both the master and the backup instance have been >> destroyed at the same moment by the octavia service user. >> >> >> >> Is there specific circumstances where the octavia service could decide to >> delete the instances but not the anchor/members/pool ? >> >> >> >> It’s worrying me a bit as there is no clear way to trace why does Octavia >> did take this action. >> >> >> >> I digged within the nova and Octavia DB in order to correlate the action >> but except than validating my investigation it doesn’t really help as there >> are no clue of why the octavia service did trigger the deletion. 
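One hedged way to spot the window Lingxian describes above (an amphora row still present while its amphora_health row is already gone, so the housekeeper is about to remove it) is to query the two tables directly. The join column, status value and database name here are assumptions based on that description, so adjust them to your schema and release:

mysql octavia -e "
  SELECT a.id, a.load_balancer_id, a.status
  FROM amphora a
  LEFT JOIN amphora_health h ON h.amphora_id = a.id
  WHERE h.amphora_id IS NULL
    AND a.status != 'DELETED';"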
>> >> >> >> If someone have any clue or tips to give me I’ll be more than happy to >> discuss this situation. >> >> >> >> Cheers guys! >> Hinweise zum Datenschutz finden Sie hier >> . >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Jun 4 11:43:38 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Tue, 4 Jun 2019 13:43:38 +0200 Subject: [neutron] QoS meeting 4.06.2019 cancelled Message-ID: <0F4ED17F-D081-4ED5-9AEE-4DC6B7EBD3E8@redhat.com> Hi, Both me and Rodolfo can’t chair today’s QoS meeting so lets cancel it. See You on next meeting in 2 weeks. — Slawek Kaplonski Senior software engineer Red Hat From edmondsw at us.ibm.com Tue Jun 4 12:00:35 2019 From: edmondsw at us.ibm.com (William M Edmonds - edmondsw@us.ibm.com) Date: Tue, 4 Jun 2019 12:00:35 +0000 Subject: [openstack-ansible][powervm] dropping support In-Reply-To: References: Message-ID: <2C11DAD5-1ED6-409B-9374-0CB86059E5E2@us.ibm.com> On 5/31/19, 6:46 PM, "Mohammed Naser" wrote: > > Hi everyone, > > I've pushed up a patch to propose dropping support for PowerVM support > inside OpenStack Ansible. There has been no work done on this for a > few years now, the configured compute driver is the incorrect one for > ~2 years now which indicates that no one has been able to use it for > that long. > > It would be nice to have this driver however given the infrastructure > we have upstream, there would be no way for us to effectively test it > and bring it back to functional state. I'm proposing that we remove > the code here: > https://review.opendev.org/#/c/662587 powervm: drop support > > If you're using this code and would like to contribute to fixing it > and (somehow) adding coverage, please reach out, otherwise, we'll drop > this code to clean things up. Sadly, I don't know of anyone using it or willing to maintain it at this time. From doug at doughellmann.com Tue Jun 4 12:39:44 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 08:39:44 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: Sean Mooney writes: > On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: >> On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: >> > >> > Hi, >> > >> > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: >> > > >> > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: >> > > > >> > > > Hi, >> > > > >> > > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in >> > > > separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something >> > > > similar possible with OSA or will it be needed to contribute always every change to OSA repository? >> > > >> > > Not a dumb question at all. So, we do have this concept of 'roles' >> > > which you _could_ kinda technically identify similar to plugins. 
>> > > However, I think one of the things that would maybe come out of this >> > > is the inability for projects to maintain their own plugins (because >> > > now you can host neutron/devstack/plugins and you maintain that repo >> > > yourself), under this structure, you would indeed have to make those >> > > changes to the OpenStack Ansible Neutron role >> > > >> > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron >> > > >> > > However, I think from an OSA perspective, we would be more than happy >> > > to add project maintainers for specific projects to their appropriate >> > > roles. It would make sense that there is someone from the Neutron >> > > team that could be a core on os_neutron from example. >> > >> > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in >> > opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and >> > install it together with everything else by simply adding one line (usually) in local.conf file. >> > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or >> > driver which isn’t official OpenStack project. >> >> You raise a really good concern. Indeed, we might have to change the workflow >> from "write a plugin" to "write an Ansible role" to be able to test >> your project with >> DevStack at that page (or maintain both a "legacy" solution) with a new one. > the real probalem with that is who is going to port all of the > existing plugins. Do all projects and all jobs have to be converted at once? Or ever? How much complexity do those plugins actually contain? Would they be fairly straightforward to convert? Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and then run devstack with just the plugin(s) needed for a given job? Is there enough appeal in the idea of replacing devstack with something closer to what is used for production deployments to drive us to find an iterative approach that doesn't require changing everything at one time? Or are we stuck with devstack forever? > kolla-ansible has also tried to be a devstack replacement in the past via the introduction > of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container. > the problem is it still breaks peoles plugins and workflow. > > > some devstack feature that osa would need to support in order to be a > replacement for me are. You've made a good start on a requirements list for a devstack replacement. Perhaps a first step would be for some of the folks who support this idea to compile a more complete list of those requirements, and then we could analyze OSA to see how it might need to be changed or whether it makes sense to use OSA as the basis for a new toolset that takes on some of the "dev" features we might not want in a "production" deployment tool. Here's another potential gap for whoever is going to make that list: devstack pre-populates the environment with some data for things like flavors and images. I don't imagine OSA does that or, if it does, that they are an exact match. How do we change those settings? That leads to a good second step: Do the rest of the analysis to understand what it would take to set up a base job like we have for devstack, that produces a similar setup. Not necessarily identical, but similar enough to be able to run tempest. It seems likely that already exists in some form for testing OSA itself. 
Could a developer run that on a local system (clearly being able to build the test environment locally is a requirement for replacing devstack)? After that, I would want to see answers to some of the questions about dealing with plugins that I posed above. And only then, I think, could I provide an answer to the question of whether we should make the change or not. > 1 the ablity to install all openstack project form git if needed including gerrit reviews. > > abiltiy to eailly specify gerrit reiews or commits for each project > > # here i am declaring the os-vif should be installed from git not pypi > LIBS_FROM_GIT=os-vif > > # and here i am specifying that gerrit should be used as the source and > # i am provide a gerrit/git refs branch for a specific un merged patch > OS_VIF_REPO=https://git.openstack.org/openstack/os-vif > OS_VIF_BRANCH=refs/changes/25/629025/9 > > # *_REPO can obvioulsy take anythign that is valid in a git clone command so > # i can use a local repo too > NEUTRON_REPO=file:///opt/repos/neutron > # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. > NEUTRON_BRANCH=bug/1788009 > > > the next thing that would be needed is a way to simply override any config value like this > > [[post-config|/etc/nova/nova.conf]] > #[compute] > #live_migration_wait_for_vif_plug=True > [libvirt] > live_migration_uri = qemu+ssh://root@%s/system > #cpu_mode = host-passthrough > virt_type = kvm > cpu_mode = custom > cpu_model = kvm64 > > im sure that osa can do that but i really can just provide any path to any file if needed. > so no need to update a role or plugin to set values in files created > by plugins which is the next thing. Does OSA need to support *every* configuration value? Or could it deploy a stack, and then rely on a separate tool to modify config values and restart a service? Clearly some values need to be there when the cloud first starts, but do they all? > we enable plugins with a single line like this > > enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master > > meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices > and devstack will clone and execute the plugins based on the single > line above. plugins however can also This makes me think it might be most appropriate to be considering a tool that replaces devstack by wrapping OSA, rather than *being* OSA. Maybe that's just an extra playbook that runs before OSA, or maybe it's a simpler bash script that does some setup before invoking OSA. > read any varable defiend in the local.conf as it will be set in the environment which means i can easily share > an exact configuration with someone by shareing a local.conf. > > > im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack > for all our testing in the gate it is actually has become one of the best openstack installer out there. we do > not recommend people run it in production but with the ansible automation of grenade and the move to systemd for > services there are less mainatined installers out there that devstack is proably a better foundation for a cloud > to build on. people should still not use it in production but i can see why some might. > >> >> > > >> > > > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which >> > > > uses only some parts of devstack. 
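As a sketch of the "deploy a stack, then rely on a separate tool to modify config values and restart a service" option raised above: a generic ini editor plus systemd covers the day-to-day case, so only the values needed at first boot would have to be known to the deployment tool. crudini is just one possible tool, and the unit name depends on the installer (devstack uses devstack@n-cpu for nova-compute; OSA would use its own unit name):

sudo crudini --set /etc/nova/nova.conf libvirt cpu_mode custom
sudo crudini --set /etc/nova/nova.conf libvirt cpu_model kvm64
sudo systemctl restart devstack@n-cpu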
That kind of jobs will probably have to be rewritten after such change. I >> > > > don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in >> > > > mind. >> > > >> > > Indeed, with our current CI infrastructure with OSA, we have the >> > > ability to create these dynamic scenarios (which can actually be >> > > defined by a simple Zuul variable). >> > > >> > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 >> > > >> > > We do some really neat introspection of the project name being tested >> > > in order to run specific scenarios. Therefore, that is something that >> > > should be quite easy to accomplish simply by overriding a scenario >> > > name within Zuul. It also is worth mentioning we now support full >> > > metal deploys for a while now, so not having to worry about containers >> > > is something to keep in mind as well (with simplifying the developer >> > > experience again). >> > > >> > > > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: >> > > > > >> > > > > Hi everyone, >> > > > > >> > > > > This is something that I've discussed with a few people over time and >> > > > > I think I'd probably want to bring it up by now. I'd like to propose >> > > > > and ask if it makes sense to perhaps replace devstack entirely with >> > > > > openstack-ansible. I think I have quite a few compelling reasons to >> > > > > do this that I'd like to outline, as well as why I *feel* (and I could >> > > > > be biased here, so call me out!) that OSA is the best option in terms >> > > > > of a 'replacement' >> > > > > >> > > > > # Why not another deployment project? >> > > > > I actually thought about this part too and considered this mainly for >> > > > > ease of use for a *developer*. >> > > > > >> > > > > At this point, Puppet-OpenStack pretty much only deploys packages >> > > > > (which means that it has no build infrastructure, a developer can't >> > > > > just get $commit checked out and deployed). >> > > > > >> > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built >> > > > > beforehand, also, I feel they are much harder to use as a developer >> > > > > because if you want to make quick edits and restart services, you have >> > > > > to enter a container and make the edit there and somehow restart the >> > > > > service without the container going back to it's original state. >> > > > > Kolla-Ansible and the other combinations also suffer from the same >> > > > > "issue". >> > > > > >> > > > > OpenStack Ansible is unique in the way that it pretty much just builds >> > > > > a virtualenv and installs packages inside of it. The services are >> > > > > deployed as systemd units. This is very much similar to the current >> > > > > state of devstack at the moment (minus the virtualenv part, afaik). >> > > > > It makes it pretty straight forward to go and edit code if you >> > > > > need/have to. We also have support for Debian, CentOS, Ubuntu and >> > > > > SUSE. This allows "devstack 2.0" to have far more coverage and make >> > > > > it much more easy to deploy on a wider variety of operating systems. >> > > > > It also has the ability to use commits checked out from Zuul so all >> > > > > the fancy Depends-On stuff we use works. >> > > > > >> > > > > # Why do we care about this, I like my bash scripts! 
>> > > > > As someone who's been around for a *really* long time in OpenStack, >> > > > > I've seen a whole lot of really weird issues surface from the usage of >> > > > > DevStack to do CI gating. For example, one of the recent things is >> > > > > the fact it relies on installing package-shipped noVNC, where as the >> > > > > 'master' noVNC has actually changed behavior a few months back and it >> > > > > is completely incompatible at this point (it's just a ticking thing >> > > > > until we realize we're entirely broken). >> > > > > >> > > > > To this day, I still see people who want to POC something up with >> > > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter >> > > > > how many warnings we'll put up, they'll always try to do it. With >> > > > > this way, at least they'll have something that has the shape of an >> > > > > actual real deployment. In addition, it would be *good* in the >> > > > > overall scheme of things for a deployment system to test against, >> > > > > because this would make sure things don't break in both ways. >> > > > > >> > > > > Also: we run Zuul for our CI which supports Ansible natively, this can >> > > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run >> > > > > the playbooks directly from the executor. >> > > > > >> > > > > # So how could we do this? >> > > > > The OpenStack Ansible project is made of many roles that are all >> > > > > composable, therefore, you can think of it as a combination of both >> > > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >> > > > > the base modules (i.e. puppet-nova, etc) and TripleO was the >> > > > > integration of all of it in a distribution. OSA is currently both, >> > > > > but it also includes both Ansible roles and playbooks. >> > > > > >> > > > > In order to make sure we maintain as much of backwards compatibility >> > > > > as possible, we can simply run a small script which does a mapping of >> > > > > devstack => OSA variables to make sure that the service is shipped >> > > > > with all the necessary features as per local.conf. >> > > > > >> > > > > So the new process could be: >> > > > > >> > > > > 1) parse local.conf and generate Ansible variables files >> > > > > 2) install Ansible (if not running in gate) >> > > > > 3) run playbooks using variable generated in #1 >> > > > > >> > > > > The neat thing is after all of this, devstack just becomes a thin >> > > > > wrapper around Ansible roles. I also think it brings a lot of hands >> > > > > together, involving both the QA team and OSA team together, which I >> > > > > believe that pooling our resources will greatly help in being able to >> > > > > get more done and avoiding duplicating our efforts. >> > > > > >> > > > > # Conclusion >> > > > > This is a start of a very open ended discussion, I'm sure there is a >> > > > > lot of details involved here in the implementation that will surface, >> > > > > but I think it could be a good step overall in simplifying our CI and >> > > > > adding more coverage for real potential deployers. It will help two >> > > > > teams unite together and have more resources for something (that >> > > > > essentially is somewhat of duplicated effort at the moment). >> > > > > >> > > > > I will try to pick up sometime to POC a simple service being deployed >> > > > > by an OSA role instead of Bash, placement which seems like a very >> > > > > simple one and share that eventually. >> > > > > >> > > > > Thoughts? 
:) >> > > > > >> > > > > -- >> > > > > Mohammed Naser — vexxhost >> > > > > ----------------------------------------------------- >> > > > > D. 514-316-8872 >> > > > > D. 800-910-1726 ext. 200 >> > > > > E. mnaser at vexxhost.com >> > > > > W. http://vexxhost.com >> > > > > >> > > > >> > > > — >> > > > Slawek Kaplonski >> > > > Senior software engineer >> > > > Red Hat >> > > > >> > > >> > > >> > > -- >> > > Mohammed Naser — vexxhost >> > > ----------------------------------------------------- >> > > D. 514-316-8872 >> > > D. 800-910-1726 ext. 200 >> > > E. mnaser at vexxhost.com >> > > W. http://vexxhost.com >> > >> > — >> > Slawek Kaplonski >> > Senior software engineer >> > Red Hat >> > >> >> > > -- Doug From dmendiza at redhat.com Tue Jun 4 12:45:36 2019 From: dmendiza at redhat.com (=?UTF-8?Q?Douglas_Mendiz=c3=a1bal?=) Date: Tue, 4 Jun 2019 07:45:36 -0500 Subject: [nova][cinder][glance][Barbican]Finding Timeslot for weekly Image Encryption IRC meeting In-Reply-To: <798dc164-1ed3-10f3-6de2-e902ae269869@secustack.com> References: <798dc164-1ed3-10f3-6de2-e902ae269869@secustack.com> Message-ID: <6dd8e9e3-6959-01c7-46ac-9f3df472b973@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi Josephine, Thank you for organizing this. Do you still need more responses before scheduling the first meeting? It appears that Monday 1300 UTC is the best time slot for everyone so far. On a related note, we have the Secret Consumers spec up for review for Barbican: https://review.opendev.org/#/c/662013/ Regards, - - Douglas Mendizábal (redrobot) On 5/13/19 7:19 AM, Josephine Seifert wrote: > Just re-raising this :) > > Please vote, if you would like to participate: > https://doodle.com/poll/wtg9ha3e5dvym6yt > > Am 04.05.19 um 20:57 schrieb Josephine Seifert: >> Hello, >> >> as a result from the Summit and the PTG, I would like to hold a >> weekly IRC-meeting for the Image Encryption (soon to be a pop-up >> team). >> >> As I work in Europe I have made a doodle poll, with timeslots I >> can attend and hopefully many of you. 
If you would like to join >> in a weekly meeting, please fill out the poll and state your name >> and the project you are working in: >> https://doodle.com/poll/wtg9ha3e5dvym6yt >> >> Thank you Josephine (Luzi) >> >> >> > -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEwcapj5oGTj2zd3XogB6WFOq/OrcFAlz2Z7kACgkQgB6WFOq/ OreV5Q//Qf/kS4fBFJjzm2wK6NBeXPbdiLU+7RkKTE4RIePl6VmDnKnQbyIWSSMG oP7Ey9zIE52wdsSYNQwZ7eJfk2WQ3NojEtNoPkQspwMl66qlVWKCikBcJwyDFNWo qo5oD8fTQHm9EmUfGn0npYxyPaBRDiPAFJ4I9MakT6Vx5ChgXj9PStzdRZIOevfQ ezaT+j1ZziheTg6LClSxPE4jeOjTiTU4CupmDf70mqv6PRkq/1J82Nz9ZoLPgod0 lX8EJ15LXGnfUykP/GXZ56rVhkHxYSkK3TiQ26g/b90X3NBUVVAn2VdUhrEwnWXd i7U0lKFq7NMa6dlnU3g6VCQIT+oC7Hx173io+Bx6UjTrYPXur3cgApfLBufLM94S mvVWwcwXz7izf30fxZxa8E9cu1ZigILyp90UNGHLAPX0oNSdOrelnYmdhRoVv90+ IlfojnPG/GjCqAbimcMLL0wRRK946j8S/naa+32fTPTUrz/L/poCdi4x3gJzUQ9f x4Au96O1IoWEWChKsUID6su6kVfHfKH0U+6UuneDiYE3DBDdy+vUJlM6etoAes17 5fTgmk8tNPMbcmgZ9ajmh5iwZuooc+FSOgnE5cZt4U6UyY2k1Sr0n878sPgeiMH+ nHavQ8EEEGQ9jf+PDlyOc6yawrm28nyFGMyvH8LjLyN/7lZJwGI= =SWA4 -----END PGP SIGNATURE----- From gael.therond at gmail.com Tue Jun 4 13:03:20 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 15:03:20 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Lingxian Kong, That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! Is there a reason why it didn’t was backported to rocky? Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : > Hi Felix, > > « Glad » you had the same issue before, and yes of course I looked at the > HM logs which is were I actually found out that this event was triggered > by octavia (Beside the DB data that validated that) here is my log trace > related to this event, It doesn't really shows major issue IMHO. > > Here is the stacktrace that our octavia service archived for our both > controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > > http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > > I well may have miss something in it, but I don't see something strange on > from my point of view. > Feel free to tell me if you spot something weird. > > > Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >> Hi Gael, >> >> >> >> we had a similar issue in the past. >> >> You could check the octiava healthmanager log (should be on the same node >> where the worker is running). >> >> This component monitors the status of the Amphorae and restarts them if >> they don’t trigger a callback after a specific time. This might also happen >> if there is some connection issue between the two components. >> >> >> >> But normally it should at least restart the LB with new Amphorae… >> >> >> >> Hope that helps >> >> >> >> Felix >> >> >> >> *From:* Gaël THEROND >> *Sent:* Tuesday, June 4, 2019 9:44 AM >> *To:* Openstack >> *Subject:* [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly >> deleted by octavia >> >> >> >> Hi guys, >> >> >> >> I’ve a weird situation here. >> >> >> >> I smoothly operate a large scale multi-region Octavia service using the >> default amphora driver which imply the use of nova instances as >> loadbalancers. 
>> >> >> >> Everything is running really well and our customers (K8s and traditional >> users) are really happy with the solution so far. >> >> >> >> However, yesterday one of those customers using the loadbalancer in front >> of their ElasticSearch cluster poked me because this loadbalancer suddenly >> passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer >> available but yet the anchor/member/pool and listeners settings were still >> existing. >> >> >> >> So I investigated and found out that the loadbalancer amphoras have been >> destroyed by the octavia user. >> >> >> >> The weird part is, both the master and the backup instance have been >> destroyed at the same moment by the octavia service user. >> >> >> >> Is there specific circumstances where the octavia service could decide to >> delete the instances but not the anchor/members/pool ? >> >> >> >> It’s worrying me a bit as there is no clear way to trace why does Octavia >> did take this action. >> >> >> >> I digged within the nova and Octavia DB in order to correlate the action >> but except than validating my investigation it doesn’t really help as there >> are no clue of why the octavia service did trigger the deletion. >> >> >> >> If someone have any clue or tips to give me I’ll be more than happy to >> discuss this situation. >> >> >> >> Cheers guys! >> Hinweise zum Datenschutz finden Sie hier >> . >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue Jun 4 13:12:09 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 4 Jun 2019 09:12:09 -0400 Subject: [ops] no openstack ops meetups team meeting today Message-ID: We're skipping the meetups team meeting on IRC today - currently in good shape and some of us are busy with other things. Will try to have both a meeting and some progress to report next week. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgoncalves at redhat.com Tue Jun 4 13:16:47 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Tue, 4 Jun 2019 15:16:47 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND wrote: > > Hi Lingxian Kong, > > That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! > > Is there a reason why it didn’t was backported to rocky? The patch was merged in master branch during Rocky development cycle, hence included in stable/rocky as well. > > Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. > > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : >> >> Hi Felix, >> >> « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. >> >> Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). >> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >> >> I well may have miss something in it, but I don't see something strange on from my point of view. 
>> Feel free to tell me if you spot something weird. >> >> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : >>> >>> Hi Gael, >>> >>> >>> >>> we had a similar issue in the past. >>> >>> You could check the octiava healthmanager log (should be on the same node where the worker is running). >>> >>> This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. >>> >>> >>> >>> But normally it should at least restart the LB with new Amphorae… >>> >>> >>> >>> Hope that helps >>> >>> >>> >>> Felix >>> >>> >>> >>> From: Gaël THEROND >>> Sent: Tuesday, June 4, 2019 9:44 AM >>> To: Openstack >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia >>> >>> >>> >>> Hi guys, >>> >>> >>> >>> I’ve a weird situation here. >>> >>> >>> >>> I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. >>> >>> >>> >>> Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. >>> >>> >>> >>> However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. >>> >>> >>> >>> So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. >>> >>> >>> >>> The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. >>> >>> >>> >>> Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? >>> >>> >>> >>> It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. >>> >>> >>> >>> I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. >>> >>> >>> >>> If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. >>> >>> >>> >>> Cheers guys! >>> >>> Hinweise zum Datenschutz finden Sie hier. From gael.therond at gmail.com Tue Jun 4 13:19:58 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 15:19:58 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Oh, that's perfect so, I'll just update my image and my platform as we're using kolla-ansible and that's super easy. You guys rocks!! (Pun intended ;-)). Many many thanks to all of you, that will real back me a lot regarding the Octavia solidity and Kolla flexibility actually ^^. Le mar. 4 juin 2019 à 15:17, Carlos Goncalves a écrit : > On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND > wrote: > > > > Hi Lingxian Kong, > > > > That’s actually very interesting as I’ve come to the same conclusion > this morning during my investigation and was starting to think about a fix, > which it seems you already made! > > > > Is there a reason why it didn’t was backported to rocky? > > The patch was merged in master branch during Rocky development cycle, > hence included in stable/rocky as well. 
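Since gerrit records the Change-Id in the commit message, a quick way to confirm that the fix referenced above really is on the branch you run is to grep the branch history for it (a generic git technique, not something specific to this patch):

git clone https://opendev.org/openstack/octavia && cd octavia
git log --oneline origin/stable/rocky \
    --grep=Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701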
> > > > > Very helpful, many many thanks to you you clearly spare me hours of > works! I’ll get a review of your patch and test it on our lab. > > > > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a > écrit : > >> > >> Hi Felix, > >> > >> « Glad » you had the same issue before, and yes of course I looked at > the HM logs which is were I actually found out that this event was > triggered by octavia (Beside the DB data that validated that) here is my > log trace related to this event, It doesn't really shows major issue IMHO. > >> > >> Here is the stacktrace that our octavia service archived for our both > controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > >> > >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > >> > >> I well may have miss something in it, but I don't see something strange > on from my point of view. > >> Feel free to tell me if you spot something weird. > >> > >> > >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >>> > >>> Hi Gael, > >>> > >>> > >>> > >>> we had a similar issue in the past. > >>> > >>> You could check the octiava healthmanager log (should be on the same > node where the worker is running). > >>> > >>> This component monitors the status of the Amphorae and restarts them > if they don’t trigger a callback after a specific time. This might also > happen if there is some connection issue between the two components. > >>> > >>> > >>> > >>> But normally it should at least restart the LB with new Amphorae… > >>> > >>> > >>> > >>> Hope that helps > >>> > >>> > >>> > >>> Felix > >>> > >>> > >>> > >>> From: Gaël THEROND > >>> Sent: Tuesday, June 4, 2019 9:44 AM > >>> To: Openstack > >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly > deleted by octavia > >>> > >>> > >>> > >>> Hi guys, > >>> > >>> > >>> > >>> I’ve a weird situation here. > >>> > >>> > >>> > >>> I smoothly operate a large scale multi-region Octavia service using > the default amphora driver which imply the use of nova instances as > loadbalancers. > >>> > >>> > >>> > >>> Everything is running really well and our customers (K8s and > traditional users) are really happy with the solution so far. > >>> > >>> > >>> > >>> However, yesterday one of those customers using the loadbalancer in > front of their ElasticSearch cluster poked me because this loadbalancer > suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were > no longer available but yet the anchor/member/pool and listeners settings > were still existing. > >>> > >>> > >>> > >>> So I investigated and found out that the loadbalancer amphoras have > been destroyed by the octavia user. > >>> > >>> > >>> > >>> The weird part is, both the master and the backup instance have been > destroyed at the same moment by the octavia service user. > >>> > >>> > >>> > >>> Is there specific circumstances where the octavia service could decide > to delete the instances but not the anchor/members/pool ? > >>> > >>> > >>> > >>> It’s worrying me a bit as there is no clear way to trace why does > Octavia did take this action. > >>> > >>> > >>> > >>> I digged within the nova and Octavia DB in order to correlate the > action but except than validating my investigation it doesn’t really help > as there are no clue of why the octavia service did trigger the deletion. > >>> > >>> > >>> > >>> If someone have any clue or tips to give me I’ll be more than happy to > discuss this situation. > >>> > >>> > >>> > >>> Cheers guys! 
> >>> > >>> Hinweise zum Datenschutz finden Sie hier. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Jun 4 14:02:21 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Jun 2019 15:02:21 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: <42a6e49abc54ab460c0a71529957e362f6d77eae.camel@redhat.com> On Tue, 2019-06-04 at 08:39 -0400, Doug Hellmann wrote: > Sean Mooney writes: > > > On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: > > > On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: > > > > > > > > Hi, > > > > > > > > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > > > > > > > > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it > > > > > > in > > > > > > separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something > > > > > > similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > > > > > > > > > Not a dumb question at all. So, we do have this concept of 'roles' > > > > > which you _could_ kinda technically identify similar to plugins. > > > > > However, I think one of the things that would maybe come out of this > > > > > is the inability for projects to maintain their own plugins (because > > > > > now you can host neutron/devstack/plugins and you maintain that repo > > > > > yourself), under this structure, you would indeed have to make those > > > > > changes to the OpenStack Ansible Neutron role > > > > > > > > > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > > > > > > > > > However, I think from an OSA perspective, we would be more than happy > > > > > to add project maintainers for specific projects to their appropriate > > > > > roles. It would make sense that there is someone from the Neutron > > > > > team that could be a core on os_neutron from example. > > > > > > > > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now > > > > in > > > > opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and > > > > install it together with everything else by simply adding one line (usually) in local.conf file. > > > > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or > > > > driver which isn’t official OpenStack project. > > > > > > You raise a really good concern. Indeed, we might have to change the workflow > > > from "write a plugin" to "write an Ansible role" to be able to test > > > your project with > > > DevStack at that page (or maintain both a "legacy" solution) with a new one. > > > > the real probalem with that is who is going to port all of the > > existing plugins. > > Do all projects and all jobs have to be converted at once? Or ever? > > How much complexity do those plugins actually contain? Would they be > fairly straightforward to convert? that depends. some jsut add support for indivigual projects. others install infrastructure services like ceph or kubernetes which will be used by openstack services. others download and compiles c projects form source like networking-ovs-dpdk. 
The neutron devstack plugin also used to compile OVS from source to work around some distro bugs, and I believe networking-ovn can do the same. A devstack plugin allows all of the above to be done trivially.

> Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and
> then run devstack with just the plugin(s) needed for a given job?

That would likely be possible. I'm sure we could generate a local.conf from OSA's inventories and run the plugins after OSA runs. devstack always runs its in-tree code in each phase and then runs the plugins in the order they are enabled in each phase
https://docs.openstack.org/devstack/latest/plugins.html
networking-ovs-dpdk for example replaces the _neutron_ovs_base_install_agent_packages function
https://github.com/openstack/networking-ovs-dpdk/blob/master/devstack/libs/ovs-dpdk#L11-L16
with a noop and then in the install phase installs ovs-dpdk from source. _neutron_ovs_base_install_agent_packages just installs kernel OVS, but we replace it because our patches to make it conditional in devstack were rejected. It's not necessarily a pattern I encourage, but if you have to, you can replace any functionality that devstack provides via a plugin, although most use cases really don't require that.

> Is there enough appeal in the idea of replacing devstack with something
> closer to what is used for production deployments to drive us to find an
> iterative approach that doesn't require changing everything at one time?
> Or are we stuck with devstack forever?
>
> > kolla-ansible has also tried to be a devstack replacement in the past via the introduction
> > of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container.
> > the problem is it still breaks peoles plugins and workflow.
> >
> > some devstack feature that osa would need to support in order to be a
> > replacement for me are.
>
> You've made a good start on a requirements list for a devstack
> replacement. Perhaps a first step would be for some of the folks who
> support this idea to compile a more complete list of those requirements,
> and then we could analyze OSA to see how it might need to be changed or
> whether it makes sense to use OSA as the basis for a new toolset that
> takes on some of the "dev" features we might not want in a "production"
> deployment tool.
>
> Here's another potential gap for whoever is going to make that list:
> devstack pre-populates the environment with some data for things like
> flavors and images. I don't imagine OSA does that or, if it does, that
> they are an exact match. How do we change those settings?

+1, yes, this is something I forgot about.

> That leads to a good second step: Do the rest of the analysis to
> understand what it would take to set up a base job like we have for
> devstack, that produces a similar setup. Not necessarily identical, but
> similar enough to be able to run tempest. It seems likely that already
> exists in some form for testing OSA itself. Could a developer run that
> on a local system (clearly being able to build the test environment
> locally is a requirement for replacing devstack)?
>
> After that, I would want to see answers to some of the questions about
> dealing with plugins that I posed above.
>
> And only then, I think, could I provide an answer to the question of
> whether we should make the change or not.

> > 1 the ablity to install all openstack project form git if needed including gerrit reviews.
> > > > abiltiy to eailly specify gerrit reiews or commits for each project > > > > # here i am declaring the os-vif should be installed from git not pypi > > LIBS_FROM_GIT=os-vif > > > > # and here i am specifying that gerrit should be used as the source and > > # i am provide a gerrit/git refs branch for a specific un merged patch > > OS_VIF_REPO=https://git.openstack.org/openstack/os-vif > > OS_VIF_BRANCH=refs/changes/25/629025/9 > > > > # *_REPO can obvioulsy take anythign that is valid in a git clone command so > > # i can use a local repo too > > NEUTRON_REPO=file:///opt/repos/neutron > > # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. > > NEUTRON_BRANCH=bug/1788009 > > > > > > the next thing that would be needed is a way to simply override any config value like this > > > > [[post-config|/etc/nova/nova.conf]] > > #[compute] > > #live_migration_wait_for_vif_plug=True > > [libvirt] > > live_migration_uri = qemu+ssh://root@%s/system > > #cpu_mode = host-passthrough > > virt_type = kvm > > cpu_mode = custom > > cpu_model = kvm64 > > > > im sure that osa can do that but i really can just provide any path to any file if needed. > > so no need to update a role or plugin to set values in files created > > by plugins which is the next thing. > > Does OSA need to support *every* configuration value? Or could it deploy > a stack, and then rely on a separate tool to modify config values and > restart a service? Clearly some values need to be there when the cloud > first starts, but do they all? i think to preserve the workflow yes we need to be able to override any config that is generated by OSA. kolla ansible supports a relly nice config override mechanism where you can supply overrieds are applied after it generates a template. even though i have used the generic functionality to change thing like libvirt configs in the past i generally have only used it for the openstack services and for development i think its very imporant to easibly configure different senarios without needing to lear the opinionated syntatic sugar provided by the install and just set the config values directly especially when developing a new feature that adds a new value. > > > we enable plugins with a single line like this > > > > enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master > > some bugs > > meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices > > and devstack will clone and execute the plugins based on the single > > line above. plugins however can also > > This makes me think it might be most appropriate to be considering a > tool that replaces devstack by wrapping OSA, rather than *being* > OSA. Maybe that's just an extra playbook that runs before OSA, or maybe > it's a simpler bash script that does some setup before invoking OSA. on that point i had considerd porting networking-ovs-dpdk to an ansible role and invoking from the devstack plugin in the past but i have not had time to do that. part of what is nice about devstack plugin model is you can write you plugin in any language you like provided you have a plug.sh file as an entrypoint. i doublt we have devstack plugins today that just run ansibel or puppet but it is totally valid to do so. > > > read any varable defiend in the local.conf as it will be set in the environment which means i can easily share > > an exact configuration with someone by shareing a local.conf. 
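To make the point above concrete that a devstack plugin only needs a plugin.sh entry point and could perfectly well just shell out to ansible, a minimal sketch of such a plugin follows. The role, playbook and variable names are invented for illustration and do not come from any existing repo:

# devstack/plugin.sh
MYSERVICE_PLUGIN_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

if [[ "$1" == "stack" && "$2" == "install" ]]; then
    ansible-playbook -i "localhost," -c local \
        -e "myservice_git_dir=$DEST/myservice" \
        "$MYSERVICE_PLUGIN_DIR/playbooks/install.yml"
elif [[ "$1" == "unstack" ]]; then
    ansible-playbook -i "localhost," -c local \
        "$MYSERVICE_PLUGIN_DIR/playbooks/remove.yml"
fi

The phase arguments ($1/$2) are the ones devstack already passes to plugins, so a plugin written this way keeps the existing enable_plugin workflow while all the real logic lives in a role that other installers could reuse.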
> > > > > > im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack > > for all our testing in the gate it is actually has become one of the best openstack installer out there. we do > > not recommend people run it in production but with the ansible automation of grenade and the move to systemd for > > services there are less mainatined installers out there that devstack is proably a better foundation for a cloud > > to build on. people should still not use it in production but i can see why some might. > > > > > > > > > > > > > > > > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which > > > > > > uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I > > > > > > don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep > > > > > > in > > > > > > mind. > > > > > > > > > > Indeed, with our current CI infrastructure with OSA, we have the > > > > > ability to create these dynamic scenarios (which can actually be > > > > > defined by a simple Zuul variable). > > > > > > > > > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > > > > > > > > > We do some really neat introspection of the project name being tested > > > > > in order to run specific scenarios. Therefore, that is something that > > > > > should be quite easy to accomplish simply by overriding a scenario > > > > > name within Zuul. It also is worth mentioning we now support full > > > > > metal deploys for a while now, so not having to worry about containers > > > > > is something to keep in mind as well (with simplifying the developer > > > > > experience again). > > > > > > > > > > > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > This is something that I've discussed with a few people over time and > > > > > > > I think I'd probably want to bring it up by now. I'd like to propose > > > > > > > and ask if it makes sense to perhaps replace devstack entirely with > > > > > > > openstack-ansible. I think I have quite a few compelling reasons to > > > > > > > do this that I'd like to outline, as well as why I *feel* (and I could > > > > > > > be biased here, so call me out!) that OSA is the best option in terms > > > > > > > of a 'replacement' > > > > > > > > > > > > > > # Why not another deployment project? > > > > > > > I actually thought about this part too and considered this mainly for > > > > > > > ease of use for a *developer*. > > > > > > > > > > > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > > > > > > (which means that it has no build infrastructure, a developer can't > > > > > > > just get $commit checked out and deployed). > > > > > > > > > > > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > > > > > > beforehand, also, I feel they are much harder to use as a developer > > > > > > > because if you want to make quick edits and restart services, you have > > > > > > > to enter a container and make the edit there and somehow restart the > > > > > > > service without the container going back to it's original state. > > > > > > > Kolla-Ansible and the other combinations also suffer from the same > > > > > > > "issue". 
> > > > > > > > > > > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > > > > > > a virtualenv and installs packages inside of it. The services are > > > > > > > deployed as systemd units. This is very much similar to the current > > > > > > > state of devstack at the moment (minus the virtualenv part, afaik). > > > > > > > It makes it pretty straight forward to go and edit code if you > > > > > > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > > > > > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > > > > > > it much more easy to deploy on a wider variety of operating systems. > > > > > > > It also has the ability to use commits checked out from Zuul so all > > > > > > > the fancy Depends-On stuff we use works. > > > > > > > > > > > > > > # Why do we care about this, I like my bash scripts! > > > > > > > As someone who's been around for a *really* long time in OpenStack, > > > > > > > I've seen a whole lot of really weird issues surface from the usage of > > > > > > > DevStack to do CI gating. For example, one of the recent things is > > > > > > > the fact it relies on installing package-shipped noVNC, where as the > > > > > > > 'master' noVNC has actually changed behavior a few months back and it > > > > > > > is completely incompatible at this point (it's just a ticking thing > > > > > > > until we realize we're entirely broken). > > > > > > > > > > > > > > To this day, I still see people who want to POC something up with > > > > > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > > > > > > how many warnings we'll put up, they'll always try to do it. With > > > > > > > this way, at least they'll have something that has the shape of an > > > > > > > actual real deployment. In addition, it would be *good* in the > > > > > > > overall scheme of things for a deployment system to test against, > > > > > > > because this would make sure things don't break in both ways. > > > > > > > > > > > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > > > > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > > > > > > the playbooks directly from the executor. > > > > > > > > > > > > > > # So how could we do this? > > > > > > > The OpenStack Ansible project is made of many roles that are all > > > > > > > composable, therefore, you can think of it as a combination of both > > > > > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > > > > > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > > > > > > integration of all of it in a distribution. OSA is currently both, > > > > > > > but it also includes both Ansible roles and playbooks. > > > > > > > > > > > > > > In order to make sure we maintain as much of backwards compatibility > > > > > > > as possible, we can simply run a small script which does a mapping of > > > > > > > devstack => OSA variables to make sure that the service is shipped > > > > > > > with all the necessary features as per local.conf. > > > > > > > > > > > > > > So the new process could be: > > > > > > > > > > > > > > 1) parse local.conf and generate Ansible variables files > > > > > > > 2) install Ansible (if not running in gate) > > > > > > > 3) run playbooks using variable generated in #1 > > > > > > > > > > > > > > The neat thing is after all of this, devstack just becomes a thin > > > > > > > wrapper around Ansible roles. 
I also think it brings a lot of hands > > > > > > > together, involving both the QA team and OSA team together, which I > > > > > > > believe that pooling our resources will greatly help in being able to > > > > > > > get more done and avoiding duplicating our efforts. > > > > > > > > > > > > > > # Conclusion > > > > > > > This is a start of a very open ended discussion, I'm sure there is a > > > > > > > lot of details involved here in the implementation that will surface, > > > > > > > but I think it could be a good step overall in simplifying our CI and > > > > > > > adding more coverage for real potential deployers. It will help two > > > > > > > teams unite together and have more resources for something (that > > > > > > > essentially is somewhat of duplicated effort at the moment). > > > > > > > > > > > > > > I will try to pick up sometime to POC a simple service being deployed > > > > > > > by an OSA role instead of Bash, placement which seems like a very > > > > > > > simple one and share that eventually. > > > > > > > > > > > > > > Thoughts? :) > > > > > > > > > > > > > > -- > > > > > > > Mohammed Naser — vexxhost > > > > > > > ----------------------------------------------------- > > > > > > > D. 514-316-8872 > > > > > > > D. 800-910-1726 ext. 200 > > > > > > > E. mnaser at vexxhost.com > > > > > > > W. http://vexxhost.com > > > > > > > > > > > > > > > > > > > — > > > > > > Slawek Kaplonski > > > > > > Senior software engineer > > > > > > Red Hat > > > > > > > > > > > > > > > > > > > > > -- > > > > > Mohammed Naser — vexxhost > > > > > ----------------------------------------------------- > > > > > D. 514-316-8872 > > > > > D. 800-910-1726 ext. 200 > > > > > E. mnaser at vexxhost.com > > > > > W. http://vexxhost.com > > > > > > > > — > > > > Slawek Kaplonski > > > > Senior software engineer > > > > Red Hat > > > > > > > > > > > > > > > > From dpeacock at redhat.com Tue Jun 4 14:03:05 2019 From: dpeacock at redhat.com (David Peacock) Date: Tue, 4 Jun 2019 10:03:05 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 12:52 PM Kevin Carter wrote: > Hello Stackers, > > I wanted to follow up on this post from last year, pick up from where it > left off, and bring together a squad to get things moving. > > Count me in. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jesse at odyssey4.me Tue Jun 4 14:17:20 2019 From: jesse at odyssey4.me (Jesse Pretorius) Date: Tue, 4 Jun 2019 14:17:20 +0000 Subject: [qa][openstack-ansible][tripleo-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: Hi everyone, I find myself wondering whether doing this in reverse would potentially be more useful and less disruptive. If devstack plugins in service repositories are converted from bash to ansible role(s), then there is potential for OSA to make use of that. This could potentially be a drop-in replacement for devstack by using a #!/bin/ansible (or whatever known path) shebang in a playbook file, or by changing the devstack entry point into a wrapper that runs ansible from a known path. Using this implementation process would allow a completely independent development process for the devstack conversion, and would allow OSA to retire its independent role repositories as and when the service’s ansible role is ready. 
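As a sketch of what that drop-in shape could look like, the devstack entry point in a service
repository might reduce to an executable playbook wrapping the service's own role; the file
path, role, and variable names below are assumptions for illustration, not an agreed interface:

    #!/usr/bin/env ansible-playbook
    # hypothetical devstack/plugin.yaml in a service repository; plugin.sh (or
    # devstack itself) could simply execute this instead of sourcing bash
    - hosts: localhost
      connection: local
      vars:
        service_src_dir: "{{ lookup('env', 'DEST') | default('/opt/stack', true) }}/my-service"
      roles:
        - role: my_service
          my_service_install_method: source
          my_service_git_dir: "{{ service_src_dir }}"

The same role could then be the one OSA (or any other Ansible-based deployment) consumes, with
the devstack-specific behaviour selected purely through variables.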
Using this method would also allow devstack, OSA, triple-o and kolla-ansible to consume those ansible roles in whatever way they see fit using playbooks which are tailored to their own deployment philosophy. At the most recent PTG there was a discussion between OSA and kolla-ansible about something like this and the conversation for how that could be done would be to ensure that the roles have a clear set of inputs and outputs, with variables enabling the code paths to key outputs. My opinion is that the convergence of all Ansible-based deployment tools to use a common set of roles would be advantageous in many ways: 1. There will be more hands & eyeballs on the deployment code. 2. There will be more eyeballs on the reviews for service and deployment code. 3. There will be a convergence of developer and operator communities on the reviews. 4. The deployment code will co-exist with the service code, so changes can be done together. 5. Ansible is more pythonic than bash, and using it can likely result in the removal of a bunch of devstack bash libs. As Doug suggested, this starts with putting together some requirements - for the wrapping frameworks, as well as the component roles. It may be useful to get some sort of representative sample service to put together a PoC on to help figure out these requirements. I think that this may be useful for the tripleo-ansible team to have a view on, I’ve added the tag to the subject of this email. Best regards, Jesse IRC: odyssey4me From cboylan at sapwetik.org Tue Jun 4 14:30:11 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 04 Jun 2019 07:30:11 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> Message-ID: <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote: > I am in favour of ditching or at least refactoring devstack because > during the last year I often found myself blocked from fixing some > zuul/jobs issues because the buggy code was still required by legacy > devstack jobs that nobody had time maintain or fix, so they were > isolated and the default job configurations were forced to use dirty > hack needed for keeping these working. > > One such example is that there is a task that does a "chmod -R 0777 -R" > on the entire source tree, a total security threat. This is needed by devstack-gate and *not* devstack. We have been trying now for almost two years to get people to stop using devstack-gate in favor of the zuul v3 jobs. Please don't conflate this with devstack itself, it is not related and not relevant to this discussion. > > In order to make other jobs running correctly* I had to rely undoing > the damage done by such chmod because I was not able to disable the > historical hack. In order to make other jobs run correctly we are asking you to stop using devstack-gate and use zuulv3 native jobs instead. > > * ansible throws warning with unsafe file permissions > * ssh refuses to load unsafe keys > > That is why I am in favor of dropping features that are slowing down > the progress of others. Again this has nothing to do with devstack. > > I know that the reality is more complicated but I also think that > sometimes less* is more. 
> > > * deployment projects ;) > From johfulto at redhat.com Tue Jun 4 14:37:53 2019 From: johfulto at redhat.com (John Fulton) Date: Tue, 4 Jun 2019 10:37:53 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Tue, Jun 4, 2019 at 10:08 AM David Peacock wrote: > > On Mon, Jun 3, 2019 at 12:52 PM Kevin Carter wrote: >> >> Hello Stackers, >> >> I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. >> > > Count me in. +1 From openstack at nemebean.com Tue Jun 4 14:44:48 2019 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 4 Jun 2019 09:44:48 -0500 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: <82388be7-88d3-ad4a-bba3-81fe86eb8034@nemebean.com> On 6/4/19 7:39 AM, Doug Hellmann wrote: > Sean Mooney writes: > >> On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: >>> On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: >>>> >>>> Hi, >>>> >>>>> On 1 Jun 2019, at 20:49, Mohammed Naser wrote: >>>>> >>>>> On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in >>>>>> separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something >>>>>> similar possible with OSA or will it be needed to contribute always every change to OSA repository? >>>>> >>>>> Not a dumb question at all. So, we do have this concept of 'roles' >>>>> which you _could_ kinda technically identify similar to plugins. >>>>> However, I think one of the things that would maybe come out of this >>>>> is the inability for projects to maintain their own plugins (because >>>>> now you can host neutron/devstack/plugins and you maintain that repo >>>>> yourself), under this structure, you would indeed have to make those >>>>> changes to the OpenStack Ansible Neutron role >>>>> >>>>> i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron >>>>> >>>>> However, I think from an OSA perspective, we would be more than happy >>>>> to add project maintainers for specific projects to their appropriate >>>>> roles. It would make sense that there is someone from the Neutron >>>>> team that could be a core on os_neutron from example. >>>> >>>> Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in >>>> opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and >>>> install it together with everything else by simply adding one line (usually) in local.conf file. >>>> I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or >>>> driver which isn’t official OpenStack project. >>> >>> You raise a really good concern. Indeed, we might have to change the workflow >>> from "write a plugin" to "write an Ansible role" to be able to test >>> your project with >>> DevStack at that page (or maintain both a "legacy" solution) with a new one. >> the real probalem with that is who is going to port all of the >> existing plugins. > > Do all projects and all jobs have to be converted at once? Or ever? 
Perhaps not all at once, but I would say they all need to be converted eventually or we end up in the situation Dean mentioned where we have to maintain two different deployment systems. I would argue that's much worse than just continuing with devstack as-is. On the other hand, practically speaking I don't think we can probably do them all at once, unless there are a lot fewer devstack plugins in the wild than I think there are (which is possible). Also, I suspect there may be downstream plugins running in third-party ci that need to be considered. That said, while I expect this would be _extremely_ painful in the short to medium term, I'm also a big proponent of making the thing developers care about the same as the thing users care about. However, if we go down this path I think we need sufficient buy in from a diverse enough group of contributors that losing one group (see OSIC) doesn't leave us with a half-finished migration. That would be a disaster IMHO. > > How much complexity do those plugins actually contain? Would they be > fairly straightforward to convert? > > Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and > then run devstack with just the plugin(s) needed for a given job? > > Is there enough appeal in the idea of replacing devstack with something > closer to what is used for production deployments to drive us to find an > iterative approach that doesn't require changing everything at one time? > Or are we stuck with devstack forever? > >> kolla-ansible has also tried to be a devstack replacement in the past via the introduction >> of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container. >> the problem is it still breaks peoles plugins and workflow. >> >> >> some devstack feature that osa would need to support in order to be a >> replacement for me are. > > You've made a good start on a requirements list for a devstack > replacement. Perhaps a first step would be for some of the folks who > support this idea to compile a more complete list of those requirements, > and then we could analyze OSA to see how it might need to be changed or > whether it makes sense to use OSA as the basis for a new toolset that > takes on some of the "dev" features we might not want in a "production" > deployment tool. > > Here's another potential gap for whoever is going to make that list: > devstack pre-populates the environment with some data for things like > flavors and images. I don't imagine OSA does that or, if it does, that > they are an exact match. How do we change those settings? > > That leads to a good second step: Do the rest of the analysis to > understand what it would take to set up a base job like we have for > devstack, that produces a similar setup. Not necessarily identical, but > similar enough to be able to run tempest. It seems likely that already > exists in some form for testing OSA itself. Could a developer run that > on a local system (clearly being able to build the test environment > locally is a requirement for replacing devstack)? > > After that, I would want to see answers to some of the questions about > dealing with plugins that I posed above. > > And only then, I think, could I provide an answer to the question of > whether we should make the change or not. > >> 1 the ablity to install all openstack project form git if needed including gerrit reviews. 
>> >> abiltiy to eailly specify gerrit reiews or commits for each project >> >> # here i am declaring the os-vif should be installed from git not pypi >> LIBS_FROM_GIT=os-vif >> >> # and here i am specifying that gerrit should be used as the source and >> # i am provide a gerrit/git refs branch for a specific un merged patch >> OS_VIF_REPO=https://git.openstack.org/openstack/os-vif >> OS_VIF_BRANCH=refs/changes/25/629025/9 >> >> # *_REPO can obvioulsy take anythign that is valid in a git clone command so >> # i can use a local repo too >> NEUTRON_REPO=file:///opt/repos/neutron >> # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. >> NEUTRON_BRANCH=bug/1788009 >> >> >> the next thing that would be needed is a way to simply override any config value like this >> >> [[post-config|/etc/nova/nova.conf]] >> #[compute] >> #live_migration_wait_for_vif_plug=True >> [libvirt] >> live_migration_uri = qemu+ssh://root@%s/system >> #cpu_mode = host-passthrough >> virt_type = kvm >> cpu_mode = custom >> cpu_model = kvm64 >> >> im sure that osa can do that but i really can just provide any path to any file if needed. >> so no need to update a role or plugin to set values in files created >> by plugins which is the next thing. > > Does OSA need to support *every* configuration value? Or could it deploy > a stack, and then rely on a separate tool to modify config values and > restart a service? Clearly some values need to be there when the cloud > first starts, but do they all? > >> we enable plugins with a single line like this >> >> enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master >> >> meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices >> and devstack will clone and execute the plugins based on the single >> line above. plugins however can also > > This makes me think it might be most appropriate to be considering a > tool that replaces devstack by wrapping OSA, rather than *being* > OSA. Maybe that's just an extra playbook that runs before OSA, or maybe > it's a simpler bash script that does some setup before invoking OSA. > >> read any varable defiend in the local.conf as it will be set in the environment which means i can easily share >> an exact configuration with someone by shareing a local.conf. >> >> >> im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack >> for all our testing in the gate it is actually has become one of the best openstack installer out there. we do >> not recommend people run it in production but with the ansible automation of grenade and the move to systemd for >> services there are less mainatined installers out there that devstack is proably a better foundation for a cloud >> to build on. people should still not use it in production but i can see why some might. >> >>> >>>>> >>>>>> Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which >>>>>> uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I >>>>>> don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in >>>>>> mind. >>>>> >>>>> Indeed, with our current CI infrastructure with OSA, we have the >>>>> ability to create these dynamic scenarios (which can actually be >>>>> defined by a simple Zuul variable). 
>>>>> >>>>> https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 >>>>> >>>>> We do some really neat introspection of the project name being tested >>>>> in order to run specific scenarios. Therefore, that is something that >>>>> should be quite easy to accomplish simply by overriding a scenario >>>>> name within Zuul. It also is worth mentioning we now support full >>>>> metal deploys for a while now, so not having to worry about containers >>>>> is something to keep in mind as well (with simplifying the developer >>>>> experience again). >>>>> >>>>>>> On 1 Jun 2019, at 14:35, Mohammed Naser wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> This is something that I've discussed with a few people over time and >>>>>>> I think I'd probably want to bring it up by now. I'd like to propose >>>>>>> and ask if it makes sense to perhaps replace devstack entirely with >>>>>>> openstack-ansible. I think I have quite a few compelling reasons to >>>>>>> do this that I'd like to outline, as well as why I *feel* (and I could >>>>>>> be biased here, so call me out!) that OSA is the best option in terms >>>>>>> of a 'replacement' >>>>>>> >>>>>>> # Why not another deployment project? >>>>>>> I actually thought about this part too and considered this mainly for >>>>>>> ease of use for a *developer*. >>>>>>> >>>>>>> At this point, Puppet-OpenStack pretty much only deploys packages >>>>>>> (which means that it has no build infrastructure, a developer can't >>>>>>> just get $commit checked out and deployed). >>>>>>> >>>>>>> TripleO uses Kolla containers AFAIK and those have to be pre-built >>>>>>> beforehand, also, I feel they are much harder to use as a developer >>>>>>> because if you want to make quick edits and restart services, you have >>>>>>> to enter a container and make the edit there and somehow restart the >>>>>>> service without the container going back to it's original state. >>>>>>> Kolla-Ansible and the other combinations also suffer from the same >>>>>>> "issue". >>>>>>> >>>>>>> OpenStack Ansible is unique in the way that it pretty much just builds >>>>>>> a virtualenv and installs packages inside of it. The services are >>>>>>> deployed as systemd units. This is very much similar to the current >>>>>>> state of devstack at the moment (minus the virtualenv part, afaik). >>>>>>> It makes it pretty straight forward to go and edit code if you >>>>>>> need/have to. We also have support for Debian, CentOS, Ubuntu and >>>>>>> SUSE. This allows "devstack 2.0" to have far more coverage and make >>>>>>> it much more easy to deploy on a wider variety of operating systems. >>>>>>> It also has the ability to use commits checked out from Zuul so all >>>>>>> the fancy Depends-On stuff we use works. >>>>>>> >>>>>>> # Why do we care about this, I like my bash scripts! >>>>>>> As someone who's been around for a *really* long time in OpenStack, >>>>>>> I've seen a whole lot of really weird issues surface from the usage of >>>>>>> DevStack to do CI gating. For example, one of the recent things is >>>>>>> the fact it relies on installing package-shipped noVNC, where as the >>>>>>> 'master' noVNC has actually changed behavior a few months back and it >>>>>>> is completely incompatible at this point (it's just a ticking thing >>>>>>> until we realize we're entirely broken). >>>>>>> >>>>>>> To this day, I still see people who want to POC something up with >>>>>>> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. 
No matter >>>>>>> how many warnings we'll put up, they'll always try to do it. With >>>>>>> this way, at least they'll have something that has the shape of an >>>>>>> actual real deployment. In addition, it would be *good* in the >>>>>>> overall scheme of things for a deployment system to test against, >>>>>>> because this would make sure things don't break in both ways. >>>>>>> >>>>>>> Also: we run Zuul for our CI which supports Ansible natively, this can >>>>>>> remove one layer of indirection (Zuul to run Bash) and have Zuul run >>>>>>> the playbooks directly from the executor. >>>>>>> >>>>>>> # So how could we do this? >>>>>>> The OpenStack Ansible project is made of many roles that are all >>>>>>> composable, therefore, you can think of it as a combination of both >>>>>>> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >>>>>>> the base modules (i.e. puppet-nova, etc) and TripleO was the >>>>>>> integration of all of it in a distribution. OSA is currently both, >>>>>>> but it also includes both Ansible roles and playbooks. >>>>>>> >>>>>>> In order to make sure we maintain as much of backwards compatibility >>>>>>> as possible, we can simply run a small script which does a mapping of >>>>>>> devstack => OSA variables to make sure that the service is shipped >>>>>>> with all the necessary features as per local.conf. >>>>>>> >>>>>>> So the new process could be: >>>>>>> >>>>>>> 1) parse local.conf and generate Ansible variables files >>>>>>> 2) install Ansible (if not running in gate) >>>>>>> 3) run playbooks using variable generated in #1 >>>>>>> >>>>>>> The neat thing is after all of this, devstack just becomes a thin >>>>>>> wrapper around Ansible roles. I also think it brings a lot of hands >>>>>>> together, involving both the QA team and OSA team together, which I >>>>>>> believe that pooling our resources will greatly help in being able to >>>>>>> get more done and avoiding duplicating our efforts. >>>>>>> >>>>>>> # Conclusion >>>>>>> This is a start of a very open ended discussion, I'm sure there is a >>>>>>> lot of details involved here in the implementation that will surface, >>>>>>> but I think it could be a good step overall in simplifying our CI and >>>>>>> adding more coverage for real potential deployers. It will help two >>>>>>> teams unite together and have more resources for something (that >>>>>>> essentially is somewhat of duplicated effort at the moment). >>>>>>> >>>>>>> I will try to pick up sometime to POC a simple service being deployed >>>>>>> by an OSA role instead of Bash, placement which seems like a very >>>>>>> simple one and share that eventually. >>>>>>> >>>>>>> Thoughts? :) >>>>>>> >>>>>>> -- >>>>>>> Mohammed Naser — vexxhost >>>>>>> ----------------------------------------------------- >>>>>>> D. 514-316-8872 >>>>>>> D. 800-910-1726 ext. 200 >>>>>>> E. mnaser at vexxhost.com >>>>>>> W. http://vexxhost.com >>>>>>> >>>>>> >>>>>> — >>>>>> Slawek Kaplonski >>>>>> Senior software engineer >>>>>> Red Hat >>>>>> >>>>> >>>>> >>>>> -- >>>>> Mohammed Naser — vexxhost >>>>> ----------------------------------------------------- >>>>> D. 514-316-8872 >>>>> D. 800-910-1726 ext. 200 >>>>> E. mnaser at vexxhost.com >>>>> W. 
http://vexxhost.com >>>> >>>> — >>>> Slawek Kaplonski >>>> Senior software engineer >>>> Red Hat >>>> >>> >>> >> >> > From fungi at yuggoth.org Tue Jun 4 15:47:26 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 15:47:26 +0000 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> Message-ID: <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> On 2019-06-04 07:30:11 -0700 (-0700), Clark Boylan wrote: > On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote: > > I am in favour of ditching or at least refactoring devstack because > > during the last year I often found myself blocked from fixing some > > zuul/jobs issues because the buggy code was still required by legacy > > devstack jobs that nobody had time maintain or fix, so they were > > isolated and the default job configurations were forced to use dirty > > hack needed for keeping these working. > > > > One such example is that there is a task that does a "chmod -R 0777 -R" > > on the entire source tree, a total security threat. > > This is needed by devstack-gate and *not* devstack. We have been > trying now for almost two years to get people to stop using > devstack-gate in favor of the zuul v3 jobs. Please don't conflate > this with devstack itself, it is not related and not relevant to > this discussion. [...] Unfortunately this is not entirely the case. It's likely that the chmod workaround in question is only needed by legacy jobs using the deprecated devstack-gate wrappers, but it's actually being done by the fetch-zuul-cloner role[0] from zuul-jobs which is incorporated in our base job[1]. I agree that the solution is to stop using devstack-gate (and the old zuul-cloner v2 compatibility shim for that matter), but for it to have the effect of removing the problem permissions we also need to move the fetch-zuul-cloner role out of our base job. I fully expect this will be a widely-disruptive change due to newer or converted jobs, which are no longer inheriting from legacy-base or legacy-dsvm-base in openstack-zuul-jobs[2], retaining a dependency on this behavior. But the longer we wait, the worse that is going to get. [0] https://opendev.org/zuul/zuul-jobs/src/commit/2f2d6ce3f7a0687fc8f655abc168d7afbfaf11aa/roles/fetch-zuul-cloner/tasks/main.yaml#L19-L25 [1] https://opendev.org/opendev/base-jobs/src/commit/dbb56dda99e8e2346b22479b4dae97a8fc137217/playbooks/base/pre.yaml#L38 [2] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/a7aa530a6059b464b32df69509e3001dc97e2aed/zuul.d/jobs.yaml#L951-L1097 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From a.settle at outlook.com Tue Jun 4 16:18:49 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Tue, 4 Jun 2019 16:18:49 +0000 Subject: [tc] agenda for Technical Committee Meeting 6 June 2019 @ 1400 UTC Message-ID: TC Members, Our next meeting will be this Thursday, 6 June at 1400 UTC in #openstack-tc. Since there has not been a TC meeting for some time, the last suggested meeting agenda has been erased in favor of focusing on post-PTG and forum content. You will find the outlined agenda below. Any suggestions, please contact me before COB (relative to local time) on Wednesday 5 June. 
This email contains the agenda for the meeting, based on the content of the wiki [0]. If you will not be able to attend, please include your name in the "Apologies for Absence" section of the wiki page [0]. [0] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee * Review Denver PTG All items are from the PTG etherpad at [1] ** Changes to the health check process. The suggestion was to remove the formality of "health checks" and focus on TC members being liaisons to projects in an effort to help manage a project's health (if required). We need to assign TC members to teams on the project team page. fungi offered to update the wiki, and asettle offered to update the ML on this change once the wiki was updated. See more action items on line 62 of the etherpad. ** Evolution of the help-most-needed list - this was discussed and decided to become an uncapped list, adding docs surrounding annual re-submission for items. We will go through the action items on this list (line 84 of the etherpad). Business cases requiring updates. ** Goal selection is changing. The way goals are being selected for the upcoming release will change. For Train, we will work on socialising the idea of proposing goals that are OpenStack-wide, but not tech-heavy. The new goal selection process splits the goals into "goal" and "implementation". Further details at the meeting. See line 101 of the etherpad for action items. ** Pop-up teams have been officially recognised and implemented into governance thanks to ttx. Please review his patch here [2] ** SIG governance is being defined (ricolin and asettle). See line 137 for action items. Will detail further at meeting. ** Python 3 check in. Finalising the migration (mugsie) ** Leaderless projects are becoming a concern - action items were on line 185 of the etherpad. Suggestions include reworking the documentation around the current role of the PTL and providing tips on how to "be a better PTL" and offering shadowing and mentoring for potential candidates. This all needs to be socialised further. ** Kickstarting innovation in Openstack - Zane proposed a zany (har har har) suggestion regarding a new multi-tenant cloud with ironic/neutron/optionally cinder/keytstone/octavia (vision will be completed with k8s on top of OpenStack). Suggestion was for new white paper written by zaneb and mnaser. ** Deleting all the things! [3] See line 234 for action items (mugsie). [1] https://etherpad.openstack.org/p/tc-train-ptg [2] https://review.opendev.org/#/c/661356/ [3] https://memegenerator.net/img/instances/14634311.jpg * Review Denver Forum ** Forum session planning for the next summit as this one was done rather hastily and we missed a few things (such as a goals session). See action items on line 243 of the PTG etherpad [1] * Other ** Socialising successbot and thanksbot. Get to it, team! Cheers, Alex p.s - I have a well known habit for getting meeting agenda's HORRIBLY wrong every time, so feel free to ping a gal and tell her what you know. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gr at ham.ie Tue Jun 4 16:23:46 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 4 Jun 2019 17:23:46 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> Message-ID: <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> On 04/06/2019 16:47, Jeremy Stanley wrote: > On 2019-06-04 07:30:11 -0700 (-0700), Clark Boylan wrote: >> On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote: >>> I am in favour of ditching or at least refactoring devstack because >>> during the last year I often found myself blocked from fixing some >>> zuul/jobs issues because the buggy code was still required by legacy >>> devstack jobs that nobody had time maintain or fix, so they were >>> isolated and the default job configurations were forced to use dirty >>> hack needed for keeping these working. >>> >>> One such example is that there is a task that does a "chmod -R 0777 -R" >>> on the entire source tree, a total security threat. >> >> This is needed by devstack-gate and *not* devstack. We have been >> trying now for almost two years to get people to stop using >> devstack-gate in favor of the zuul v3 jobs. Please don't conflate >> this with devstack itself, it is not related and not relevant to >> this discussion. > [...] > > Unfortunately this is not entirely the case. It's likely that the > chmod workaround in question is only needed by legacy jobs using the > deprecated devstack-gate wrappers, but it's actually being done by > the fetch-zuul-cloner role[0] from zuul-jobs which is incorporated > in our base job[1]. I agree that the solution is to stop using > devstack-gate (and the old zuul-cloner v2 compatibility shim for > that matter), but for it to have the effect of removing the problem > permissions we also need to move the fetch-zuul-cloner role out of > our base job. I fully expect this will be a widely-disruptive change > due to newer or converted jobs, which are no longer inheriting from > legacy-base or legacy-dsvm-base in openstack-zuul-jobs[2], retaining > a dependency on this behavior. But the longer we wait, the worse > that is going to get. I have been trying to limit this behaviour for nearly 4 years [3] (it can actually add 10-15 mins sometimes depending on what source trees I have mounted via NFS into a devstack VM when doing dev) > [0] https://opendev.org/zuul/zuul-jobs/src/commit/2f2d6ce3f7a0687fc8f655abc168d7afbfaf11aa/roles/fetch-zuul-cloner/tasks/main.yaml#L19-L25 > [1] https://opendev.org/opendev/base-jobs/src/commit/dbb56dda99e8e2346b22479b4dae97a8fc137217/playbooks/base/pre.yaml#L38 > [2] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/a7aa530a6059b464b32df69509e3001dc97e2aed/zuul.d/jobs.yaml#L951-L1097 > [3] - https://review.opendev.org/#/c/203698 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: 

From smooney at redhat.com Tue Jun 4 16:59:59 2019
From: smooney at redhat.com (Sean Mooney)
Date: Tue, 04 Jun 2019 17:59:59 +0100
Subject: [qa][openstack-ansible] redefining devstack
In-Reply-To: <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie>
References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie>
Message-ID: <1b0924d73d2317dc1e6a9c0128f3ee5e1a3152d4.camel@redhat.com>

On Tue, 2019-06-04 at 17:23 +0100, Graham Hayes wrote:
> On 04/06/2019 16:47, Jeremy Stanley wrote:
> > On 2019-06-04 07:30:11 -0700 (-0700), Clark Boylan wrote:
> > > On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote:
> > > > I am in favour of ditching or at least refactoring devstack because
> > > > during the last year I often found myself blocked from fixing some
> > > > zuul/jobs issues because the buggy code was still required by legacy
> > > > devstack jobs that nobody had time maintain or fix, so they were
> > > > isolated and the default job configurations were forced to use dirty
> > > > hack needed for keeping these working.
> > > >
> > > > One such example is that there is a task that does a "chmod -R 0777 -R"
> > > > on the entire source tree, a total security threat.
> > >
> > > This is needed by devstack-gate and *not* devstack. We have been
> > > trying now for almost two years to get people to stop using
> > > devstack-gate in favor of the zuul v3 jobs. Please don't conflate
> > > this with devstack itself, it is not related and not relevant to
> > > this discussion.
> >
> > [...]
> >
> > Unfortunately this is not entirely the case. It's likely that the
> > chmod workaround in question is only needed by legacy jobs using the
> > deprecated devstack-gate wrappers, but it's actually being done by
> > the fetch-zuul-cloner role[0] from zuul-jobs which is incorporated
> > in our base job[1]. I agree that the solution is to stop using
> > devstack-gate (and the old zuul-cloner v2 compatibility shim for
> > that matter), but for it to have the effect of removing the problem
> > permissions we also need to move the fetch-zuul-cloner role out of
> > our base job. I fully expect this will be a widely-disruptive change
> > due to newer or converted jobs, which are no longer inheriting from
> > legacy-base or legacy-dsvm-base in openstack-zuul-jobs[2], retaining
> > a dependency on this behavior. But the longer we wait, the worse
> > that is going to get.
>
> I have been trying to limit this behaviour for nearly 4 years [3]
> (it can actually add 10-15 mins sometimes depending on what source trees
> I have mounted via NFS into a devstack VM when doing dev)

without looking into it, i assume this is doing this so that the stack user can read/execute
scripts in the different git repos, but chown -R stack:stack would be saner. in any case this
is still a ci issue, not a devstack one, as devstack does not do this itself. by default it
clones the repos if they don't exist as the current user, so you don't need to change
permissions.
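For comparison, the narrower permissions change suggested here would be a single task in the
tooling; this is only a sketch of the idea (the path and user are the usual devstack
conventions, not what the zuul-jobs role actually does), and it would not cover the hardlink
case described earlier, which is what forces the world-writeable bits in the zuul-cloner shim:

    # illustrative alternative to "chmod -R 0777": hand the prepared source
    # trees to the stack user instead of making them world-writeable
    - name: Give the stack user ownership of the source repositories
      become: true
      file:
        path: /opt/stack
        state: directory
        owner: stack
        group: stack
        recurse: true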
> > [0] > > https://opendev.org/zuul/zuul-jobs/src/commit/2f2d6ce3f7a0687fc8f655abc168d7afbfaf11aa/roles/fetch-zuul-cloner/tasks/main.yaml#L19-L25 > > [1] > > https://opendev.org/opendev/base-jobs/src/commit/dbb56dda99e8e2346b22479b4dae97a8fc137217/playbooks/base/pre.yaml#L38 > > [2] > > https://opendev.org/openstack/openstack-zuul-jobs/src/commit/a7aa530a6059b464b32df69509e3001dc97e2aed/zuul.d/jobs.yaml#L951-L1097 > > > > [3] - https://review.opendev.org/#/c/203698 > From dmsimard at redhat.com Tue Jun 4 17:08:16 2019 From: dmsimard at redhat.com (David Moreau Simard) Date: Tue, 4 Jun 2019 13:08:16 -0400 Subject: [all] Announcing the release of ARA Records Ansible 1.0 Message-ID: Hi openstack-discuss ! ARA 1.0 has been released and the announcement about it can be found here [1]. I wanted to personally thank the OpenStack community for believing in the project and the contributors who have helped get the project to where it is today. If you have any questions, feel free to reply to reach out ! [1]: https://ara.recordsansible.org/blog/2019/06/04/announcing-the-release-of-ara-records-ansible-1.0 David Moreau Simard dmsimard = [irc, github, twitter] From fungi at yuggoth.org Tue Jun 4 17:32:41 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 17:32:41 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> Message-ID: <20190604173241.3r22gjulzwuvihbk@yuggoth.org> On 2019-06-04 17:23:46 +0100 (+0100), Graham Hayes wrote: [...] > I have been trying to limit this behaviour for nearly 4 years [3] > (it can actually add 10-15 mins sometimes depending on what source trees > I have mounted via NFS into a devstack VM when doing dev) > > [3] - https://review.opendev.org/#/c/203698 Similar I suppose, though the problem mentioned in this subthread is actually not about the mass permission change itself, rather about the resulting permissions. In particular the fetch-zuul-cloner role makes the entire set of provided repositories world-writeable because the zuul-cloner v2 compatibility shim performs clones from those file paths and Git wants to hardlink them if they're being cloned within the same filesystem. This is necessary to support occasions where the original copies aren't owned by the same user running the zuul-cloner shim, since you can't hardlink files for which your account lacks write access. I've done a bit of digging into the history of this now, so the following is probably boring to the majority of you. If you want to help figure out why it's still there at the moment and what's left to do, read on... Change https://review.openstack.org/512285 which added the chmod task includes a rather prescient comment from Paul about not adding it to the mirror-workspace-git-repos role because "we might not want to chmod 777 on no-legacy jobs." Unfortunately I think we failed to realize that it already would because we had added fetch-zuul-cloner to our base job a month earlier in https://review.openstack.org/501843 for reasons which are not recorded in the change (presumably a pragmatic compromise related to the scramble to convert our v2 jobs at the time, I did not resort to digging in IRC history just yet). 
Soon after, we added fetch-zuul-cloner to the main "legacy" pre playbook with https://review.opendev.org/513067 and prepared to test its removal from the base job with https://review.opendev.org/513079 but that was never completed and I can't seem to find the results of the testing (or even any indication it was ever actually performed). At this point, I feel like we probably just need to re-propose an equivalent of 513079 in our base-jobs repository, exercise it with some DNM changes running a mix of legacy imported v2 and modern v3 native jobs, announce a flag day for the cut over, and try to help address whatever fallout we're unable to predict ahead of time. This is somewhat complicated by the need to also do something similar in https://review.opendev.org/656195 with the bindep "fallback" packages list, so we're going to need to decide how those two efforts will be sequenced, or whether we want to combine them into a single (and likely doubly-painful) event. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From doug at doughellmann.com Tue Jun 4 18:07:21 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 14:07:21 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <42a6e49abc54ab460c0a71529957e362f6d77eae.camel@redhat.com> References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> <42a6e49abc54ab460c0a71529957e362f6d77eae.camel@redhat.com> Message-ID: Sean Mooney writes: > On Tue, 2019-06-04 at 08:39 -0400, Doug Hellmann wrote: >> Sean Mooney writes: >> > >> > the real probalem with that is who is going to port all of the >> > existing plugins. >> >> Do all projects and all jobs have to be converted at once? Or ever? >> >> How much complexity do those plugins actually contain? Would they be >> fairly straightforward to convert? > that depends. some jsut add support for indivigual projects. > others install infrastructure services like ceph or kubernetes which will be used by openstack > services. others download and compiles c projects form source like networking-ovs-dpdk. > the neutron devstack pluging also used to compiles ovs form source to work around some distro bugs > and networking-ovn i belive also can? do the same. > a devstack plugin allows all of the above to be done trivally. It's possible to do all of that sort of thing through Ansible, too. I compile a couple of different tools as part of my developer setup playbooks. If the logic is complicated, the playbook can always call a script. >> Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and >> then run devstack with just the plugin(s) needed for a given job? > that would likely be possible. im sure we could generate local.conf form osa's inventories > and and run the plugsins after osa runs. devstack always runs it in tree code in each phase and > then runs the plugins in the order they are enabled in each phase > > https://docs.openstack.org/devstack/latest/plugins.html > > > networking-ovs-dpdk for example replaces the _neutron_ovs_base_install_agent_packages function > https://github.com/openstack/networking-ovs-dpdk/blob/master/devstack/libs/ovs-dpdk#L11-L16 > with a noop and then in the install pahse we install ovs-dpdk form souce. > _neutron_ovs_base_install_agent_packages just install kernel ovs but we replace it as > our patches to make it condtional in devstack were rejected. 
What we end up with after this transition might work differently. Is there any reason it would have to maintain the "phase" approach? The ovs-dpdk example you give feels like it would be swapping one role for another in the playbook for the job that needs ovs-dpdk. > its not nessiarily a patteren i encurage but if you have to you can replace any functionality > that devstack provides via a plugin although most usecase relly dont > requrie that. Maybe we don't need to design around that if the requirement isn't common, then? That's another question for the analysis someone needs to do. >> Does OSA need to support *every* configuration value? Or could it deploy >> a stack, and then rely on a separate tool to modify config values and >> restart a service? Clearly some values need to be there when the cloud >> first starts, but do they all? > i think to preserve the workflow yes we need to be able to override > any config that is generated OK, so it sounds like that's an area to look at for gaps for OSA. I would imagine it would be possible to create a role to change arbitrary config settings based on inputs from the playbook or a vars file. -- Doug From doug at doughellmann.com Tue Jun 4 18:10:17 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 14:10:17 -0400 Subject: [qa][openstack-ansible][tripleo-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: Jesse Pretorius writes: > Hi everyone, > > I find myself wondering whether doing this in reverse would potentially be more useful and less disruptive. > > If devstack plugins in service repositories are converted from bash to ansible role(s), then there is potential for OSA to make use of that. This could potentially be a drop-in replacement for devstack by using a #!/bin/ansible (or whatever known path) shebang in a playbook file, or by changing the devstack entry point into a wrapper that runs ansible from a known path. > > Using this implementation process would allow a completely independent > development process for the devstack conversion, and would allow OSA > to retire its independent role repositories as and when the service’s > ansible role is ready. It depends on whether you want to delay the deprecation of devstack itself until enough services have done that, or if you want to make NewDevstack (someone should come up with a name for the OSA-based devstack replacement) consume those existing plugins in parallel with OSA. > Using this method would also allow devstack, OSA, triple-o and > kolla-ansible to consume those ansible roles in whatever way they see > fit using playbooks which are tailored to their own deployment > philosophy. That would be useful. > > At the most recent PTG there was a discussion between OSA and kolla-ansible about something like this and the conversation for how that could be done would be to ensure that the roles have a clear set of inputs and outputs, with variables enabling the code paths to key outputs. > > My opinion is that the convergence of all Ansible-based deployment tools to use a common set of roles would be advantageous in many ways: > > 1. There will be more hands & eyeballs on the deployment code. > 2. There will be more eyeballs on the reviews for service and deployment code. > 3. There will be a convergence of developer and operator communities > on the reviews. That might make all of this worth it, even if there is no other benefit. > 4. 
The deployment code will co-exist with the service code, so changes can be done together. > 5. Ansible is more pythonic than bash, and using it can likely result in the removal of a bunch of devstack bash libs. > > As Doug suggested, this starts with putting together some requirements - for the wrapping frameworks, as well as the component roles. It may be useful to get some sort of representative sample service to put together a PoC on to help figure out these requirements. > > I think that this may be useful for the tripleo-ansible team to have a view on, I’ve added the tag to the subject of this email. > > Best regards, > > Jesse > IRC: odyssey4me -- Doug From doug at doughellmann.com Tue Jun 4 18:15:09 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 14:15:09 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <82388be7-88d3-ad4a-bba3-81fe86eb8034@nemebean.com> References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> <82388be7-88d3-ad4a-bba3-81fe86eb8034@nemebean.com> Message-ID: Ben Nemec writes: > On 6/4/19 7:39 AM, Doug Hellmann wrote: >> >> Do all projects and all jobs have to be converted at once? Or ever? > > Perhaps not all at once, but I would say they all need to be converted > eventually or we end up in the situation Dean mentioned where we have to > maintain two different deployment systems. I would argue that's much > worse than just continuing with devstack as-is. On the other hand, > practically speaking I don't think we can probably do them all at once, > unless there are a lot fewer devstack plugins in the wild than I think > there are (which is possible). Also, I suspect there may be downstream > plugins running in third-party ci that need to be considered. I think we can't do them all at once. We can never do anything all at once; we're too big. I don't think we should have a problem saying that devstack is frozen for new features but will continue to run as-is, and new things should use the replacement (when it is available). As soon as the new thing can provide a bridge with *some* level of support for plugins, we could start transitioning as teams have time and need. Jesse's proposal to rewrite devstack plugins as ansible roles may give us that bridge. > That said, while I expect this would be _extremely_ painful in the short > to medium term, I'm also a big proponent of making the thing developers > care about the same as the thing users care about. However, if we go > down this path I think we need sufficient buy in from a diverse enough > group of contributors that losing one group (see OSIC) doesn't leave us > with a half-finished migration. That would be a disaster IMHO. Oh, yes. We would need this not to be a project undertaken by a group of people from one funding source. It needs to be a shift in direction of the community as a whole to improve our developer and testing tools. 
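One way to picture the bridge mentioned above, purely as a sketch (the playbook and path names
are assumptions): run the Ansible-based deployment first, then let classic devstack handle only
the plugins that have not been converted yet, driven by a stripped-down local.conf:

    # hypothetical top-level playbook for a transitional setup
    - import_playbook: setup-openstack.yml   # the OSA-style deployment

    - hosts: all
      tasks:
        - name: Run devstack for any not-yet-converted plugins
          become: true
          become_user: stack
          command: ./stack.sh
          args:
            chdir: /opt/stack/devstack
          # the local.conf next to stack.sh would disable devstack's own
          # services and carry only the enable_plugin lines for plugins
          # still written in bash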
-- Doug From fungi at yuggoth.org Tue Jun 4 20:50:02 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 20:50:02 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604173241.3r22gjulzwuvihbk@yuggoth.org> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> Message-ID: <20190604205001.v7jc3y3sepsdfjcv@yuggoth.org> On 2019-06-04 17:32:41 +0000 (+0000), Jeremy Stanley wrote: [...] > Change https://review.openstack.org/512285 which added the chmod > task includes a rather prescient comment from Paul about not adding > it to the mirror-workspace-git-repos role because "we might not want > to chmod 777 on no-legacy jobs." Unfortunately I think we failed to > realize that it already would because we had added fetch-zuul-cloner > to our base job a month earlier in > https://review.openstack.org/501843 for reasons which are not > recorded in the change (presumably a pragmatic compromise related to > the scramble to convert our v2 jobs at the time, I did not resort to > digging in IRC history just yet). David Shrewsbury reminded me that the reason was we didn't have a separate legacy-base job yet at the time fetch-zuul-cloner was added, so it initially went into the normal base job. > Soon after, we added fetch-zuul-cloner to the main "legacy" pre > playbook with https://review.opendev.org/513067 and prepared to > test its removal from the base job with > https://review.opendev.org/513079 but that was never completed and > I can't seem to find the results of the testing (or even any > indication it was ever actually performed). > > At this point, I feel like we probably just need to re-propose an > equivalent of 513079 in our base-jobs repository, Proposed as https://review.opendev.org/663135 and once that merges we should be able to... > exercise it with some DNM changes running a mix of legacy imported > v2 and modern v3 native jobs, announce a flag day for the cut > over, and try to help address whatever fallout we're unable to > predict ahead of time. This is somewhat complicated by the need to > also do something similar in https://review.opendev.org/656195 > with the bindep "fallback" packages list, so we're going to need > to decide how those two efforts will be sequenced, or whether we > want to combine them into a single (and likely doubly-painful) > event. During the weekly Infrastructure team meeting which just wrapped up, we decided go ahead and combine the two cleanups for maximum pain and suffering. ;) http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-06-04-19.01.log.html#l-207 Tentatively, we're scheduling the removal of the fetch-zuul-cloner role and the bindep fallback package list from non-legacy jobs for Monday June 24. The details of this plan will of course be more widely disseminated in the coming days, assuming we don't identify any early blockers. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From pabelanger at redhat.com Tue Jun 4 22:07:27 2019 From: pabelanger at redhat.com (Paul Belanger) Date: Tue, 4 Jun 2019 18:07:27 -0400 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604173241.3r22gjulzwuvihbk@yuggoth.org> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> Message-ID: <20190604220727.GB32715@localhost.localdomain> On Tue, Jun 04, 2019 at 05:32:41PM +0000, Jeremy Stanley wrote: > On 2019-06-04 17:23:46 +0100 (+0100), Graham Hayes wrote: > [...] > > I have been trying to limit this behaviour for nearly 4 years [3] > > (it can actually add 10-15 mins sometimes depending on what source trees > > I have mounted via NFS into a devstack VM when doing dev) > > > > [3] - https://review.opendev.org/#/c/203698 > > Similar I suppose, though the problem mentioned in this subthread is > actually not about the mass permission change itself, rather about > the resulting permissions. In particular the fetch-zuul-cloner role > makes the entire set of provided repositories world-writeable > because the zuul-cloner v2 compatibility shim performs clones from > those file paths and Git wants to hardlink them if they're being > cloned within the same filesystem. This is necessary to support > occasions where the original copies aren't owned by the same user > running the zuul-cloner shim, since you can't hardlink files for > which your account lacks write access. > > I've done a bit of digging into the history of this now, so the > following is probably boring to the majority of you. If you want to > help figure out why it's still there at the moment and what's left > to do, read on... > > Change https://review.openstack.org/512285 which added the chmod > task includes a rather prescient comment from Paul about not adding > it to the mirror-workspace-git-repos role because "we might not want > to chmod 777 on no-legacy jobs." Unfortunately I think we failed to > realize that it already would because we had added fetch-zuul-cloner > to our base job a month earlier in > https://review.openstack.org/501843 for reasons which are not > recorded in the change (presumably a pragmatic compromise related to > the scramble to convert our v2 jobs at the time, I did not resort to > digging in IRC history just yet). Soon after, we added > fetch-zuul-cloner to the main "legacy" pre playbook with > https://review.opendev.org/513067 and prepared to test its removal > from the base job with https://review.opendev.org/513079 but that > was never completed and I can't seem to find the results of the > testing (or even any indication it was ever actually performed). > Testing was done, you can see that in https://review.opendev.org/513506/. However the issue was, at the time, projects that were using tools/tox_install.sh would break (I have no idea is that is still the case). For humans interested, https://etherpad.openstack.org/p/zuulv3-remove-zuul-cloner was the etherpad to capture this work. Eventually I ended up abandoning the patch, because I wasn't able to keep pushing on it. 
> At this point, I feel like we probably just need to re-propose an > equivalent of 513079 in our base-jobs repository, exercise it with > some DNM changes running a mix of legacy imported v2 and modern v3 > native jobs, announce a flag day for the cut over, and try to help > address whatever fallout we're unable to predict ahead of time. This > is somewhat complicated by the need to also do something similar > in https://review.opendev.org/656195 with the bindep "fallback" > packages list, so we're going to need to decide how those two > efforts will be sequenced, or whether we want to combine them into a > single (and likely doubly-painful) event. > -- > Jeremy Stanley From fungi at yuggoth.org Tue Jun 4 22:20:16 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 22:20:16 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604220727.GB32715@localhost.localdomain> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: <20190604222016.2cyx6ghdlk6oamqp@yuggoth.org> On 2019-06-04 18:07:27 -0400 (-0400), Paul Belanger wrote: [...] > Testing was done, you can see that in > https://review.opendev.org/513506/. However the issue was, at the time, > projects that were using tools/tox_install.sh would break (I have no > idea is that is still the case). > > For humans interested, > https://etherpad.openstack.org/p/zuulv3-remove-zuul-cloner was the > etherpad to capture this work. Aha! I missed the breadcrumbs which led to those, though I'll admit to only having performed a cursory grep through the relevant repo histories. > Eventually I ended up abandoning the patch, because I wasn't able to > keep pushing on it. [...] Happy to start pushing that boulder uphill again, and thanks for paving the way the first time! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From cboylan at sapwetik.org Tue Jun 4 22:45:58 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 04 Jun 2019 15:45:58 -0700 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs Message-ID: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> As part of our transition to Zuulv3 a year and a half ago, we carried over some compatibility tooling that we would now like to clean up. Specifically, we add a zuul-cloner (which went away in zuulv3) shim and set a global bindep fallback file value in all jobs. Zuulv3 native jobs are expected to use the repos zuul has precloned for you (no zuul-cloner required) as well as supply an in repo bindep.txt (or specify a bindep.txt path or install packages via some other method). This means that we should be able to remove both of these items from the non legacy base job in OpenDev's zuul. The legacy base job will continue to carry these for you so that you can write new native jobs over time. We have two changes [0][1] ready to go for this; however, due to the potential for disruption we would like to give everyone some time to test and prepare for this change. Fungi has a change to base-test [2] which will remove the zuul-cloner shim. 
Once this is in you can push "Do Not Merge" changes to your zuul config that reparent your tests from "base" to "base-test" and that will run the jobs without the zuul-cloner shim. Testing the bindep fallback removal is a bit more difficult as we set that in zuul's server config globally. What you can do is check your jobs' job-output.txt log files for usage of "bindep-fallback.txt". Our current plan is to merge these changes on June 24, 2019. We will be around to help debug any unexpected issues that come up. Jobs can be updated to use the "legacy-base" base job instead of the "base" base job if they need to be reverted to the old behavior quickly. Finally, Fungi did some excellent spelunking through history to understand how we got here. If you are curious you can find more details at: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006881.html. [0] https://review.opendev.org/656195 [1] https://review.opendev.org/663151 [2] https://review.opendev.org/663135 Clark From kecarter at redhat.com Tue Jun 4 23:51:00 2019 From: kecarter at redhat.com (Kevin Carter) Date: Tue, 4 Jun 2019 18:51:00 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: In doing a brief audit of the `tripleo-ansible` landscape it seems like we have many repositories [0] with little to no activity (most being simple role shells) [1]. While these repositories seem well intentioned, I'm not really sure we need them. All of these roles fall under the `tripleo-ansible` acls and, in my opinion, are at odds with the initial stated goal: > ... co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository ... While I can see a place for some of these roles and could rationalize building independent, reusable, repositories I don't think we're anywhere near ready for that at this time. I also believe that when where we're ready to begin building independent role repositories we should do so collaboratively; working with the likes of Infra, OpenStack-Ansible, Kolla, Airship, and anyone else who wants to contribute. So the questions at hand are: what, if anything, should we do with these repositories? Should we retire them or just ignore them? Is there anyone using any of the roles? [0] - https://opendev.org/openstack/project-config/src/commit/a12c6b531f58aaf9c838299cc0f2abc8c9ee9f40/gerrit/projects.yaml#L891-L1060= [1] - https://review.opendev.org/#/q/project:%255Eopenstack/ansible-role-tripleo.*+status:open Kevin Carter IRC: cloudnull On Mon, Jun 3, 2019 at 11:27 AM Kevin Carter wrote: > Hello Stackers, > > I wanted to follow up on this post from last year, pick up from where it > left off, and bring together a squad to get things moving. > > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > The effort to convert tripleo Puppet and heat templates with embedded > Ansible to a more consumable set of playbooks and roles is in full effect. > As we're working through this effort we believe co-locating all of the > Ansible tasks/roles/libraries/plugins throughout the code base into a > single purpose-built repository will assist us in streamlining and > simplifying. Structurally, at this time, most of tripleo will remain the > same. However, the inclusion of tripleo-Ansible will allow us to create > more focused solutions which are independently testable, much easier > understand, and simple to include into the current heat template deployment > methodologies. 
While a straight port of the existing Ansible tasks will not > be entirely possible, the goal of this ongoing effort will be zero impact > on our existing workflow and solutions. > > To reignite this effort, I've put up a review to create a new > "transformation" squad[0] geared toward building the structure around > tripleo-ansible[1] and converting our current solutions into > roles/playbooks/libraries/plugins. Initially, we'll be focused on our > existing code base; however, long term, I believe it makes sense for this > squad to work across projects to break down deployment barriers for folks > using similar technologies. > > We're excited to get this effort rolling again and would love to work with > anyone and everyone throughout the community. If folks are interested in > this effort, please let us know. > > [0] - https://review.opendev.org/662763 > [1] - https://opendev.org/openstack/tripleo-ansible > -- > > Kevin Carter > IRC: cloudnull > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Jun 5 06:36:44 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Wed, 05 Jun 2019 15:36:44 +0900 Subject: [tc][forum][ptg] Summary for "help-most-needed" list & its Future Message-ID: <16b265aa91f.ef3c27c179554.6636309287366653079@ghanshyammann.com> Hello Everyone, We had many discussions on the help-most-needed list [1] & its future at the Denver conference (Joint Leadership meeting, Forum and PTG). This time we were able to get volunteers from the Board of Directors, as well as new ideas to redefine the list and give it another chance, if we can get contributors to help from companies or universities. I am summarising all the discussions below; feel free to append/correct if I missed anything. * Joint Leadership Meeting: The TC raised the topic of the help-most-needed list with the Board of Directors during the Joint Leadership meeting and briefed them on the progress of adding the business value to each item in the list (thanks lance for working on those). The TC raised the point that we are not getting help on items which have been on the list for many years. For example, many companies use designate and glance, but there is no help from those companies in terms of contributors. There were a few ideas on marketing and publishing this list on different platforms and to different stakeholders. Alan suggested a good idea: pick one of the items and publish it in the foundation newsletter. For the first time, we had two volunteers from the Board of Directors, 1. Prakash Ramchandran and 2. Allison Randal, who will broadcast this list to companies and universities to get some contributors. Big thanks for helping on this. * Forum: 'Planning & Defining Structure for ‘Help most needed’ list' [2] We hosted a forum session dedicated to further discussion on planning and the future structure of this 'help most needed’ list. We discussed ideas for defining a template for new entries and whether there can be any exit criteria. There were mixed suggestions on exit criteria and on how to make this list more effective. It is not easy to get help from companies, at least right now, when many of them are reducing their upstream developers. Allison suggested that foundation staff and the BoD are good candidates to act as a bridge between the list and companies. She also suggested reaching out to professors instead of students at OpenStack-interested universities, and volunteered to reach out to OSU for that. Below are the action items we collected from that session.
If you would like to volunteer for any of the unassigned action items, feel free to reach out to the TC. Action Items: - (suggestion made by Alan Clark at the joint leadership meeting) Pick a random item from the list and highlight it in the foundation newsletter (ttx) - (suggestion made by wendar) OSF Staff and BoD acting as matchmaker between list items and ecosystem companies who may be a good fit for pitching in (prakash, wendar) - make clear that we don't need full-time commitments, but we do need long-term commitments - gmann - pair up glance contribution needs with Edge WG participants who are interested in integrating/improving it to ensure it's also maintained - UC has a stronger connection to users/operators and could also help identify potential organizations who are interested in the long-term survival of these projects - (wendar) Identify professors at OpenStack-interested universities and give them project ideas. No point in advertising directly to students (post-doctorate programs in particular, not undergraduate programs) - adjust the business value sections to make it clear where these contributions impact the bottom line/profits arising from solving the problem statements (evrardjp asettle) - Reach user groups? - wendar will reach out to OSU * PTG: 'Evolution of help-most-needed list' [3] The PTG discussion collected good steps forward and work items to redefine/replan this list. The suggestion is to remove the cap on the number of items in the help-needed list: it will be uncapped and will be renamed to something else. zaneb has already proposed the renaming and entry template idea [4]. The list will be revisited annually, and the new list can obviously carry over items from the previous year's list. Leveraging the User Survey to gather information about project usage and which projects companies contribute to can provide good data for the TC to decide which companies to reach out to for the help-needed items. Below are the action items from the PTG discussion: Action Items: - Update ML regarding forum session + discussion around uncapping list (gmann) - Suggest adding questions in user survey (ttx) - Rename "help most needed" to (zaneb, asettle) - Uncap the list & add governance docs surrounding annual resubmission for items (ttx, evrardjp) - Include a "completion/exit/etc criteria" - Include (or not?) SIGs in all of this (to be discussed in reviews) [1] https://governance.openstack.org/tc/reference/help-most-needed.html [2] https://etherpad.openstack.org/p/Den-forum-help-most-needed [3] https://etherpad.openstack.org/p/tc-train-ptg [4] https://review.opendev.org/#/c/657447/ - gmann From aj at suse.com Wed Jun 5 06:47:35 2019 From: aj at suse.com (Andreas Jaeger) Date: Wed, 5 Jun 2019 08:47:35 +0200 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604220727.GB32715@localhost.localdomain> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: On 05/06/2019 00.07, Paul Belanger wrote: > Testing was done, you can see that in > https://review.opendev.org/513506/. However the issue was, at the time, > projects that were using tools/tox_install.sh would break (I have no > idea is that is still the case).
I have a couple of changes open to remove the final tools/tox_install.sh files, see: https://review.opendev.org/#/q/status:open+++topic:tox-siblings There are a few more repos that didn't take my changes from last year which I abandoned in the mean time - and a few dead repos that I did not submit to when double checking today ;( Also, compute-hyperv and nova-blazar need https://review.opendev.org/663234 (requirements change) first. So, we should be pretty good if these changes get reviewed and merged, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From dangtrinhnt at gmail.com Wed Jun 5 07:19:53 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Wed, 5 Jun 2019 16:19:53 +0900 Subject: [telemetry] Meeting tomorrow cancelled Message-ID: Hi team, I'm leaving the office for vacation tomorrow so I will not able to hold the meeting. The next meeting will be on June 20th. Mean while, if you have any thing to discuss, please let me know or put it on the agenda [1]. Thanks, [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig.openstack at telfer.org Wed Jun 5 07:36:57 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 5 Jun 2019 08:36:57 +0100 Subject: [scientific-sig] IRC Meeting today: CERN OpenStack Day, SDN, Secure computing and more Message-ID: <9A2AA1FB-285A-4BEC-BA95-22A9A67627BE@telfer.org> Hi all - We have a Scientific SIG IRC meeting today at 1100 UTC in channel #openstack-meeting. Everyone is welcome. Today’s agenda is online here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_5th_2019 If you’d like anything added, please let us know. Today we’ll do a roundup of last week’s excellent CERN OpenStack day, and follow up on the issues with south-bound coherency between Neutron and SDN. We’d also like to restart some discussions around secure computing environments - please come along with your experiences. Finally, we are looking for EMEA-region contributors to the research computing private/hybrid cloud advocacy study discussed at the PTG. Plenty to cover! Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 5 09:10:39 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:39 +0100 Subject: [kolla] Moving weekly IRC meetings to #openstack-kolla Message-ID: Hi, In the recent virtual PTG we agreed to move the weekly IRC meetings to the #openstack-kolla channel. This will take effect from today's meeting at 1500 UTC. Cheers, Mark From mark at stackhpc.com Wed Jun 5 09:10:53 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:53 +0100 Subject: [kolla] Feedback request: removing ceph deployment Message-ID: Hi, We discussed during the kolla virtual PTG [1] the option of removing support for deploying Ceph, as a way to improve the long term sustainability of the project. Ceph support in kolla does require maintenance, and is not the core focus of the project. 
There are other good tools for deploying Ceph (ceph-deploy, ceph-ansible), and Kolla Ansible supports integration with an external Ceph cluster deployed using these or other methods. To avoid leaving people in a difficult position, we would recommend a deployment tool (probably ceph-ansible), and provide an automated, tested and documented migration path to it. Here is a rough proposed schedule for removal: * Train: deprecate Ceph deployment, add CI tests for kolla-ansible with ceph-ansible * U: Obsolete Ceph deployment, provide migration path to ceph-ansible * V: Remove Ceph deployment Please provide feedback on this plan and how it will affect you. Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg From mark at stackhpc.com Wed Jun 5 09:10:56 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:56 +0100 Subject: [kolla] Feedback request: removing OracleLinux support Message-ID: Hi, We discussed during the kolla virtual PTG [1] the option of removing support for Oracle Linux, as a way to improve the long term sustainability of the project. Since (from afar) OracleLinux is very similar to CentOS, it does not require too much maintenance, however it is non-zero and does consume CI resources. Contributors from Oracle left the community some time ago, and we do not generally see Oracle Linux in bug reports, so must assume it is not well used. We propose dropping support for OracleLinux in the Train cycle. If this will affect you and you would like to help maintain it, please get in touch. Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg From mark at stackhpc.com Wed Jun 5 09:10:59 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:59 +0100 Subject: [kolla] Feedback request: removing kolla-cli Message-ID: Hi, We discussed during the kolla virtual PTG [1] the option of removing support for the kolla-cli deliverable [2], as a way to improve the long term sustainability of the project. kolla-cli was a project started by Oracle, and accepted as a kolla deliverable. While it looks interesting and potentially useful, it never gained much traction (as far as I'm aware) and the maintainers left the community. We have never released it and CI has been failing for some time. We propose dropping support for kolla-cli in the Train cycle. If this will affect you and you would like to help maintain it, please get in touch. Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://github.com/openstack/kolla-cli From mark at stackhpc.com Wed Jun 5 09:11:03 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:03 +0100 Subject: [kolla][tripleo] Podman/buildah Message-ID: Hi, At the Denver forum during the kolla feedback session [1], the topic of support for podman [2] and buildah [3] was raised. RHEL/CentOS 8 removes support for Docker [4], although presumably it will still be possible to pull down an RPM from https://download.docker.com/linux/. Currently both kolla and kolla-ansible interact with docker via the docker python module [5], which interacts with the docker daemon API. Given that podman and buildah are not daemons, I would expect the usage model to be a bit different. There is a python API for podman [6] which we might be able to use. I understand that Tripleo uses buildah to build images already (please correct me if I'm wrong). How is this achieved with kolla? Perhaps using 'kolla-build --template-only' to generate Dockerfiles then invoking buildah separately? 
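For illustration, a minimal sketch of that "template-only plus buildah" workflow in Python follows. This is not kolla's or TripleO's actual implementation: the --work-dir layout, the output location and the image naming are assumptions, and a real builder would also respect the image dependency order and build in parallel.

#!/usr/bin/env python3
# Rough sketch: render kolla Dockerfiles, then build each one with buildah.
# Assumptions (verify against your kolla release): 'kolla-build
# --template-only --work-dir <dir>' writes one directory per image, each
# containing a Dockerfile, and 'buildah bud' is available on the host.
import pathlib
import subprocess

WORK_DIR = pathlib.Path("/tmp/kolla-work")  # hypothetical location

def render_dockerfiles():
    # Ask kolla to only render Dockerfiles instead of building images.
    subprocess.run(
        ["kolla-build", "--template-only", "--work-dir", str(WORK_DIR)],
        check=True)

def build_with_buildah():
    # Build every rendered Dockerfile; base images should really be built
    # first, and independent images could be built concurrently.
    for dockerfile in sorted(WORK_DIR.glob("**/Dockerfile")):
        image_dir = dockerfile.parent
        tag = "kolla/{}:latest".format(image_dir.name)
        subprocess.run(
            ["buildah", "bud", "-f", str(dockerfile), "-t", tag, str(image_dir)],
            check=True)

if __name__ == "__main__":
    render_dockerfiles()
    build_with_buildah()

TripleO's builder, discussed later in this thread, layers dependency ordering and parallelism on top of the same basic idea.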
Are you planning to work on adding buildah support to kolla itself? For running services, Tripleo uses Paunch [7] to abstract away the container engine, and it appears Podman support was added here - building CLI argument strings rather than via a python API [8]. For anyone using kolla/kolla-ansible, please provide feedback on how useful/necessary this would be to you. Thanks, Mark [1] https://etherpad.openstack.org/p/DEN-train-kolla-feedback [2] https://podman.io/ [3] https://buildah.io/ [4] https://access.redhat.com/solutions/3696691 [5] https://github.com/docker/docker-py [6] https://github.com/containers/python-podman [7] https://docs.openstack.org/developer/paunch/readme.html [8] https://github.com/openstack/paunch/blob/ecc2047b2ec5eaf39cce119abe1678ac19139d79/paunch/builder/podman.py From mark at stackhpc.com Wed Jun 5 09:11:06 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:06 +0100 Subject: [kolla][tripleo] Python 3, CentOS/RHEL 8 Message-ID: Hi, At the recent kolla virtual PTG [1], we discussed the move to python 3 images in the Train cycle. hrw has started this effort for Ubuntu/Debian source images [2] and is making good progress. Next we will need to consider CentOS and RHEL. It seems that for Train RDO will provide only python 3 packages with support for CentOS 8 [3]. There may be some overlap in the trunk (master) packages where there is support for both CentOS 7 and 8. We will therefore need to combine the switch to python 3 with a switch to a CentOS/RHEL 8 base image. Some work was started during the Stein cycle to support RHEL 8 images with python 3 packages. There will no doubt be a few scripts that need updating to complete this work. We'll also need to test to ensure that both binary and source images work in this new world. Tripleo team - what are your plans for CentOS/RHEL 8 and python 3 this cycle? Are you planning to continue the work started in kolla during the Stein release? Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://blueprints.launchpad.net/kolla/+spec/debian-ubuntu-python3 [3] https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 [4] https://review.opendev.org/#/c/632156/ From mark at stackhpc.com Wed Jun 5 09:11:12 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:12 +0100 Subject: [nova][kolla][openstack-ansible][tripleo] Cells v2 upgrades Message-ID: Hi, At the recent kolla virtual PTG [1] we had a good discussion about adding support for multiple nova cells in kolla-ansible. We agreed a key requirement is to be able to perform operations on one or more cells without affecting the rest for damage limitation. This also seems like it would apply to upgrades. We're seeking input on ordering. Looking at the nova upgrade guide [2] I might propose something like this: 1. DB syncs 2. Upgrade API, super conductor For each cell: 3a. Upgrade cell conductor 3b. Upgrade cell computes 4. SIGHUP all services 5. Run online migrations At some point in here we also need to run the upgrade check. Presumably between steps 1 and 2? 
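To make the proposed ordering above concrete, here is a rough, purely illustrative sketch. Only the nova-manage/nova-status commands are standard; the per-cell handling is hypothetical, glosses over per-cell database configuration, and is not an agreed kolla-ansible design.

#!/usr/bin/env python3
# Sketch of the proposed upgrade ordering for a multi-cell deployment.
# run() shells out locally; a real implementation would target specific
# hosts per cell (e.g. via Ansible) and use per-cell database config.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def upgrade(cells):
    # 1. Schema migrations for the API database and (per cell) the cell DBs.
    run(["nova-manage", "api_db", "sync"])
    run(["nova-manage", "db", "sync"])
    # Run the readiness check before upgrading services -- the open question
    # above is whether it belongs here, between steps 1 and 2.
    run(["nova-status", "upgrade", "check"])
    # 2. Upgrade the API and super conductor (restart with new code; the
    #    actual mechanism is deployment-tool specific, so it is elided here).
    # 3. Per cell: upgrade the cell conductor, then that cell's computes.
    for cell in cells:
        print("upgrading conductor and computes for cell", cell)
    # 4. SIGHUP all services so they pick up the new service versions.
    # 5. Finally, run the online data migrations.
    run(["nova-manage", "db", "online_data_migrations"])

if __name__ == "__main__":
    upgrade(["cell1", "cell2"])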
It would be great to get feedback both from the nova team and anyone running cells Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://docs.openstack.org/nova/latest/user/upgrade.html From mark at stackhpc.com Wed Jun 5 09:11:14 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:14 +0100 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status Message-ID: Hi, The Kayobe project [1] seeks to become an official OpenStack project during the Train cycle. Kayobe is a deployment tool that uses Kolla Ansible and Bifrost to deploy a containerised OpenStack control plane to bare metal. The project was started 2 years ago to fill in some gaps in Kolla Ansible, and has since been used for a number of production deployments. It's frequently deployed in Scientific Computing environments, but is not limited to this. We ran a packed workshop on Kayobe at the Denver summit and got some great feedback, with many people agreeing that it makes Kolla Ansible easier to adopt in environments with no existing provisioning system. We use OpenStack development workflows, including IRC, the mailing list, opendev, zuul, etc. We see two options for becoming an official OpenStack project: 1. become a deliverable of the Kolla project 2. become an official top level OpenStack project Given the affinity with the Kolla project I feel that option 1 seems natural. However, I do not want to use influence as PTL to force this approach. There is currently only one person (me) who is a member of both core teams, although all kayobe cores are active in the Kolla community. I would not expect core memberships to change, although we would probably end up combining IRC channels, meetings and design sessions. I would hope that combining these communities would be to the benefit of both. Please provide feedback on this matter - whether positive or negative. Thanks, Mark [1] http://kayobe.readthedocs.io From mark at stackhpc.com Wed Jun 5 09:11:19 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:19 +0100 Subject: [kolla] Priorities for the Train cycle Message-ID: Hi, Thanks to those who attended the recent kolla virtual PTG [1]. We had some good technical discussions but did not get to a topic that I was keen to cover - priorities for the Train cycle. As a community of volunteers it can be difficult to guide the project in a particular direction, but agreeing on some priorities can help us to focus our efforts - both with reviews and development. I have seen this work well in the ironic team. Based on our recent discussions and knowledge of community goals, I compiled a list of candidate work items [2]. If you are involved with the kolla community (operator, developer, core, etc.), please vote on these priorities to indicate what you think we should focus on in the Train cycle. I will leave this open for a week, then order the items. At this point we can try to assign an owner to each. 
Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://etherpad.openstack.org/p/kolla-train-priorities From marcin.juszkiewicz at linaro.org Wed Jun 5 09:33:20 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Wed, 5 Jun 2019 11:33:20 +0200 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: <56d693a4-01f5-0865-3948-0974b14f1dab@linaro.org> W dniu 05.06.2019 o 11:11, Mark Goddard pisze: > The Kayobe project [1] seeks to become an official OpenStack project > during the Train cycle. > We see two options for becoming an official OpenStack project: > > 1. become a deliverable of the Kolla project > 2. become an official top level OpenStack project > Please provide feedback on this matter - whether positive or negative. As Kolla core I support both options. If Kayobe became kolla/kayobe then I will be fine. Similar with openstack/kayobe project. From marcin.juszkiewicz at linaro.org Wed Jun 5 09:42:27 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Wed, 5 Jun 2019 11:42:27 +0200 Subject: [kolla] Feedback request: removing OracleLinux support In-Reply-To: References: Message-ID: <81438050-7699-c7a7-b883-a707cc3f53db@linaro.org> W dniu 05.06.2019 o 11:10, Mark Goddard pisze: > We propose dropping support for OracleLinux in the Train cycle. If > this will affect you and you would like to help maintain it, please > get in touch. First we drop it from CI. Then (IMHO) it will be removed once we move to CentOS 8. From luka.peschke at objectif-libre.com Wed Jun 5 12:17:35 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Wed, 05 Jun 2019 14:17:35 +0200 Subject: [cloudkitty] Core team updates Message-ID: Hi all, I'd like to propose some updates to the CloudKitty core team: * First of all I'd like to welcome Justin Ferrieu (jferrieu on IRC) to the core team. He's been around contributing (mostly on the Prometheus collector and fetcher) and reviewing a lot for the last two releases (https://www.stackalytics.com/report/contribution/cloudkitty/90). It would be great if he had +2/+A power. * Some cores have been inactive for a long time. For now, Pierre-Alexandre Bardina can be removed from the core team, and I've reached out to the other inactive cores. We'll wait a bit for a reply before we proceed. Of course, if these people want to contribute again in the future, we'd be glad to welcome them back in the core team. Thanks to all contributors! Cheers, -- Luka Peschke From emilien at redhat.com Wed Jun 5 12:28:18 2019 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 5 Jun 2019 08:28:18 -0400 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: [...] > I understand that Tripleo uses buildah to build images already (please > correct me if I'm wrong). How is this achieved with kolla? Perhaps > using 'kolla-build --template-only' to generate Dockerfiles then > invoking buildah separately? Are you planning to work on adding > buildah support to kolla itself? > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build containers. We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last time I checked). 
I wrote a blog post about it a while ago: https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophe.sauthier at objectif-libre.com Wed Jun 5 12:29:32 2019 From: christophe.sauthier at objectif-libre.com (Christophe Sauthier) Date: Wed, 05 Jun 2019 14:29:32 +0200 Subject: [cloudkitty] Core team updates In-Reply-To: References: Message-ID: Hello It is a great +1 from me to welcome Justin ! He's doing a great work on the team both commiting and reviewing stuffs. Thanks ! Christophe ---- Christophe Sauthier Directeur Général Objectif Libre : Au service de votre Cloud +33 (0) 6 16 98 63 96 | christophe.sauthier at objectif-libre.com https://www.objectif-libre.com | @objectiflibre Recevez la Pause Cloud Et DevOps : https://olib.re/abo-pause Le 2019-06-05 14:17, Luka Peschke a écrit : > Hi all, > > I'd like to propose some updates to the CloudKitty core team: > > * First of all I'd like to welcome Justin Ferrieu (jferrieu on IRC) > to the core team. He's been around contributing (mostly on the > Prometheus collector and fetcher) and reviewing a lot for the last two > releases > (https://www.stackalytics.com/report/contribution/cloudkitty/90). It > would be great if he had +2/+A power. > > * Some cores have been inactive for a long time. For now, > Pierre-Alexandre Bardina can be removed from the core team, and I've > reached out to the other inactive cores. We'll wait a bit for a reply > before we proceed. Of course, if these people want to contribute again > in the future, we'd be glad to welcome them back in the core team. > > Thanks to all contributors! > > Cheers, From doka.ua at gmx.com Wed Jun 5 12:34:52 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Wed, 5 Jun 2019 15:34:52 +0300 Subject: [glance] zeroing image, preserving other parameters Message-ID: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> Dear colleagues, for some reasons, I need to shrink image size to zero (freeing storage as well), while keeping this record in Glance database. First which come to my mind is to delete image and then create new one with same name/uuid/... and --file /dev/null, but this is impossible because Glance don't really delete records from database, marking them as 'deleted' instead. Next try was to use glance image-upload from /dev/null, but this is also prohibited with message "409 Conflict: Image status transition from [activated, deactivated] to saving is not allowed (HTTP 409)" I found https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's "image_destroy" but have no clues on how to access this API. Is it kind of library or kind of REST API, how to access it and whether it's safe to use it in terms of longevity and compatibility between versions? Or, may be, you can advise any other methods to solve the problem of zeroing glance image data / freeing storage, while keeping in database just a record about this image? Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison From dpeacock at redhat.com Wed Jun 5 12:45:42 2019 From: dpeacock at redhat.com (David Peacock) Date: Wed, 5 Jun 2019 08:45:42 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: > So the questions at hand are: what, if anything, should we do with these > repositories? 
Should we retire them or just ignore them? Is there anyone > using any of the roles? > My initial reaction was to suggest we just ignore them, but on second thought I'm wondering if there is anything negative if we leave them lying around. Unless we're going to benefit from them in the future if we start actively working in these repos, they represent obfuscation and debt, so it might be best to retire / dispose of them. David > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 5 12:47:07 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 13:47:07 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 13:28, Emilien Macchi wrote: > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: > [...] >> >> I understand that Tripleo uses buildah to build images already (please >> correct me if I'm wrong). How is this achieved with kolla? Perhaps >> using 'kolla-build --template-only' to generate Dockerfiles then >> invoking buildah separately? Are you planning to work on adding >> buildah support to kolla itself? > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build containers. > We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last time I checked). > > I wrote a blog post about it a while ago: > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ Thanks for following up. It wouldn't be a trivial change to add buildah support in kolla, but it would have saved reimplementing the task parallelisation in Tripleo and would benefit others too. Never mind. > -- > Emilien Macchi From guoyongxhzhf at 163.com Wed Jun 5 12:52:28 2019 From: guoyongxhzhf at 163.com (Guo Yong) Date: Wed, 5 Jun 2019 20:52:28 +0800 (CST) Subject: [airship] Is Ironic ready for Airship? Message-ID: <15fc408f.c51c.16b27b2a86d.Coremail.guoyongxhzhf@163.com> I know Airship chose MAAS as its bare metal management tool. I want to know whether MAAS is more suitable for Airship when it comes to the underlying infrastructure. If MAAS is more suitable, then what features should Ironic develop? Thanks for your reply. -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Wed Jun 5 14:31:56 2019 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 5 Jun 2019 10:31:56 -0400 Subject: [tripleo] Proposing Kamil Sambor core on TripleO Message-ID: Kamil has been working on TripleO for a while now and is providing really insightful reviews, especially on Python best practices but not only that; he is one of the major contributors of the OVN integration, which was a ton of work. I believe he has the right knowledge to review any TripleO patch and provide excellent reviews in our project. We're lucky to have him with us in the team! I would like to propose him as core on TripleO; please raise any objections if needed. -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Jun 5 14:37:33 2019 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Jun 2019 15:37:33 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 2019-06-05 at 13:47 +0100, Mark Goddard wrote: > On Wed, 5 Jun 2019 at 13:28, Emilien Macchi wrote: > > > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: > > [...]
> > > > > > I understand that Tripleo uses buildah to build images already (please > > > correct me if I'm wrong). How is this achieved with kolla? Perhaps > > > using 'kolla-build --template-only' to generate Dockerfiles then > > > invoking buildah separately? Are you planning to work on adding > > > buildah support to kolla itself? > > > > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build > > containers. > > We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last > > time I checked). > > > > I wrote a blog post about it a while ago: > > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ > > Thanks for following up. It wouldn't be a trivial change to add > buildah support in kolla, but it would have saved reimplementing the > task parallelisation in Tripleo and would benefit others too. Never > mind. Actually I'm not sure about that; buildah should actually be pretty simple to add support for. It's been a while, but we looked at swapping out the build with a python script a few years ago (https://review.opendev.org/#/c/503882/) and it really did not take that much to enable, so simply invoking buildah in a similar manner should be trivial. Podman support will be harder, but again we confined all interaction with docker in kolla-ansible to go via https://github.com/openstack/kolla-ansible/blob/master/ansible/library/kolla_docker.py so we would just need to write a similar module that would work with podman and then select the correct one to use. The interface is a little large, but it should be relatively mechanical to implement podman support. > > > -- > > Emilien Macchi > > From alexandre.arents at corp.ovh.com Wed Jun 5 14:38:12 2019 From: alexandre.arents at corp.ovh.com (Alexandre Arents) Date: Wed, 5 Jun 2019 16:38:12 +0200 Subject: [ops][nova][placement] NUMA topology vs non-NUMA workloads Message-ID: <20190605143812.4wqzswzr2xnbe6dp@corp.ovh.com> From OVH's point of view, we do not plan for now to mix NUMA-aware and NUMA-unaware workloads on the same compute node. So you can go ahead without the "can_split" feature if it helps. Alex >This message is primarily addressed at operators, and of those, >operators who are interested in effectively managing and mixing >workloads that care about NUMA with workloads that do not. There are >some questions within, after some background to explain the issue. > >At the PTG, Nova and Placement developers made a commitment to more >effectively manage NUMA topologies within Nova and Placement. On the >placement side this resulted in a spec which proposed several >features that would enable more expressive queries when requesting >allocation candidates (places for workloads to go), resulting in >fewer late scheduling failures. > >At first there was one spec that discussed all the features. This >morning it was split in two because one of the features is proving >hard to resolve. Those two specs can be found at: > >* https://review.opendev.org/658510 (has all the original discussion) >* https://review.opendev.org/662191 (the less contentious features split out) > >After much discussion, we would prefer to not do the feature >discussed in 658510.
Called 'can_split', it would allow specified >classes of resource (notably VCPU and memory) to be split across >multiple numa nodes when each node can only contribute a portion of >the required resources and where those resources are modelled as >inventory on the NUMA nodes, not the host at large. > >While this is a good idea in principle it turns out (see the spec) >to cause many issues that require changes throughout the ecosystem, >for example enforcing pinned cpus for workloads that would normally >float. It's possible to make the changes, but it would require >additional contributors to join the effort, both in terms of writing >the code and understanding the many issues. > >So the questions: > >* How important, in your cloud, is it to co-locate guests needing a > NUMA topology with guests that do not? A review of documentation > (upstream and vendor) shows differing levels of recommendation on > this, but in many cases the recommendation is to not do it. > >* If your answer to the above is "we must be able to do that": How > important is it that your cloud be able to pack workloads as tight > as possible? That is: If there are two NUMA nodes and each has 2 > VCPU free, should a 4 VCPU demanding non-NUMA workload be able to > land there? Or would you prefer that not happen? > >* If the answer to the first question is "we can get by without > that" is it satisfactory to be able to configure some hosts as NUMA > aware and others as not, as described in the "NUMA topology with > RPs" spec [1]? In this set up some non-NUMA workloads could end up > on a NUMA host (unless otherwise excluded by traits or aggregates), > but only when there was contiguous resource available. > >This latter question articulates the current plan unless responses >to this message indicate it simply can't work or legions of >assistance shows up. Note that even if we don't do can_split, we'll >still be enabling significant progress with the other features >described in the second spec [2]. > >Thanks for your help in moving us in the right direction. > >[1] https://review.opendev.org/552924 >[2] https://review.opendev.org/662191 >-- >Chris Dent ٩◔̯◔۶ https://anticdent.org/ >freenode: cdent -- Alexandre Arents From rosmaita.fossdev at gmail.com Wed Jun 5 14:38:53 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 5 Jun 2019 10:38:53 -0400 Subject: [glance] zeroing image, preserving other parameters In-Reply-To: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> References: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> Message-ID: <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> On 6/5/19 8:34 AM, Volodymyr Litovka wrote: > Dear colleagues, > > for some reasons, I need to shrink image size to zero (freeing storage > as well), while keeping this record in Glance database. > > First which come to my mind is to delete image and then create new one > with same name/uuid/... and --file /dev/null, but this is impossible > because Glance don't really delete records from database, marking them > as 'deleted' instead. The glance-manage utility program allows you to purge the database. The images table (where the image UUIDs are stored) is not purged by default because of OSSN-0075 [0]. See the glance docs [1] for details. [0] https://wiki.openstack.org/wiki/OSSN/OSSN-0075 [1] https://docs.openstack.org/glance/latest/admin/db.html#database-maintenance (That doesn't really help your issue, I just wanted to point out that there is a way to purge the database.) 
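For reference, a hedged sketch of what driving that purge could look like follows; the option names are taken from the Glance admin guide linked above, but verify them against your release before running anything like this.

#!/usr/bin/env python3
# Sketch: purge soft-deleted rows from the Glance database via glance-manage.
# By default the images table itself is left alone because of OSSN-0075.
import subprocess

def purge_deleted_rows(age_in_days=30, max_rows=1000):
    # Remove rows already marked 'deleted' that are older than the given age.
    subprocess.run(
        ["glance-manage", "db", "purge",
         "--age_in_days", str(age_in_days),
         "--max_rows", str(max_rows)],
        check=True)

if __name__ == "__main__":
    purge_deleted_rows()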
> Next try was to use glance image-upload from /dev/null, but this is also > prohibited with message "409 Conflict: Image status transition from > [activated, deactivated] to saving is not allowed (HTTP 409)" That's correct, Glance will not allow you to replace the image data once an image has gone to 'active' status. > I found > https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's > > "image_destroy" but have no clues on how to access this API. Is it kind > of library or kind of REST API, how to access it and whether it's safe > to use it in terms of longevity and compatibility between versions? The title of that document is misleading. It describes the interface that Glance developers can use when they need to interact with the database. There's no tool that exposes those operations to operators. > Or, may be, you can advise any other methods to solve the problem of > zeroing glance image data / freeing storage, while keeping in database > just a record about this image? If you purged the database, you could do your proposal to recreate the image with a zero-size file -- but that would give you an image with status 'active' that an end user could try to boot an instance with. I don't think that's a good idea. Additionally, purging the images table of all UUIDs, not just the few you want to replace, exposes you to OSSN-0075. An alternative--and I'm not sure this is a good idea either--would be to deactivate the image [2]. This would preserve all the current metadata but not allow the image to be downloaded by a non-administrator. With the image not in 'active' status, nova or cinder won't try to use it to create instances or volumes. The image data would still exist, though, so you'd need to delete it manually from the backend to really clear out the space. Additionally, the image size would remain, which might be useful for record-keeping, although on the other hand, it will still count against the user_storage_quota. And the image locations will still exist even though they won't refer to any existing data any more. (Like I said, I'm not sure this is a good idea.) [2] https://developer.openstack.org/api-ref/image/v2/#deactivate-image > Thank you. Not sure I was much help. Let's see if other operators have a good workaround or a need for this kind of functionality. > > -- > Volodymyr Litovka >   "Vision without Execution is Hallucination." -- Thomas Edison > > From emilien at redhat.com Wed Jun 5 14:48:02 2019 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 5 Jun 2019 10:48:02 -0400 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 8:47 AM Mark Goddard wrote: > Thanks for following up. It wouldn't be a trivial change to add > buildah support in kolla, but it would have saved reimplementing the > task parallelisation in Tripleo and would benefit others too. Never > mind. > To be fair, at the time I wrote the code in python-tripleoclient the container tooling wasn't really stable and we weren't sure about the directions we would take yet; which is the main reason which drove us to not invest too much time into refactoring Kolla to support a tool that we weren't sure we would end up using in production for the container image building. It has been a few months now and so far it works ok for our needs; so if there is interest in supporting Buildah in Kolla then we might want to do the refactor and of course TripleO would use this new feature. 
-- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Wed Jun 5 14:59:24 2019 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 5 Jun 2019 08:59:24 -0600 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 8:46 AM Sean Mooney wrote: > On Wed, 2019-06-05 at 13:47 +0100, Mark Goddard wrote: > > On Wed, 5 Jun 2019 at 13:28, Emilien Macchi wrote: > > > > > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: > > > [...] > > > > > > > > I understand that Tripleo uses buildah to build images already > (please > > > > correct me if I'm wrong). How is this achieved with kolla? Perhaps > > > > using 'kolla-build --template-only' to generate Dockerfiles then > > > > invoking buildah separately? Are you planning to work on adding > > > > buildah support to kolla itself? > > > > > > > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then > call Buildah from tripleoclient to build > > > containers. > > > We have not planned (yet) to port that workflow to Kolla, which would > involve some refacto in the build code (last > > > time I checked). > > > > > > I wrote a blog post about it a while ago: > > > > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ > > > > Thanks for following up. It wouldn't be a trivial change to add > > buildah support in kolla, but it would have saved reimplementing the > > task parallelisation in Tripleo and would benefit others too. Never > > mind. > actully im not sure about that buildah should actully be pretty simple to > add support for. > its been a while but we looksed at swaping out the building with a python > script a few years ago > https://review.opendev.org/#/c/503882/ > and it really did not take that much to enable so simply invoking buildah > in a simlar maner should be trivail. > > The issue was trying to build the appropriate parallelization logic based on the kolla container build order[0]. We're using the --list-dependencies to get the ordering for the build[1] and then run it through our builder[2]. You wouldn't want to do it serially because it's dramatically slower. Our buildah builder is only slightly slower than the docker one at this point. > podman support will be harder but again we confied all interaction with > docker in kolla-ansibel to be via > > https://github.com/openstack/kolla-ansible/blob/master/ansible/library/kolla_docker.py > so we sould jsut need to write a similar module that would work with > podman and then select the correct one to use. > the interface is a little large but it shoudld reactively mechanical to > implement podman supprot. > The podman support is a bit more complex because there is no daemon associated with it. We wrote systemd/podman support into paunch[3] for us to handle the management of the life cycles of the containers. We'd like to investigate switching our invocation of paunch from cli to an ansible plugin/module which might be beneficial for kolla-ansible as well. 
Thanks, -Alex [0] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/builder/buildah.py#L156 [1] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/kolla_builder.py#L496 [2] https://opendev.org/openstack/python-tripleoclient/src/branch/master/tripleoclient/v1/container_image.py#L207-L228 [3] https://opendev.org/openstack/paunch > > > > -- > > > Emilien Macchi > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aj at suse.com Wed Jun 5 15:20:37 2019 From: aj at suse.com (Andreas Jaeger) Date: Wed, 5 Jun 2019 17:20:37 +0200 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: On 05/06/2019 08.47, Andreas Jaeger wrote: > On 05/06/2019 00.07, Paul Belanger wrote: >> Testing was done, you can see that in >> https://review.opendev.org/513506/. However the issue was, at the time, >> projects that were using tools/tox_install.sh would break (I have no >> idea is that is still the case). > > I have a couple of changes open to remove the final tools/tox_install.sh > files, see: > > https://review.opendev.org/#/q/status:open+++topic:tox-siblings > > > There are a few more repos that didn't take my changes from last year > which I abandoned in the mean time - and a few dead repos that I did not > submit to when double checking today ;( > > Also, compute-hyperv and nova-blazar need > https://review.opendev.org/663234 (requirements change) first. That one has a -2 now. ;( I won't be able to work on alternative solutions and neither can access whether this blocks the changes. Anybody to take this over, please? > So, we should be pretty good if these changes get reviewed and merged, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From openstack at nemebean.com Wed Jun 5 15:28:35 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 5 Jun 2019 10:28:35 -0500 Subject: [oslo] Bandit Strategy In-Reply-To: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> Message-ID: Since it seems we need to backport this to the stable branches, I've added stable branch columns to https://ethercalc.openstack.org/ml1qj9xrnyfg I know some backports have already been proposed, so if people can fill in the appropriate columns that would help avoid unnecessary work on projects that are already done. Hopefully these will be clean backports, but I know at least one included a change to requirements.txt too. We'll need to make sure we don't accidentally backport any of those or we won't be able to release the stable branches. As discussed in the meeting this week, we're only planning to backport to the active branches. The em branches can be updated if necessary, but we don't need to do a mass backport to them. I think that's it. Let me know if you have any comments or questions. Thanks. -Ben On 5/13/19 12:23 PM, Ben Nemec wrote: > Nefarious cap bandits are running amok in the OpenStack community! 
Won't > someone take a stand against these villainous headwear thieves?! > > Oh, sorry, just pasted the elevator pitch for my new novel. ;-) > > Actually, this email is to summarize the plan we came up with in the > Oslo meeting this morning. Since we have a bunch of projects affected by > the Bandit breakage I wanted to make sure we had a common fix so we > don't have a bunch of slightly different approaches in each project. The > plan we agreed on in the meeting was to push a two patch series to each > repo - one to cap bandit <1.6.0 and one to uncap it with a !=1.6.0 > exclusion. The first should be merged immediately to unblock ci, and the > latter can be rechecked once bandit 1.6.1 releases to verify that it > fixes the problem for us. > > We chose this approach instead of just tweaking the exclusion in tox.ini > because it's not clear that the current behavior will continue once > Bandit fixes the bug. Assuming they restore the old behavior, this > should require the least churn in our repos and means we're still > compatible with older versions that people may already have installed. > > I started pushing patches under > https://review.opendev.org/#/q/topic:cap-bandit (which prompted the > digression to start this email ;-) to implement this plan. This is > mostly intended to be informational, but if you have any concerns with > the plan above please do let us know immediately. > > Thanks. > > -Ben > From Kevin.Fox at pnnl.gov Wed Jun 5 15:31:26 2019 From: Kevin.Fox at pnnl.gov (Fox, Kevin M) Date: Wed, 5 Jun 2019 15:31:26 +0000 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: , Message-ID: <1A3C52DFCD06494D8528644858247BF01C359A88@EX10MBOX03.pnnl.gov> Whats the plan for Kubernetes integration at this point? I keep seeing more and more talk/work on integrating podman and paunch and such but its a lot of work that doesn't apply when switching to Kubernetes? Thanks, Kevin ________________________________ From: Alex Schultz [aschultz at redhat.com] Sent: Wednesday, June 05, 2019 7:59 AM To: Sean Mooney Cc: Mark Goddard; Emilien Macchi; openstack-discuss Subject: Re: [kolla][tripleo] Podman/buildah On Wed, Jun 5, 2019 at 8:46 AM Sean Mooney > wrote: On Wed, 2019-06-05 at 13:47 +0100, Mark Goddard wrote: > On Wed, 5 Jun 2019 at 13:28, Emilien Macchi > wrote: > > > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard > wrote: > > [...] > > > > > > I understand that Tripleo uses buildah to build images already (please > > > correct me if I'm wrong). How is this achieved with kolla? Perhaps > > > using 'kolla-build --template-only' to generate Dockerfiles then > > > invoking buildah separately? Are you planning to work on adding > > > buildah support to kolla itself? > > > > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build > > containers. > > We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last > > time I checked). > > > > I wrote a blog post about it a while ago: > > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ > > Thanks for following up. It wouldn't be a trivial change to add > buildah support in kolla, but it would have saved reimplementing the > task parallelisation in Tripleo and would benefit others too. Never > mind. actully im not sure about that buildah should actully be pretty simple to add support for. 
It's been a while, but we looked at swapping out the build with a python script a few years ago https://review.opendev.org/#/c/503882/ and it really did not take that much to enable, so simply invoking buildah in a similar manner should be trivial. The issue was trying to build the appropriate parallelization logic based on the kolla container build order[0]. We're using the --list-dependencies option to get the ordering for the build[1] and then run it through our builder[2]. You wouldn't want to do it serially because it's dramatically slower. Our buildah builder is only slightly slower than the docker one at this point. Podman support will be harder, but again we confined all interaction with docker in kolla-ansible to be via https://github.com/openstack/kolla-ansible/blob/master/ansible/library/kolla_docker.py so we would just need to write a similar module that would work with podman and then select the correct one to use. The interface is a little large, but it should be relatively mechanical to implement podman support. The podman support is a bit more complex because there is no daemon associated with it. We wrote systemd/podman support into paunch[3] for us to handle the management of the life cycles of the containers. We'd like to investigate switching our invocation of paunch from the CLI to an Ansible plugin/module, which might be beneficial for kolla-ansible as well. Thanks, -Alex [0] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/builder/buildah.py#L156 [1] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/kolla_builder.py#L496 [2] https://opendev.org/openstack/python-tripleoclient/src/branch/master/tripleoclient/v1/container_image.py#L207-L228 [3] https://opendev.org/openstack/paunch > > > -- > > Emilien Macchi > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Jun 5 15:40:57 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 15:40:57 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: <20190605154056.qpcrwa3jppgifece@yuggoth.org> On 2019-06-05 17:20:37 +0200 (+0200), Andreas Jaeger wrote: > On 05/06/2019 08.47, Andreas Jaeger wrote: [...] > > There are a few more repos that didn't take my changes from last year > > which I abandoned in the mean time - and a few dead repos that I did not > > submit to when double checking today ;( > > > > Also, compute-hyperv and nova-blazar need > > https://review.opendev.org/663234 (requirements change) first. > > That one has a -2 now. ;( > > I won't be able to work on alternative solutions and neither can access > whether this blocks the changes. Anybody to take this over, please? [...] It should be the responsibility of the compute-hyperv and nova-blazar maintainers to solve this problem, though your attempts to help them with a possible solution have been admirable. Thanks for this, and for all the others which did get merged already! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From aschultz at redhat.com Wed Jun 5 15:41:37 2019 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 5 Jun 2019 09:41:37 -0600 Subject: [kolla][tripleo] Python 3, CentOS/RHEL 8 In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 3:19 AM Mark Goddard wrote: > Hi, > > At the recent kolla virtual PTG [1], we discussed the move to python 3 > images in the Train cycle. hrw has started this effort for > Ubuntu/Debian source images [2] and is making good progress. > > Next we will need to consider CentOS and RHEL. It seems that for Train > RDO will provide only python 3 packages with support for CentOS 8 [3]. > There may be some overlap in the trunk (master) packages where there > is support for both CentOS 7 and 8. We will therefore need to combine > the switch to python 3 with a switch to a CentOS/RHEL 8 base image. > > Some work was started during the Stein cycle to support RHEL 8 images > with python 3 packages. There will no doubt be a few scripts that need > updating to complete this work. We'll also need to test to ensure that > both binary and source images work in this new world. > > When CentOS8 is available, we'll be working on that more with TripleO to ensure it's working and if there are issues we'll likely submit fixes as necessary. Currently https://review.opendev.org/#/c/632156/ should be the actual support for the python3 bits as currently required when using the RDO provided packages. We're not aware of any outstanding issues but if we run into them, then we will help as needed. We currently use kolla to generate the related Dockerfiles for building with RHEL8 and have posted the issues that we've run across so far. The related work for podman/buildah (if desired) is currently being discussed in a different thread. > Tripleo team - what are your plans for CentOS/RHEL 8 and python 3 this > cycle? Are you planning to continue the work started in kolla during > the Stein release? > As mentioned, we're not currently aware of any outstanding issues around this so as of Stein, the python3 related packages (when available) combined with an 8 based base image+repos should work. > > Thanks, > Mark > > [1] https://etherpad.openstack.org/p/kolla-train-ptg > [2] https://blueprints.launchpad.net/kolla/+spec/debian-ubuntu-python3 > [3] https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 > [4] https://review.opendev.org/#/c/632156/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Jun 5 15:49:27 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 15:49:27 +0000 Subject: [oslo] Bandit Strategy In-Reply-To: References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> Message-ID: <20190605154927.nskpuosejx2of6rp@yuggoth.org> On 2019-06-05 10:28:35 -0500 (-0500), Ben Nemec wrote: > Since it seems we need to backport this to the stable branches [...] You've probably been following along, but a fix for https://github.com/PyCQA/bandit/issues/488 was merged upstream on May 26, so now we're just waiting for a new release to be tagged. It may make sense to spend some time lobbying them to accelerate their release process if it means less time spent backporting exclusions to a bazillion projects. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gsteinmuller at vexxhost.com Wed Jun 5 15:57:45 2019 From: gsteinmuller at vexxhost.com (=?UTF-8?Q?Guilherme_Steinm=C3=BCller?=) Date: Wed, 5 Jun 2019 12:57:45 -0300 Subject: [horizon] dropping 2012.2 tag on pypi Message-ID: Hello, As we've discussed with nova tag recently [1], I'd suggest the same for horizon. When we search on pypi the version it shows is 2012.2 and when we click release history we can see that the most recent version is 15.1.0 [2] [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006780.html [2] https://pypi.org/project/horizon/#history Regards, Guilherme Steinmuller -------------- next part -------------- An HTML attachment was scrubbed... URL: From hberaud at redhat.com Wed Jun 5 16:16:25 2019 From: hberaud at redhat.com (Herve Beraud) Date: Wed, 5 Jun 2019 18:16:25 +0200 Subject: [oslo] Bandit Strategy In-Reply-To: <20190605154927.nskpuosejx2of6rp@yuggoth.org> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> Message-ID: I think that waiting the bandit release is a good idea Le mer. 5 juin 2019 à 17:54, Jeremy Stanley a écrit : > On 2019-06-05 10:28:35 -0500 (-0500), Ben Nemec wrote: > > Since it seems we need to backport this to the stable branches > [...] > > You've probably been following along, but a fix for > https://github.com/PyCQA/bandit/issues/488 was merged upstream on > May 26, so now we're just waiting for a new release to be tagged. It > may make sense to spend some time lobbying them to accelerate their > release process if it means less time spent backporting exclusions > to a bazillion projects. > -- > Jeremy Stanley > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Wed Jun 5 16:27:09 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 5 Jun 2019 11:27:09 -0500 Subject: [oslo] Bandit Strategy In-Reply-To: References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> Message-ID: <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> Agreed. There's probably an argument that we should cap bandit on stable branches anyway, but it would save us a lot of tedious patches if we just hope bandit doesn't break us again. :-) On 6/5/19 11:16 AM, Herve Beraud wrote: > I think that waiting the bandit release is a good idea > > Le mer. 5 juin 2019 à 17:54, Jeremy Stanley > a écrit : > > On 2019-06-05 10:28:35 -0500 (-0500), Ben Nemec wrote: > > Since it seems we need to backport this to the stable branches > [...] 
> > You've probably been following along, but a fix for > https://github.com/PyCQA/bandit/issues/488 was merged upstream on > May 26, so now we're just waiting for a new release to be tagged. It > may make sense to spend some time lobbying them to accelerate their > release process if it means less time spent backporting exclusions > to a bazillion projects. > -- > Jeremy Stanley > > > > -- > Hervé Beraud > Senior Software Engineer > Red Hat - Openstack Oslo > irc: hberaud > -----BEGIN PGP SIGNATURE----- > > wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ > Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ > RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP > F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G > 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g > glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw > m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ > hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 > qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y > F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 > B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O > v6rDpkeNksZ9fFSyoY2o > =ECSj > -----END PGP SIGNATURE----- > From lshort at redhat.com Wed Jun 5 16:27:17 2019 From: lshort at redhat.com (Luke Short) Date: Wed, 5 Jun 2019 12:27:17 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: Hey everyone, For the upcoming work on focusing on more Ansible automation and testing, I have created a dedicated #tripleo-transformation channel for our new squad. Feel free to join if you are interested in joining and helping out! +1 to removing repositories we don't use, especially if they have no working code. I'd like to see the consolidation of TripleO specific things into the tripleo-ansible repository and then using upstream Ansible roles for all of the different services (nova, glance, cinder, etc.). Sincerely, Luke Short, RHCE Software Engineer, OpenStack Deployment Framework Red Hat, Inc. On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: > On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: > >> So the questions at hand are: what, if anything, should we do with these >> repositories? Should we retire them or just ignore them? Is there anyone >> using any of the roles? >> > > My initial reaction was to suggest we just ignore them, but on second > thought I'm wondering if there is anything negative if we leave them lying > around. Unless we're going to benefit from them in the future if we start > actively working in these repos, they represent obfuscation and debt, so it > might be best to retire / dispose of them. > > David > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 5 16:31:45 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 17:31:45 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 15:48, Emilien Macchi wrote: > > On Wed, Jun 5, 2019 at 8:47 AM Mark Goddard wrote: >> >> Thanks for following up. It wouldn't be a trivial change to add >> buildah support in kolla, but it would have saved reimplementing the >> task parallelisation in Tripleo and would benefit others too. Never >> mind. 
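For anyone skimming the thread, the cap/uncap series being discussed amounts to two one-line changes to each project's test-requirements.txt. This is only a sketch; the >=1.1.0 lower bound is an assumption and varies per project:

    # patch 1: cap to unblock CI while bandit 1.6.0 is broken
    bandit>=1.1.0,<1.6.0
    # patch 2, once a fixed release is tagged: swap the cap for an exclusion
    bandit!=1.6.0,>=1.1.0

On stable branches the <1.6.0 cap from the first patch can simply be kept, per the discussion above.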
> > > To be fair, at the time I wrote the code in python-tripleoclient the container tooling wasn't really stable and we weren't sure about the directions we would take yet; which is the main reason which drove us to not invest too much time into refactoring Kolla to support a tool that we weren't sure we would end up using in production for the container image building. > That's fair, sorry to grumble :) > It has been a few months now and so far it works ok for our needs; so if there is interest in supporting Buildah in Kolla then we might want to do the refactor and of course TripleO would use this new feature. > -- > Emilien Macchi From fungi at yuggoth.org Wed Jun 5 16:32:56 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 16:32:56 +0000 Subject: [oslo] Bandit Strategy In-Reply-To: <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> Message-ID: <20190605163256.6muy2ooxwlmdissq@yuggoth.org> On 2019-06-05 11:27:09 -0500 (-0500), Ben Nemec wrote: > Agreed. There's probably an argument that we should cap bandit on > stable branches anyway, but it would save us a lot of tedious > patches if we just hope bandit doesn't break us again. :-) [...] Oh, yes, I think capping on stable is probably a fine idea regardless (we should be doing that anyway for all our static analyzers on principle). What I meant is that it would likely render those updates no longer urgent. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Jun 5 16:35:52 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 16:35:52 +0000 Subject: [horizon] dropping 2012.2 tag on pypi In-Reply-To: References: Message-ID: <20190605163552.bnmjqxtoncct6lxr@yuggoth.org> On 2019-06-05 12:57:45 -0300 (-0300), Guilherme Steinmüller wrote: > As we've discussed with nova tag recently [1], I'd suggest the same for > horizon. > > When we search on pypi the version it shows is 2012.2 and when we click > release history we can see that the most recent version is 15.1.0 [2] > > [1] > http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006780.html > [2] https://pypi.org/project/horizon/#history Thanks for pointing this out. Since we basically got blanket approval to do this for any official OpenStack project some years back, I've removed the 2012.2 from the horizon project on PyPI just now. If anybody spots others, please do mention them! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mthode at mthode.org Wed Jun 5 16:58:07 2019 From: mthode at mthode.org (Matthew Thode) Date: Wed, 5 Jun 2019 11:58:07 -0500 Subject: [requirements][kuryr][flame] openshift dificulties In-Reply-To: <20190530151739.nfzrqfstlb2sbrq5@mthode.org> References: <20190529205352.f2dxzckgvfavbvtv@mthode.org> <20190530151739.nfzrqfstlb2sbrq5@mthode.org> Message-ID: <20190605165807.jmhogmfyrxltx5b3@mthode.org> On 19-05-30 10:17:39, Matthew Thode wrote: > On 19-05-30 17:07:54, Michał Dulko wrote: > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote: > > > Openshift upstream is giving us difficulty as they are capping the > > > version of urllib3 and kubernetes we are using. > > > > > > -urllib3===1.25.3 > > > +urllib3===1.24.3 > > > -kubernetes===9.0.0 > > > +kubernetes===8.0.1 > > > > > > I've opened an issue with them but not had much luck there (and their > > > prefered solution just pushes the can down the road). > > > > > > https://github.com/openshift/openshift-restclient-python/issues/289 > > > > > > What I'd us to do is move off of openshift as our usage doesn't seem too > > > much. > > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one > > > function with that import). I'm not sure exactly what you are doing > > > with it but would it be too much to ask to move to something else? > > > > From Kuryr side it's not really much effort, we can switch to bare REST > > calls, but obviously we prefer the client. If there's much support for > > getting rid of it, we can do the switch. > > > > Right now Kyryr is only using it in that one place and it's blocking the > update of urllib3 and kubernetes for the rest of openstack. So if it's > not too much trouble it'd be nice to have happen. > > > > x/flame has it in it's constraints but I don't see any actual usage, so > > > perhaps it's a false flag. > > > > > > Please let me know what you think > > > > Any updates on this? I'd like to move forward on removing the dependency if possible. -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From jp.methot at planethoster.info Wed Jun 5 17:01:50 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Wed, 5 Jun 2019 13:01:50 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq Message-ID: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Hi, We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. 
The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : =ERROR REPORT==== 5-Jun-2019::18:50:08 === closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): missed heartbeats from client, timeout: 60s The neutron-server logs show this error: 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: The relevant service version numbers are as follow: rabbitmq-server-3.6.5-1.el7.noarch openstack-neutron-12.0.6-1.el7.noarch python2-oslo-messaging-5.35.4-1.el7.noarch Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haleyb.dev at gmail.com Wed Jun 5 19:09:59 2019 From: haleyb.dev at gmail.com (Brian Haley) Date: Wed, 5 Jun 2019 15:09:59 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: <99c58ab2-0b01-16d0-859d-afbde7dab3fb@gmail.com> On 6/5/19 1:01 PM, Jean-Philippe Méthot wrote: > Hi, > > We had a Pike openstack setup that we updated to Queens earlier this > week. It’s a 30 compute nodes infrastructure with 2 controller nodes and > 2 network nodes, using openvswitch for networking. Since we upgraded to > queens, neutron-server on the controller nodes has been unable to > contact the openvswitch-agents through rabbitmq. The rabbitmq is > clustered on both controller nodes and has been giving us the following > error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === > closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 > - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit > [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] > [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 > is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 > seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit > [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on > controller1:5672 is unreachable: error>. 
Trying again in 1 seconds.: RecoverableConnectionError: > Are there possibly any firewall rules getting in the way? Connection reset by peer usually means the other end has sent a TCP Reset, which wouldn't happen if the permissions were wrong. As a test, does this connect? $ telnet controller1 5672 Trying $IP... Connected to controller1. Escape character is '^]'. -Brian > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a > high enough file limit. The login user and credentials are fine as they > are used in other openstack services which can contact rabbitmq without > issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing > timeouts in neutron services, etc, to no avail. I find myself at a loss > and would appreciate if anyone has any idea as to where to go from there. > > Best regards, > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > From jp.methot at planethoster.info Wed Jun 5 19:31:32 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Wed, 5 Jun 2019 15:31:32 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: <99c58ab2-0b01-16d0-859d-afbde7dab3fb@gmail.com> References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> <99c58ab2-0b01-16d0-859d-afbde7dab3fb@gmail.com> Message-ID: Hi, Thank you for your reply. There’s no firewall. However, we ended up figuring out that we were running out of tcp sockets. On a related note, we are still having issues but only with metadata fed through Neutron. Seems that it’s nova-api refusing the connection with http 500 error when the metadata-agent tries to connect to it. This is a completely different issue and may be more related to nova than neutron though, so it may very well not be the right mail thread to discuss it. Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 5 juin 2019 à 15:09, Brian Haley a écrit : > > On 6/5/19 1:01 PM, Jean-Philippe Méthot wrote: >> Hi, >> We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : >> =ERROR REPORT==== 5-Jun-2019::18:50:08 === >> closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): >> missed heartbeats from client, timeout: 60s >> The neutron-server logs show this error: >> 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. 
Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer >> 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: > > Are there possibly any firewall rules getting in the way? Connection reset by peer usually means the other end has sent a TCP Reset, which wouldn't happen if the permissions were wrong. > > As a test, does this connect? > > $ telnet controller1 5672 > Trying $IP... > Connected to controller1. > Escape character is '^]'. > > -Brian > > >> The relevant service version numbers are as follow: >> rabbitmq-server-3.6.5-1.el7.noarch >> openstack-neutron-12.0.6-1.el7.noarch >> python2-oslo-messaging-5.35.4-1.el7.noarch >> Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. >> I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. >> Best regards, >> Jean-Philippe Méthot >> Openstack system administrator >> Administrateur système Openstack >> PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgagne at calavera.ca Wed Jun 5 19:31:43 2019 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Wed, 5 Jun 2019 15:31:43 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: Hi Jean-Philippe, On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot wrote: > > We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === > closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: > > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. 
The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. > We had a very similar issue after upgrading to Neutron Queens. In fact, all Neutron agents were "down" according to status API and messages weren't getting through. IIRC, this only happened in regions which had more load than the others. We applied a bunch of fixes which I suspect are only a bunch of bandaids. Here are the changes we made: * Split neutron-api from neutron-server. Create a whole new controller running neutron-api with mod_wsgi. * Increase [database]/max_overflow = 200 * Disable RabbitMQ heartbeat in oslo.messaging: [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 * Increase [agent]/report_interval = 120 * Increase [DEFAULT]/agent_down_time = 600 We also have those sysctl configs due to firewall dropping sessions. But those have been on the server forever: net.ipv4.tcp_keepalive_time = 30 net.ipv4.tcp_keepalive_intvl = 1 net.ipv4.tcp_keepalive_probes = 5 We never figured out why a service that was working before the upgrade but no longer is. This is kind of frustrating as it caused us all short of intermittent issues and stress during our upgrade. Hope this helps. -- Mathieu From eandersson at blizzard.com Wed Jun 5 23:21:54 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Wed, 5 Jun 2019 23:21:54 +0000 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: We have experienced similar issues when upgrading from Mitaka to Rocky. Distributing the RabbitMQ connections between the Rabbits helps a lot. At least with larger deployments. Since not all services re-connecting will be establishing it's connections against a single RabbitMQ server. > oslo_messaging_rabbit/kombu_failover_strategy = shuffle An alternative is to increase the SSL (and/or TCP) acceptors on RabbitMQ to allow it to process new connections faster. > num_tcp_acceptors / num_ssl_acceptors https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L35 https://groups.google.com/forum/#!topic/rabbitmq-users/0ApuN2ES0Ks > We had a very similar issue after upgrading to Neutron Queens. In fact, all Neutron agents were "down" according to status API and messages weren't getting through. IIRC, this only happened in regions which had more load than the others. We haven't quite figured this one out yet, but just after upgrade, Neutron handles about 1-2 of these per second. Restarting Neutron and it consumes messages super-fast for a few minutes and then slows down again. A few hours after the upgrade it consumes these without an issue. We ended up making similar tuning > report_interval 60 > agent_down_time 150 The most problematic for us so far has been the memory usage of Neutron. We see it peak at 8.2GB for neutron-server (rpc) instances. Which means we can only have ~10 neutron-rpc workers on a 128GB machine. 
Best Regards, Erik Olof Gunnar Andersson -----Original Message----- From: Mathieu Gagné Sent: Wednesday, June 5, 2019 12:32 PM To: Jean-Philippe Méthot Cc: openstack-discuss at lists.openstack.org Subject: Re: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq Hi Jean-Philippe, On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot wrote: > > We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === closing AMQP connection > <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit > [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] > [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 > is unreachable: [Errno 104] Connection reset by peer. Trying again in > 1 seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit > [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on > controller1:5672 is unreachable: error>. Trying again in 1 seconds.: RecoverableConnectionError: > > > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. > We had a very similar issue after upgrading to Neutron Queens. In fact, all Neutron agents were "down" according to status API and messages weren't getting through. IIRC, this only happened in regions which had more load than the others. We applied a bunch of fixes which I suspect are only a bunch of bandaids. Here are the changes we made: * Split neutron-api from neutron-server. Create a whole new controller running neutron-api with mod_wsgi. * Increase [database]/max_overflow = 200 * Disable RabbitMQ heartbeat in oslo.messaging: [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 * Increase [agent]/report_interval = 120 * Increase [DEFAULT]/agent_down_time = 600 We also have those sysctl configs due to firewall dropping sessions. But those have been on the server forever: net.ipv4.tcp_keepalive_time = 30 net.ipv4.tcp_keepalive_intvl = 1 net.ipv4.tcp_keepalive_probes = 5 We never figured out why a service that was working before the upgrade but no longer is. This is kind of frustrating as it caused us all short of intermittent issues and stress during our upgrade. Hope this helps. 
-- Mathieu From cjeanner at redhat.com Thu Jun 6 05:43:36 2019 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Thu, 6 Jun 2019 07:43:36 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: <379bd822-05dc-edd5-704c-8ae8ed37b32b@redhat.com> Even if I'm no core: huge +1 :) On 6/5/19 4:31 PM, Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing > really insightful reviews, specially on Python best practices but not > only; he is one of the major contributors of the OVN integration, which > was a ton of work. I believe he has the right knowledge to review any > TripleO patch and provide excellent reviews in our project. We're lucky > to have him with us in the team! > > I would like to propose him core on TripleO, please raise any objection > if needed. > -- > Emilien Macchi -- Cédric Jeanneret Software Engineer - OpenStack Platform Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From mjozefcz at redhat.com Thu Jun 6 06:33:18 2019 From: mjozefcz at redhat.com (Maciej Jozefczyk) Date: Thu, 6 Jun 2019 08:33:18 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: Congratulations! On Wed, Jun 5, 2019 at 4:45 PM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > -- Best regards, Maciej Józefczyk -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdulko at redhat.com Thu Jun 6 07:13:46 2019 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Thu, 06 Jun 2019 09:13:46 +0200 Subject: [requirements][kuryr][flame] openshift dificulties In-Reply-To: <20190605165807.jmhogmfyrxltx5b3@mthode.org> References: <20190529205352.f2dxzckgvfavbvtv@mthode.org> <20190530151739.nfzrqfstlb2sbrq5@mthode.org> <20190605165807.jmhogmfyrxltx5b3@mthode.org> Message-ID: <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com> On Wed, 2019-06-05 at 11:58 -0500, Matthew Thode wrote: > On 19-05-30 10:17:39, Matthew Thode wrote: > > On 19-05-30 17:07:54, Michał Dulko wrote: > > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote: > > > > Openshift upstream is giving us difficulty as they are capping the > > > > version of urllib3 and kubernetes we are using. > > > > > > > > -urllib3===1.25.3 > > > > +urllib3===1.24.3 > > > > -kubernetes===9.0.0 > > > > +kubernetes===8.0.1 > > > > > > > > I've opened an issue with them but not had much luck there (and their > > > > prefered solution just pushes the can down the road). > > > > > > > > https://github.com/openshift/openshift-restclient-python/issues/289 > > > > > > > > What I'd us to do is move off of openshift as our usage doesn't seem too > > > > much. > > > > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one > > > > function with that import). 
I'm not sure exactly what you are doing > > > > with it but would it be too much to ask to move to something else? > > > > > > From Kuryr side it's not really much effort, we can switch to bare REST > > > calls, but obviously we prefer the client. If there's much support for > > > getting rid of it, we can do the switch. > > > > > > > Right now Kyryr is only using it in that one place and it's blocking the > > update of urllib3 and kubernetes for the rest of openstack. So if it's > > not too much trouble it'd be nice to have happen. > > > > > > x/flame has it in it's constraints but I don't see any actual usage, so > > > > perhaps it's a false flag. > > > > > > > > Please let me know what you think > > > > > > Any updates on this? I'd like to move forward on removing the > dependency if possible. > Sure, I'm waiting for some spare time to do this. Fastest it may happen will probably be next week. From amotoki at gmail.com Thu Jun 6 07:19:18 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 6 Jun 2019 16:19:18 +0900 Subject: [upgrade][neutron] supported release window on rolling upgrade Message-ID: The neutron team is discussing how many releases we should support in RPC messages [1] (to drop downgrade codes in OVO). This affects rolling upgrade scenarios. Controller nodes are upgrade in FFU way, but we cannot upgrade compute nodes at once. This means controller nodes with N+X release need to talk compute nodes with N release. As of now, the neutron team is thinking to support LTS to LTS upgrade scenarios for major distributions and N->N+4 looks like the longest window. Rolling upgrade scenarios affect not only neutron but also other projects like nova, so I am sending this mail for broader input. [1] https://review.opendev.org/#/c/661995/ Thanks, Akihiro Motoki (irc: amotoki) From mark at stackhpc.com Thu Jun 6 08:19:09 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 6 Jun 2019 09:19:09 +0100 Subject: [kolla][tripleo] Python 3, CentOS/RHEL 8 In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 16:42, Alex Schultz wrote: > > > > On Wed, Jun 5, 2019 at 3:19 AM Mark Goddard wrote: >> >> Hi, >> >> At the recent kolla virtual PTG [1], we discussed the move to python 3 >> images in the Train cycle. hrw has started this effort for >> Ubuntu/Debian source images [2] and is making good progress. >> >> Next we will need to consider CentOS and RHEL. It seems that for Train >> RDO will provide only python 3 packages with support for CentOS 8 [3]. >> There may be some overlap in the trunk (master) packages where there >> is support for both CentOS 7 and 8. We will therefore need to combine >> the switch to python 3 with a switch to a CentOS/RHEL 8 base image. >> >> Some work was started during the Stein cycle to support RHEL 8 images >> with python 3 packages. There will no doubt be a few scripts that need >> updating to complete this work. We'll also need to test to ensure that >> both binary and source images work in this new world. >> > > When CentOS8 is available, we'll be working on that more with TripleO to ensure it's working and if there are issues we'll likely submit fixes as necessary. Currently https://review.opendev.org/#/c/632156/ should be the actual support for the python3 bits as currently required when using the RDO provided packages. We're not aware of any outstanding issues but if we run into them, then we will help as needed. 
We currently use kolla to generate the related Dockerfiles for building with RHEL8 and have posted the issues that we've run across so far. The related work for podman/buildah (if desired) is currently being discussed in a different thread. Thanks for clarifying. I expect we'll have some kinks in the CentOS source images to iron out (e.g. install python3-devel) but hopefully the majority should be covered by https://review.opendev.org/#/c/632156/. There will also be the less glamorous cleanup tasks to remove python 2 support, but they won't block python 3 images. > >> >> Tripleo team - what are your plans for CentOS/RHEL 8 and python 3 this >> cycle? Are you planning to continue the work started in kolla during >> the Stein release? > > > As mentioned, we're not currently aware of any outstanding issues around this so as of Stein, the python3 related packages (when available) combined with an 8 based base image+repos should work. > >> >> >> Thanks, >> Mark >> >> [1] https://etherpad.openstack.org/p/kolla-train-ptg >> [2] https://blueprints.launchpad.net/kolla/+spec/debian-ubuntu-python3 >> [3] https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 >> [4] https://review.opendev.org/#/c/632156/ >> From florian.engelmann at everyware.ch Thu Jun 6 08:20:53 2019 From: florian.engelmann at everyware.ch (Florian Engelmann) Date: Thu, 6 Jun 2019 10:20:53 +0200 Subject: [telemetry] volume_type_id stored instead of volume_type name Message-ID: <4081acb6-be89-3249-e535-67c192be3743@everyware.ch> Hi, some volumes are stored with the volume_type Id instead of the volume_type name: openstack metric resource history --details b5496a42-c766-4267-9248-6149aa9dd483 -c id -c revision_start -c revision_end -c instance_id -c volume_type +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ | id | revision_start | revision_end | instance_id | volume_type | +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ | b5496a42-c766-4267-9248-6149aa9dd483 | 2019-05-08T07:21:35.354474+00:00 | 2019-05-21T09:18:32.767426+00:00 | 662998da-c3d1-45c5-9120-2cff6240e3b6 | v-ssd-std | | b5496a42-c766-4267-9248-6149aa9dd483 | 2019-05-21T09:18:32.767426+00:00 | 2019-05-21T09:18:32.845700+00:00 | 662998da-c3d1-45c5-9120-2cff6240e3b6 | v-ssd-std | | b5496a42-c766-4267-9248-6149aa9dd483 | 2019-05-21T09:18:32.845700+00:00 | None | 662998da-c3d1-45c5-9120-2cff6240e3b6 | 8bd7e1b1-3396-49bf-802c-8c31a9444895 | +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ I was not able to find anything fishy in ceilometer. So I guess it could be some event/notification with a wrong payload? Could anyone please verify this error is not uniq to our (rocky) environment by running: openstack metric resource list --type volume -c id -c volume_type All the best, Florian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5230 bytes Desc: not available URL: From thierry at openstack.org Thu Jun 6 08:22:53 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 10:22:53 +0200 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: Mark Goddard wrote: > [...] > We see two options for becoming an official OpenStack project: > > 1. become a deliverable of the Kolla project > 2. become an official top level OpenStack project > > Given the affinity with the Kolla project I feel that option 1 seems > natural. However, I do not want to use influence as PTL to force this > approach. > [...] From a governance perspective, the two options are definitely possible. Kayobe can be seen as one of the Kolla-derived deployment tools, or it can be seen as a new deployment tool combining two existing projects (Kolla and Bifrost). Project teams are cheap: the best solution is the one that best aligns to the social reality. So I'd say the decision depends on how much independence Kayobe wants to have from Kolla. Having a separate project team will for example make it easier to have separate meetings, but harder to have common meetings. How much of a separate team is Kayobe from Kolla? How much do you want it to stay that way? -- Thierry Carrez (ttx) From thierry at openstack.org Thu Jun 6 08:25:14 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 10:25:14 +0200 Subject: [airship] Is Ironic ready for Airship? In-Reply-To: <15fc408f.c51c.16b27b2a86d.Coremail.guoyongxhzhf@163.com> References: <15fc408f.c51c.16b27b2a86d.Coremail.guoyongxhzhf@163.com> Message-ID: <09029ebf-93c1-2f1f-7e02-ec55fef51f60@openstack.org> 郭勇 wrote: > I know Airship choose Maas as bare mental management tool. > > I want to know whether Maas is more suitable for Airship when it comes > to under-infrastructure? > > If Maas is more suitable, then what feature should ironic develop? Note that airship has its own discussion list: http://lists.airshipit.org/cgi-bin/mailman/listinfo/airship-discuss This is an openstack-specific discussion list. -- Thierry Carrez (ttx) From hberaud at redhat.com Thu Jun 6 08:39:18 2019 From: hberaud at redhat.com (Herve Beraud) Date: Thu, 6 Jun 2019 10:39:18 +0200 Subject: [oslo] Bandit Strategy In-Reply-To: <20190605163256.6muy2ooxwlmdissq@yuggoth.org> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> <20190605163256.6muy2ooxwlmdissq@yuggoth.org> Message-ID: +1 Le mer. 5 juin 2019 à 18:38, Jeremy Stanley a écrit : > On 2019-06-05 11:27:09 -0500 (-0500), Ben Nemec wrote: > > Agreed. There's probably an argument that we should cap bandit on > > stable branches anyway, but it would save us a lot of tedious > > patches if we just hope bandit doesn't break us again. :-) > [...] > > Oh, yes, I think capping on stable is probably a fine idea > regardless (we should be doing that anyway for all our static > analyzers on principle). What I meant is that it would likely render > those updates no longer urgent. 
> -- > Jeremy Stanley > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Thu Jun 6 08:49:44 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 6 Jun 2019 09:49:44 +0100 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: On Thu, 6 Jun 2019 at 09:27, Thierry Carrez wrote: > > Mark Goddard wrote: > > [...] > > We see two options for becoming an official OpenStack project: > > > > 1. become a deliverable of the Kolla project > > 2. become an official top level OpenStack project > > > > Given the affinity with the Kolla project I feel that option 1 seems > > natural. However, I do not want to use influence as PTL to force this > > approach. > > [...] > > From a governance perspective, the two options are definitely possible. > > Kayobe can be seen as one of the Kolla-derived deployment tools, or it > can be seen as a new deployment tool combining two existing projects > (Kolla and Bifrost). Project teams are cheap: the best solution is the > one that best aligns to the social reality. > > So I'd say the decision depends on how much independence Kayobe wants to > have from Kolla. Having a separate project team will for example make it > easier to have separate meetings, but harder to have common meetings. > How much of a separate team is Kayobe from Kolla? How much do you want > it to stay that way? Right now the intersection of the core teams is only me. While all Kayobe contributors are familiar with Kolla projects, the reverse is not true. This is partly because Kolla and/or Kolla Ansible can be used without Kayobe, and partly because Kayobe is a newer project which typically gets adopted at the beginning of a cloud deployment. It certainly seems to make sense from the Kayobe community perspective to join these communities. I think the question the Kolla team needs to ask is whether the benefit of a more complete set of tooling is worth the overhead of adding a new deliverable that may not be used by all contributors or in all deployments. > > -- > Thierry Carrez (ttx) > From mark at stackhpc.com Thu Jun 6 08:51:26 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 6 Jun 2019 09:51:26 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 17:31, Mark Goddard wrote: > > On Wed, 5 Jun 2019 at 15:48, Emilien Macchi wrote: > > > > On Wed, Jun 5, 2019 at 8:47 AM Mark Goddard wrote: > >> > >> Thanks for following up. 
It wouldn't be a trivial change to add > >> buildah support in kolla, but it would have saved reimplementing the > >> task parallelisation in Tripleo and would benefit others too. Never > >> mind. > > > > > > To be fair, at the time I wrote the code in python-tripleoclient the container tooling wasn't really stable and we weren't sure about the directions we would take yet; which is the main reason which drove us to not invest too much time into refactoring Kolla to support a tool that we weren't sure we would end up using in production for the container image building. > > > That's fair, sorry to grumble :) > > It has been a few months now and so far it works ok for our needs; so if there is interest in supporting Buildah in Kolla then we might want to do the refactor and of course TripleO would use this new feature. If there are resources to do it, I'm sure the Kolla team would be receptive. > > -- > > Emilien Macchi From moguimar at redhat.com Thu Jun 6 08:52:32 2019 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Thu, 6 Jun 2019 10:52:32 +0200 Subject: [oslo] Bandit Strategy In-Reply-To: References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> <20190605163256.6muy2ooxwlmdissq@yuggoth.org> Message-ID: +1 Jeremy Em qui, 6 de jun de 2019 às 10:42, Herve Beraud escreveu: > +1 > > Le mer. 5 juin 2019 à 18:38, Jeremy Stanley a écrit : > >> On 2019-06-05 11:27:09 -0500 (-0500), Ben Nemec wrote: >> > Agreed. There's probably an argument that we should cap bandit on >> > stable branches anyway, but it would save us a lot of tedious >> > patches if we just hope bandit doesn't break us again. :-) >> [...] >> >> Oh, yes, I think capping on stable is probably a fine idea >> regardless (we should be doing that anyway for all our static >> analyzers on principle). What I meant is that it would likely render >> those updates no longer urgent. >> -- >> Jeremy Stanley >> > > > -- > Hervé Beraud > Senior Software Engineer > Red Hat - Openstack Oslo > irc: hberaud > -----BEGIN PGP SIGNATURE----- > > wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ > Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ > RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP > F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G > 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g > glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw > m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ > hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 > qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y > F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 > B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O > v6rDpkeNksZ9fFSyoY2o > =ECSj > -----END PGP SIGNATURE----- > > -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre-samuel.le-stang at corp.ovh.com Thu Jun 6 09:13:14 2019 From: pierre-samuel.le-stang at corp.ovh.com (Pierre-Samuel LE STANG) Date: Thu, 6 Jun 2019 11:13:14 +0200 Subject: [ops] database archiving tool In-Reply-To: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> References: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> Message-ID: <20190606091215.p7pms5bvwyo7qm6d@corp.ovh.com> Hi all, We finally opensourced the tool on our github repository. 
You may get it here: https://github.com/ovh/osarchiver/ Thanks for your feedbacks. -- PS Pierre-Samuel LE STANG wrote on jeu. [2019-mai-09 17:14:35 +0200]: > Hi all, > > At OVH we needed to write our own tool that archive data from OpenStack > databases to prevent some side effect related to huge tables (slower response > time, changing MariaDB query plan) and to answer to some legal aspects. > > So we started to write a python tool which is called OSArchiver that I briefly > presented at Denver few days ago in the "Optimizing OpenStack at large scale" > talk. We think that this tool could be helpful to other and are ready to open > source it, first we would like to get the opinion of the ops community about > that tool. > > To sum-up OSArchiver is written to work regardless of Openstack project. The > tool relies on the fact that soft deleted data are recognizable because of > their 'deleted' column which is set to 1 or uuid and 'deleted_at' column which > is set to the date of deletion. > > The points to have in mind about OSArchiver: > * There is no knowledge of business objects > * One table might be archived if it contains 'deleted' column > * Children rows are archived before parents rows > * A row can not be deleted if it fails to be archived > > Here are features already implemented: > * Archive data in an other database and/or file (actually SQL and CSV > formats are supported) to be easily imported > * Delete data from Openstack databases > * Customizable (retention, exclude DBs, exclude tables, bulk insert/delete) > * Multiple archiving configuration > * Dry-run mode > * Easily extensible, you can add your own destination module (other file > format, remote storage etc...) > * Archive and/or delete only mode > > It also means that by design you can run osarchiver not only on OpenStack > databases but also on archived OpenStack databases. > > Thanks in advance for your feedbacks. > > -- > Pierre-Samuel Le Stang -- Pierre-Samuel Le Stang From Vrushali.Kamde at nttdata.com Thu Jun 6 09:32:09 2019 From: Vrushali.Kamde at nttdata.com (Kamde, Vrushali) Date: Thu, 6 Jun 2019 09:32:09 +0000 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' Message-ID: Hi, Working on implementation of 'Support filtering of allocation_candidates by forbidden aggregates' spec. Need discussion particularly for point [1] where traits needs to be sync along with aggregates at placement. Master implementation for 'nova-manage placement sync_aggregates' command is to sync the nova host aggregates. Modifying this command to sync trait metadata of aggregate at placement. Below are the aggregate restful APIs which currently supports: 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Add host) getting synced on the placement service 2. 'POST'-- /os-aggregates/{aggregate_id}/action(Remove host) getting synced on the placement service 3. 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) Doesn't get sync on the placement service. 4. 'POST'-- /os-aggregates/{aggregate_id}/action(unset metadata) Doesn't get sync on the placement service. I have added code to sync traits for below APIs and I don't see any issues there: 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Add host) 2. 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) But there is an issue while removing traits for below APIs: 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Remove host) 2. 
'POST'-- /os-aggregates/{aggregate_id}/action(unset metadata) Ideally, we should remove traits set in the aggregate metadata from the resource providers associated with the aggregate for above two APIs but it could cause a problem for below scenario:- For example: 1. Create two aggregates 'agg1' and 'agg2' by using: 'POST'-- /os-aggregates(Create aggregate) 2. Associate above aggregates to host 'RP1' by using: 'POST'-- /os-aggregates/{aggregate_id}/action(Add host) 3. Setting metadata (trait:STORAGE_DISK_SSD='required') on the aggregate agg1 by using: 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) 4. Setting metadata (trait:STORAGE_DISK_SSD='required', trait:HW_CPU_X86_SGX='required') on the aggregate agg2 by using: 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) Traits set to 'RP1' are: STORAGE_DISK_SSD HW_CPU_X86_SGX Note: Here trait 'STORAGE_DISK_SSD' is set on both agg1 and agg2. Now, If we remove host 'RP1' from 'agg1' then the trait 'STORAGE_DISK_SSD' set to `RP1` also needs to be removed but since 'RP1' is also assigned to 'agg2', removing 'STORAGE_DISK_SSD' trait from 'RP1' is not correct. I have discussed about syncing traits issues with Eric on IRC [2], he has suggested few approaches as below: - Leave all traits alone. If they need to be removed, it would have to be manually via a separate step. - Support a new option so the caller can dictate whether the operation should remove the traits. (This is all-or-none.) - Define a "namespace" - a trait substring - and remove only traits in that namespace. If I'm not wrong, for last two approaches, we would need to change RestFul APIs. Need your feedback whether traits should be deleted from resource provider or not for below two cases? 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Remove host) 2. 'POST'-- /os-aggregates/{aggregate_id}/action(unset metadata) [1]: https://review.opendev.org/#/c/609960/8/specs/train/approved/placement-req-filter-forbidden-aggregates.rst at 203 [2]: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-05-30.log.html Thanks & Regards, Vrushali Kamde Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Thu Jun 6 09:43:43 2019 From: marios at redhat.com (Marios Andreou) Date: Thu, 6 Jun 2019 12:43:43 +0300 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 haven't really worked with Kamil but *have* noticed him out and about in gerrit reviews. So I just did a quick code review review ;) and i see that he is there [1] - not by itself the most important thing but it demonstrates some dedication to TripleO for a while now! Looking at some recent random reviews agree Kamil would be a great addition thanks! [1] https://www.stackalytics.com/report/contribution/tripleo-group/360 On Wed, Jun 5, 2019 at 5:34 PM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. 
I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Thu Jun 6 09:55:17 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 11:55:17 +0200 Subject: [Release-job-failures] Tag of openstack/ironic failed In-Reply-To: References: Message-ID: <48722ba6-f862-490b-926b-508625631aef@openstack.org> zuul at openstack.org wrote: > Build failed. > > - publish-openstack-releasenotes-python3 http://logs.openstack.org/c9/c9009f704afed7579c9d8dfcf7b774623966ef5b/tag/publish-openstack-releasenotes-python3/d6f47db/ : POST_FAILURE in 13m 56s This error occurred after tagging openstack/ironic 11.1.3, due to some transient network issue during release notes build rsync: http://zuul.openstack.org/build/d6f47db4f78b44599c4036a7039a1f5b It prevented release notes for 11.1.3 from being published. However the ironic release notes were regenerated after that, resulting in proper publication: http://zuul.openstack.org/build/e1b75f44857e44b1a94c2499e8b5f742 https://docs.openstack.org/releasenotes/ironic/rocky.html#relnotes-11-1-3-stable-rocky Note that the release pipeline jobs completed successfully, so the release itself is OK. Impact: Release notes for ironic 11.1.3 were unavailable for 30min. TODO: None -- Thierry Carrez (ttx) From tobias.rydberg at citynetwork.eu Thu Jun 6 10:04:52 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 6 Jun 2019 12:04:52 +0200 Subject: [sigs][publiccloud][publiccloud-wg][publiccloud-sig][billing] Bi-weekly meeting today at 1400 UTC Message-ID: Hi all, This is a reminder for todays meeting for the Public Cloud SIG - 1400 UTC in #openstack-publiccloud. The main focus for the meeting will be continues discussions regarding the billing initiative. More information about that at https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal Agenda at: https://etherpad.openstack.org/p/publiccloud-wg See you all later today! Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From bdobreli at redhat.com Thu Jun 6 10:04:57 2019 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Thu, 6 Jun 2019 12:04:57 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: <7546e166-adc3-21fc-12c8-8ce1f75069b1@redhat.com> On 05.06.2019 16:31, Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing > really insightful reviews, specially on Python best practices but not > only; he is one of the major contributors of the OVN integration, which > was a ton of work. I believe he has the right knowledge to review any > TripleO patch and provide excellent reviews in our project. We're lucky +1 > to have him with us in the team! > > I would like to propose him core on TripleO, please raise any objection > if needed. 
> -- > Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando From hjensas at redhat.com Thu Jun 6 10:43:49 2019 From: hjensas at redhat.com (Harald =?ISO-8859-1?Q?Jens=E5s?=) Date: Thu, 06 Jun 2019 12:43:49 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: <21e6bd11ac136fffe325d6726d6c615574982f9e.camel@redhat.com> On Wed, 2019-06-05 at 10:31 -0400, Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing > really insightful reviews, specially on Python best practices but not > only; he is one of the major contributors of the OVN integration, > which was a ton of work. I believe he has the right knowledge to > review any TripleO patch and provide excellent reviews in our > project. We're lucky to have him with us in the team! > > I would like to propose him core on TripleO, please raise any > objection if needed. > -- > Emilien Macchi +1, I've seen Kamil around gerrit providing insightful reviews. Thanks Kamil! From doka.ua at gmx.com Thu Jun 6 10:49:37 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Thu, 6 Jun 2019 13:49:37 +0300 Subject: [glance] zeroing image, preserving other parameters In-Reply-To: <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> References: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> Message-ID: <22a9d4d2-ebec-ce9d-97c3-cbc25bd9f859@gmx.com> Hi Brian, thanks for the response. I solved my issue from other (client) side - I'm using Heat and Heat don't look whether uuid of image changed, it just check for existense of image with specified name. So it's safe to delete image and then create another one with same name and parameters and zero size. But in fact, Glance has a bit contradictory approach: Documentation on db purge says: "Remember that image identifiers are used by other OpenStack services that require access to images. These services expect that when an image is requested by ID, they will receive the same data every time." but there are no ways to get list of images including 'deleted' or details of 'deleted' image, e.g. doka at lagavulin(admin at admin):~$ openstack image show b179ecee-775d-4ee4-81c0-d3ec3a769d35 Could not find resource b179ecee-775d-4ee4-81c0-d3ec3a769d35 so preserving image record in database makes no sense for 3rd party services, which talk to Glance over public API. On the other hand, having in DB API ready for use 'image_destroy' call, it's pretty easy (of course, for those who work with Glance code :-) ) to add public API call kind of images/{image_id}/actions/destroy , calling DB API's image_destroy. And, in that case, it makes sense to allow image uuid to be specified during image create (since client can purge specified record and recreate it using same characteristics), otherwise I don't see where, in general, specifying uuid (when creating image) can be useful. The good news is that I solved my problem. The bad news is that solution relies on relaxed requirements of 3rd party products but not on Glance's API itself :-) Thanks! On 6/5/19 5:38 PM, Brian Rosmaita wrote: > On 6/5/19 8:34 AM, Volodymyr Litovka wrote: >> Dear colleagues, >> >> for some reasons, I need to shrink image size to zero (freeing storage >> as well), while keeping this record in Glance database. >> >> First which come to my mind is to delete image and then create new one >> with same name/uuid/... 
and --file /dev/null, but this is impossible >> because Glance don't really delete records from database, marking them >> as 'deleted' instead. > The glance-manage utility program allows you to purge the database. The > images table (where the image UUIDs are stored) is not purged by default > because of OSSN-0075 [0]. See the glance docs [1] for details. > > [0] https://wiki.openstack.org/wiki/OSSN/OSSN-0075 > [1] > https://docs.openstack.org/glance/latest/admin/db.html#database-maintenance > > (That doesn't really help your issue, I just wanted to point out that > there is a way to purge the database.) > >> Next try was to use glance image-upload from /dev/null, but this is also >> prohibited with message "409 Conflict: Image status transition from >> [activated, deactivated] to saving is not allowed (HTTP 409)" > That's correct, Glance will not allow you to replace the image data once > an image has gone to 'active' status. > >> I found >> https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's >> >> "image_destroy" but have no clues on how to access this API. Is it kind >> of library or kind of REST API, how to access it and whether it's safe >> to use it in terms of longevity and compatibility between versions? > The title of that document is misleading. It describes the interface > that Glance developers can use when they need to interact with the > database. There's no tool that exposes those operations to operators. > >> Or, may be, you can advise any other methods to solve the problem of >> zeroing glance image data / freeing storage, while keeping in database >> just a record about this image? > If you purged the database, you could do your proposal to recreate the > image with a zero-size file -- but that would give you an image with > status 'active' that an end user could try to boot an instance with. I > don't think that's a good idea. Additionally, purging the images table > of all UUIDs, not just the few you want to replace, exposes you to > OSSN-0075. > > An alternative--and I'm not sure this is a good idea either--would be to > deactivate the image [2]. This would preserve all the current metadata > but not allow the image to be downloaded by a non-administrator. With > the image not in 'active' status, nova or cinder won't try to use it to > create instances or volumes. The image data would still exist, though, > so you'd need to delete it manually from the backend to really clear out > the space. Additionally, the image size would remain, which might be > useful for record-keeping, although on the other hand, it will still > count against the user_storage_quota. And the image locations will > still exist even though they won't refer to any existing data any more. > (Like I said, I'm not sure this is a good idea.) > > [2] https://developer.openstack.org/api-ref/image/v2/#deactivate-image > >> Thank you. > Not sure I was much help. Let's see if other operators have a good > workaround or a need for this kind of functionality. > >> -- >> Volodymyr Litovka >>   "Vision without Execution is Hallucination." -- Thomas Edison >> >> > -- Volodymyr Litovka "Vision without Execution is Hallucination." 
-- Thomas Edison From beagles at redhat.com Thu Jun 6 10:51:55 2019 From: beagles at redhat.com (Brent Eagles) Date: Thu, 6 Jun 2019 08:21:55 -0230 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 12:04 PM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > +1, indeed!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jun 6 11:35:23 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Thu, 06 Jun 2019 20:35:23 +0900 Subject: [nova] API updates week 19-23 Message-ID: <16b2c92714e.10196b432132392.265414056494798985@ghanshyammann.com> Hi All, Please find the Nova API updates of this week. API Related BP : ============ Code Ready for Review: ------------------------------ 1. Support adding description while locking an instance: - Topic: https://review.opendev.org/#/q/topic:bp/add-locked-reason+(status:open+OR+status:merged) - Weekly Progress: OSC patch has been updated by tssurya. 2. Add host and hypervisor_hostname flag to create server - Topic: https://review.opendev.org/#/q/topic:bp/add-host-and-hypervisor-hostname-flag-to-create-server+(status:open+OR+status:merged) - Weekly Progress: patches have been updated with review comments. 3. Detach and attach boot volumes: - Topic: https://review.openstack.org/#/q/topic:bp/detach-boot-volume+(status:open+OR+status:merged) - Weekly Progress: No Progress Spec Ready for Review: ----------------------------- 1. Nova API policy improvement - Spec: https://review.openstack.org/#/c/547850/ - PoC: https://review.openstack.org/#/q/topic:bp/policy-default-refresh+(status:open+OR+status:merged) - Weekly Progress: Under review and updates. 2. Support for changing deleted_on_termination after boot -Spec: https://review.openstack.org/#/c/580336/ - Weekly Progress: No update this week. 3. Nova API cleanup - Spec: https://review.openstack.org/#/c/603969/ - Weekly Progress: Spec is merged. 4. Specifying az when restore shelved server - Spec: https://review.openstack.org/#/c/624689/ - Weekly Progress: Spec is updated for review comments. 5. Support delete_on_termination in volume attach api -Spec: https://review.openstack.org/#/c/612949/ - Weekly Progress: No updates this week. 7. Add API ref guideline for body text - ~8 api-ref are left to fix. Previously approved Spec needs to be re-proposed for Train: --------------------------------------------------------------------------- 1. Servers Ips non-unique network names : - https://blueprints.launchpad.net/nova/+spec/servers-ips-non-unique-network-names - https://review.openstack.org/#/q/topic:bp/servers-ips-non-unique-network-names+(status:open+OR+status:merged) 2. Volume multiattach enhancements: - https://blueprints.launchpad.net/nova/+spec/volume-multiattach-enhancements - https://review.openstack.org/#/q/topic:bp/volume-multiattach-enhancements+(status:open+OR+status:merged) Bugs: ==== No progress report in this week. 
NOTE- There might be some bug which is not tagged as 'api' or 'api-ref', those are not in the above list. Tag such bugs so that we can keep our eyes. -gmann From geguileo at redhat.com Thu Jun 6 12:00:06 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 6 Jun 2019 14:00:06 +0200 Subject: [k8s-sig][cinder] Ember-CSI goes Beta! Message-ID: <20190606120006.rhqoiomfgm4mvksa@localhost> Hi, The Ember-CSI team is happy to announce the release of version v0.9, marking the graduation of the project into Beta. Ember-CSI is where Kubernetes storage and OpenStack storage intersect. By leveraging existing Cinder drivers via cinderlib [1], Ember-CSI is able to provide block and mount storage to containers in any Kubernetes cluster supporting the Container Storage Interface (CSI) in a lightweight solution, as it doesn't need to deploy a DBMS or Message Broker systems, and doesn't need to deploy the usual Cinder API, Volume, or Scheduler services either. Key features of the project are: - Multi-driver support on single container - Support for mount filesystems - Support for block - Topology support - Snapshot support - Liveness probe - No need to deploy a DBMS in K8s (uses CRD for metadata) - Multi-CSI version support on single container (v0.2, v0.3, and v1.0) - Storage driver list tool - Support live debugging of running driver - Duplicated requests queuing support (for k8s) - Support of mocked probe (when using faulty sidecars) - Configurable default mount filesystem The Beta is available in Docker Hub [2] -under "stable" and "ember_0.9.0-stein" tags- as well as in PyPi [3]. After this milestone, where we have achieved feature parity with CSI v1.0, we will mostly focus on the areas we consider necessary for the transition into GA: upgrading mechanism, performance improvements, and documentation. If time permits, we will also work on features from newer CSI spec versions, such as volume expansion. For those interested in the project, the team can be reached on FreeNode's #ember-csi channel and at the Google group [4]. We also have a small site [5] with articles on how to try it on K8s with 2 backends (LVM and Ceph) and the github org [6]. Cheers, Gorka. [1]: https://opendev.org/openstack/cinderlib [2]: https://hub.docker.com/r/embercsi/ember-csi [3]: https://pypi.org/project/ember-csi/ [4]: https://groups.google.com/forum/#!forum/embercsi [5]: https://ember-csi.io [6]: https://github.com/embercsi/ From rosmaita.fossdev at gmail.com Thu Jun 6 12:10:17 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 6 Jun 2019 08:10:17 -0400 Subject: [glance] zeroing image, preserving other parameters In-Reply-To: <22a9d4d2-ebec-ce9d-97c3-cbc25bd9f859@gmx.com> References: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> <22a9d4d2-ebec-ce9d-97c3-cbc25bd9f859@gmx.com> Message-ID: <158cc12f-ce3e-12ff-e76f-42393ce4088d@gmail.com> On 6/6/19 6:49 AM, Volodymyr Litovka wrote: [snip] > But in fact, Glance has a bit contradictory approach: > > Documentation on db purge says: "Remember that image identifiers are > used by other OpenStack services that require access to images. These > services expect that when an image is requested by ID, they will receive > the same data every time." but there are no ways to get list of images > including 'deleted' or details of 'deleted' image, e.g. 
> > doka at lagavulin(admin at admin):~$ openstack image show > b179ecee-775d-4ee4-81c0-d3ec3a769d35 > Could not find resource b179ecee-775d-4ee4-81c0-d3ec3a769d35 That's correct, Glance lets you know that an image doesn't exist anymore by returning a 404. > so preserving image record in database makes no sense for 3rd party > services, which talk to Glance over public API. That's right, Glance keeps this data around for its own internal bookkeeping purposes, namely, to ensure that an image_id isn't reused. (A design decision was made when the v2 API was introduced in Folsom not to expose deleted image records to end users. I think the idea was that "cloudy" behavior would entail lots of image creation and deletion, and there was no sense clogging up the database. This was before the discovery of OSSN-0075, however.) > On the other hand, having in DB API ready for use 'image_destroy' call, > it's pretty easy (of course, for those who work with Glance code :-) ) > to add public API call kind of images/{image_id}/actions/destroy , > calling DB API's image_destroy. And, in that case, it makes sense to > allow image uuid to be specified during image create (since client can > purge specified record and recreate it using same characteristics), > otherwise I don't see where, in general, specifying uuid (when creating > image) can be useful. The use case I've seen for specifying the uuid is where a provider has multiple independent clouds and wants to make it easy for an end user to find the "same" public image in each cloud. Unlike image_ids, there is no uniqueness requirement on image names. OSSN-0075 is the reason why we don't expose an destroy action through the API. A user could post a useful image with image_id 1, share it or make it a community image, then after a sufficient number of people are using it, replace it with a completely different image with some kind of malicious content, keeping all the other metadata (id, name, etc.) identical to the original image (except for the os_hash_value, which would definitely be different). > The good news is that I solved my problem. The bad news is that solution > relies on relaxed requirements of 3rd party products but not on Glance's > API itself :-) Glad you solved your problem! I think I don't quite grasp your use case, but I'm glad you got something working. > Thanks! > > On 6/5/19 5:38 PM, Brian Rosmaita wrote: >> On 6/5/19 8:34 AM, Volodymyr Litovka wrote: >>> Dear colleagues, >>> >>> for some reasons, I need to shrink image size to zero (freeing storage >>> as well), while keeping this record in Glance database. >>> >>> First which come to my mind is to delete image and then create new one >>> with same name/uuid/... and --file /dev/null, but this is impossible >>> because Glance don't really delete records from database, marking them >>> as 'deleted' instead. >> The glance-manage utility program allows you to purge the database.  The >> images table (where the image UUIDs are stored) is not purged by default >> because of OSSN-0075 [0].  See the glance docs [1] for details. >> >> [0] https://wiki.openstack.org/wiki/OSSN/OSSN-0075 >> [1] >> https://docs.openstack.org/glance/latest/admin/db.html#database-maintenance >> >> >> (That doesn't really help your issue, I just wanted to point out that >> there is a way to purge the database.) 
>> >>> Next try was to use glance image-upload from /dev/null, but this is also >>> prohibited with message "409 Conflict: Image status transition from >>> [activated, deactivated] to saving is not allowed (HTTP 409)" >> That's correct, Glance will not allow you to replace the image data once >> an image has gone to 'active' status. >> >>> I found >>> https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's >>> >>> >>> "image_destroy" but have no clues on how to access this API. Is it kind >>> of library or kind of REST API, how to access it and whether it's safe >>> to use it in terms of longevity and compatibility between versions? >> The title of that document is misleading.  It describes the interface >> that Glance developers can use when they need to interact with the >> database.  There's no tool that exposes those operations to operators. >> >>> Or, may be, you can advise any other methods to solve the problem of >>> zeroing glance image data / freeing storage, while keeping in database >>> just a record about this image? >> If you purged the database, you could do your proposal to recreate the >> image with a zero-size file -- but that would give you an image with >> status 'active' that an end user could try to boot an instance with.  I >> don't think that's a good idea.  Additionally, purging the images table >> of all UUIDs, not just the few you want to replace, exposes you to >> OSSN-0075. >> >> An alternative--and I'm not sure this is a good idea either--would be to >> deactivate the image [2].  This would preserve all the current metadata >> but not allow the image to be downloaded by a non-administrator.  With >> the image not in 'active' status, nova or cinder won't try to use it to >> create instances or volumes.  The image data would still exist, though, >> so you'd need to delete it manually from the backend to really clear out >> the space.  Additionally, the image size would remain, which might be >> useful for record-keeping, although on the other hand, it will still >> count against the user_storage_quota.  And the image locations will >> still exist even though they won't refer to any existing data any more. >>   (Like I said, I'm not sure this is a good idea.) >> >> [2] https://developer.openstack.org/api-ref/image/v2/#deactivate-image >> >>> Thank you. >> Not sure I was much help.  Let's see if other operators have a good >> workaround or a need for this kind of functionality. >> >>> -- >>> Volodymyr Litovka >>>    "Vision without Execution is Hallucination." -- Thomas Edison >>> >>> >> > > -- > Volodymyr Litovka >   "Vision without Execution is Hallucination." -- Thomas Edison > From gmann at ghanshyammann.com Thu Jun 6 12:53:27 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Thu, 06 Jun 2019 21:53:27 +0900 Subject: [nova] Validation for requested host/node on server create In-Reply-To: References: <78fa937a-beb6-c63d-01a0-40e6519928be@gmail.com> Message-ID: <16b2cd9eb2e.119cf7a71136389.2518769832623783484@ghanshyammann.com> ---- On Fri, 24 May 2019 07:02:15 +0900 Matt Riedemann wrote ---- > On 5/22/2019 5:13 PM, Matt Riedemann wrote: > > 3. Validate both the host and node in the API. This can be broken down: > > > > a) If only host is specified, do #2 above. 
> > b) If only node is specified, iterate the cells looking for the node (or > > query a resource provider with that name in placement which would avoid > > down cell issues) > > c) If both host and node is specified, get the HostMapping and from that > > lookup the ComputeNode in the given cell (per the HostMapping) > > > > Pros: fail fast behavior in the API if either the host and/or node do > > not exist > > > > Cons: performance hit in the API to validate the host/node and > > redundancy with the scheduler to find the ComputeNode to get its uuid > > for the in_tree filtering on GET /allocation_candidates. > > > > Note that if we do find the ComputeNode in the API, we could also > > (later?) make a change to the Destination object to add a node_uuid > > field so we can pass that through on the RequestSpec from > > API->conductor->scheduler and that should remove the need for the > > duplicate query in the scheduler code for the in_tree logic. > > > > I'm personally in favor of option 3 since we know that users hate > > NoValidHost errors and we have ways to mitigate the performance overhead > > of that validation. > > > > Note that this isn't necessarily something that has to happen in the > > same change that introduces the host/hypervisor_hostname parameters to > > the API. If we do the validation in the API I'd probably split the > > validation logic into it's own patch to make it easier to test and > > review on its own. > > > > [1] https://review.opendev.org/#/c/645520/ > > [2] > > https://github.com/openstack/nova/blob/2e85453879533af0b4d0e1178797d26f026a9423/nova/scheduler/utils.py#L528 > > > > [3] https://docs.openstack.org/nova/latest/admin/availability-zones.html > > Per the nova meeting today [1] it sounds like we're going to go with > option 3 and do the validation in the API - check hostmapping for the > host, check placement for the node, we can optimize the redundant > scheduler calculation for in_tree later. For review and test sanity I > ask that the API validation code comes in a separate patch in the series. +1 on option3. For more optimization, can we skip b) and c) for non-baremental case assuming if there is Hostmapping then node also will be valid. -gmann > > [1] > http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-05-23-21.00.log.html#l-104 > > -- > > Thanks, > > Matt > > From luka.peschke at objectif-libre.com Thu Jun 6 13:00:00 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Thu, 06 Jun 2019 15:00:00 +0200 Subject: [cloudkitty] Shift IRC meeting of June 7th Message-ID: Hi all, Tomorrow's IRC meeting for CloudKitty will be held at 14h UTC (16 CEST) instead of 15h UTC as huats and I won't be available at 15h. Cheers, -- Luka Peschke From aschultz at redhat.com Thu Jun 6 13:04:00 2019 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 6 Jun 2019 07:04:00 -0600 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 On Wed, Jun 5, 2019 at 8:42 AM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. 
> -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Thu Jun 6 13:04:39 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 15:04:39 +0200 Subject: [tc][tripleo][charms][helm][kolla][ansible][puppet][chef] Deployment tools capabilities Message-ID: <569964f6-ead3-a983-76d2-ffa59c753dbe@openstack.org> Hi everyone, In the "software" section of the openstack.org website, the deployment tools page is not very helpful for users looking into picking a way to deploy OpenStack: https://www.openstack.org/software/project-navigator/deployment-tools Furthermore, each detailed page is a bit dry. We do not display deliverable tags as most are irrelevant: https://www.openstack.org/software/releases/rocky/components/openstack-helm This was discussed in a forum session in Denver[1], and the outcome was that we should develop a taxonomy of deployment tools capabilities and characteristics, that would help users understand the technologies the various tools are based on, their prerequisites, which services and versions they cover, etc. The web UI should allow users to search deployment tools based on those tags. [1] https://etherpad.openstack.org/p/DEN-deployment-tools-capabilities As a first step, volunteers from that session worked on a draft categorized list[2] of those tags. If you are interested, please review that list, add to it or comment: [2] https://etherpad.openstack.org/p/deployment-tools-tags The next steps are: - commit the detailed list of tags (action:ttx) - apply it to existing deployment tools (action:deploy tools teams) - implementation those tags and data in the openstack website (action:jimmymcarthur) - maybe expand to list 3rd-party installers in a separate tab (tbd) The first two next steps will be implemented as patches to the osf/openstack-map repository, which already contains the base YAML data used in the software pages. Thanks for your help, -- Thierry Carrez (ttx) From e0ne at e0ne.info Thu Jun 6 13:06:16 2019 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Thu, 6 Jun 2019 16:06:16 +0300 Subject: [horizon] dropping 2012.2 tag on pypi In-Reply-To: <20190605163552.bnmjqxtoncct6lxr@yuggoth.org> References: <20190605163552.bnmjqxtoncct6lxr@yuggoth.org> Message-ID: Thank you, Jeremy and Guilherme! Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Wed, Jun 5, 2019 at 7:36 PM Jeremy Stanley wrote: > On 2019-06-05 12:57:45 -0300 (-0300), Guilherme Steinmüller wrote: > > As we've discussed with nova tag recently [1], I'd suggest the same for > > horizon. > > > > When we search on pypi the version it shows is 2012.2 and when we click > > release history we can see that the most recent version is 15.1.0 [2] > > > > [1] > > > http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006780.html > > [2] https://pypi.org/project/horizon/#history > > Thanks for pointing this out. Since we basically got blanket > approval to do this for any official OpenStack project some years > back, I've removed the 2012.2 from the horizon project on PyPI just > now. > > If anybody spots others, please do mention them! > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bfournie at redhat.com Thu Jun 6 13:27:12 2019 From: bfournie at redhat.com (Bob Fournier) Date: Thu, 6 Jun 2019 09:27:12 -0400 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 On Thu, Jun 6, 2019 at 9:13 AM Alex Schultz wrote: > +1 > > On Wed, Jun 5, 2019 at 8:42 AM Emilien Macchi wrote: > >> Kamil has been working on TripleO for a while now and is providing really >> insightful reviews, specially on Python best practices but not only; he is >> one of the major contributors of the OVN integration, which was a ton of >> work. I believe he has the right knowledge to review any TripleO patch and >> provide excellent reviews in our project. We're lucky to have him with us >> in the team! >> >> I would like to propose him core on TripleO, please raise any objection >> if needed. >> -- >> Emilien Macchi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jun 6 13:37:26 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Thu, 06 Jun 2019 22:37:26 +0900 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: <16b2d022f31.116dfa0cc138586.1656207772173337725@ghanshyammann.com> ---- On Thu, 06 Jun 2019 17:49:44 +0900 Mark Goddard wrote ---- > On Thu, 6 Jun 2019 at 09:27, Thierry Carrez wrote: > > > > Mark Goddard wrote: > > > [...] > > > We see two options for becoming an official OpenStack project: > > > > > > 1. become a deliverable of the Kolla project > > > 2. become an official top level OpenStack project > > > > > > Given the affinity with the Kolla project I feel that option 1 seems > > > natural. However, I do not want to use influence as PTL to force this > > > approach. > > > [...] > > > > From a governance perspective, the two options are definitely possible. > > > > Kayobe can be seen as one of the Kolla-derived deployment tools, or it > > can be seen as a new deployment tool combining two existing projects > > (Kolla and Bifrost). Project teams are cheap: the best solution is the > > one that best aligns to the social reality. > > > > So I'd say the decision depends on how much independence Kayobe wants to > > have from Kolla. Having a separate project team will for example make it > > easier to have separate meetings, but harder to have common meetings. > > How much of a separate team is Kayobe from Kolla? How much do you want > > it to stay that way? > > Right now the intersection of the core teams is only me. While all > Kayobe contributors are familiar with Kolla projects, the reverse is > not true. This is partly because Kolla and/or Kolla Ansible can be > used without Kayobe, and partly because Kayobe is a newer project > which typically gets adopted at the beginning of a cloud deployment. > > It certainly seems to make sense from the Kayobe community perspective > to join these communities. I think the question the Kolla team needs > to ask is whether the benefit of a more complete set of tooling is > worth the overhead of adding a new deliverable that may not be used by > all contributors or in all deployments. With my quick read on technical relation between Kolla-ansible and Kayobe, options1 make much sense to me too. It can give more benefits of working more closely and handle the dependencies and future roadmap etc. 
And having a completely separate team for a repo under the same project is not new (in Kayobe's case the overlap is currently only you, but that can grow in the future). We have a lot of existing projects which maintain separate teams for their different repos/deliverables with little or no overlap. There is also extra work to consider if you go with a separate project, for example electing a PTL and taking on the responsibilities that come with that. I would say we can avoid that in Kayobe's case because of its technical mission and relation to Kolla Ansible. -gmann > > > > > > -- > > Thierry Carrez (ttx) > > > > From mthode at mthode.org Thu Jun 6 14:17:47 2019 From: mthode at mthode.org (Matthew Thode) Date: Thu, 6 Jun 2019 09:17:47 -0500 Subject: [requirements][kuryr][flame] openshift difficulties In-Reply-To: <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com> References: <20190529205352.f2dxzckgvfavbvtv@mthode.org> <20190530151739.nfzrqfstlb2sbrq5@mthode.org> <20190605165807.jmhogmfyrxltx5b3@mthode.org> <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com> Message-ID: <20190606141747.gxoyrcels266rcgv@mthode.org> On 19-06-06 09:13:46, Michał Dulko wrote: > On Wed, 2019-06-05 at 11:58 -0500, Matthew Thode wrote: > > On 19-05-30 10:17:39, Matthew Thode wrote: > > > On 19-05-30 17:07:54, Michał Dulko wrote: > > > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote: > > > > > Openshift upstream is giving us difficulty as they are capping the > > > > > version of urllib3 and kubernetes we are using. > > > > > > > > > > -urllib3===1.25.3 > > > > > +urllib3===1.24.3 > > > > > -kubernetes===9.0.0 > > > > > +kubernetes===8.0.1 > > > > > > > > > > I've opened an issue with them but not had much luck there (and their > > > > > preferred solution just pushes the can down the road). > > > > > > > > > > https://github.com/openshift/openshift-restclient-python/issues/289 > > > > > > > > > > What I'd like us to do is move off of openshift as our usage doesn't seem too > > > > > much. > > > > > > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one > > > > > function with that import). I'm not sure exactly what you are doing > > > > > with it but would it be too much to ask to move to something else? > > > > > > > > From Kuryr side it's not really much effort, we can switch to bare REST > > > > calls, but obviously we prefer the client. If there's much support for > > > > getting rid of it, we can do the switch. > > > > > > > > > > Right now Kuryr is only using it in that one place and it's blocking the > > > update of urllib3 and kubernetes for the rest of openstack. So if it's > > > not too much trouble it'd be nice to have happen. > > > > > > > > x/flame has it in its constraints but I don't see any actual usage, so > > > > > perhaps it's a false flag. > > > > > > > > > Please let me know what you think > > > > > > > > Any updates on this? I'd like to move forward on removing the > > dependency if possible. > > > > Sure, I'm waiting for some spare time to do this. Fastest it may happen > will probably be next week. > Sounds good, thanks for working on it. -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From pierre at stackhpc.com Thu Jun 6 15:07:16 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 6 Jun 2019 16:07:16 +0100 Subject: [blazar] IRC meeting today Message-ID: Hello, We have our biweekly Blazar IRC meeting in less than one hour: https://wiki.openstack.org/wiki/Meetings/Blazar#Agenda_for_06_Jun_2019_.28Americas.29 We have the opportunity of discussing further how we can enforce policies for limiting reservation usage. I would also like to discuss downstream patches which could be contributed to the project. Everyone is welcome to join. Cheers, Pierre From madhuri.kumari at intel.com Thu Jun 6 15:19:48 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Thu, 6 Jun 2019 15:19:48 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> Hi Eric, Thank you for your response. Please see my response inline. Regards, Madhuri >-----Original Message----- >From: Eric Fried [mailto:openstack at fried.cc] >Sent: Tuesday, June 4, 2019 12:07 AM >To: openstack-discuss at lists.openstack.org >Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post >Provisioning > >Hi Madhuri- > >> For this purpose, we would need to change a trait of the server’s >> flavor in Nova. This trait is mapped to a deploy step in Ironic >> which does some operation(change BIOS config and reboot in this use >> case).____ > >If your trait was something that wasn't tracked in the flavor (or elsewhere in >the instance's db record), you could just update it directly in placement. >Then you'd have to figure out how to make ironic notice that and effect the >change. (Or perhaps the other way around: >tell ironic you want to make the change, and it updates the trait in >placement as part of the process.) In this case, the trait is stored with flavor so it is known to Nova. The new trait should be added in the database and the old one removed. For an ex: An instance with flavor bm_hyperthreading with trait:CUSTOM_HYPERTHREADING_ON=required is created in Nova. Now the user wants to turn off the hyperthreading, than they could update the flavor with trait:CUSTOM_HYPERTHREADING_OFF=required. This should remove the trait:CUSTOM_HYPERTHREADING_ON and add trait:CUSTOM_HYPERTHREADING_OFF associated with the new flavor. > >> In Nova, the only API to change trait in flavor is resize whereas >> resize does migration and a reboot as well.____ >> >> In short, I am  looking for a Nova API that only changes the traits, >> and trigger the ironic deploy steps but no reboot and migration. >> Please suggest.____ > >It's inconvenient, but I'm afraid "resize" is the right way to get this done, >because that's the only way to get the appropriate validation and changes >effected in the general case. Yes, resize seems to be the only valid one. > >Now, there's a spec [1] we've been talking about for ~4.5 years that would >let you do a resize without rebooting, when only a certain subset of >properties are being changed. It is currently proposed for "upsizing" >CPU, memory, and disk, and adding PCI devices, but clearly this ISS >configuration would be a reasonable candidate to include. > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. 
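For concreteness, a rough sketch of what "update it directly in placement" could look like from the CLI. This is illustrative only: it assumes the osc-placement plugin is installed, a placeholder resource provider UUID in $RP_UUID, and a hypothetical CUSTOM_BIOS_PROFILE_A trait:

  $ openstack --os-placement-api-version 1.6 resource provider trait list $RP_UUID
  $ openstack --os-placement-api-version 1.6 resource provider trait set \
      --trait CUSTOM_BIOS_PROFILE_A $RP_UUID

Keep in mind that 'resource provider trait set' replaces the provider's whole trait set, so any traits that should remain have to be passed again with additional --trait arguments. As noted above, this only helps for traits that are not also tracked in the flavor.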
However that's not the actual intent of the Ironic use case I explained in the email. Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. So I am not sure if the spec actually satisfies the use case. I hope to get more response from the team to get more clarity. >In fact, it's possible that leading the charge with something this unobtrusive >would reduce some of the points of contention that have stalled the >blueprint up to this point. > >Food for thought. > >Thanks, >efried > >[1] https://review.opendev.org/#/c/141219/ From jaypipes at gmail.com Thu Jun 6 15:36:59 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Thu, 6 Jun 2019 11:36:59 -0400 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> Message-ID: <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> On 6/6/19 11:19 AM, Kumari, Madhuri wrote: > In this case, the trait is stored with flavor so it is known to Nova. The new trait should be added in the database and the old one removed. For an ex: > > An instance with flavor bm_hyperthreading with trait:CUSTOM_HYPERTHREADING_ON=required is created in Nova. > Now the user wants to turn off the hyperthreading, than they could update the flavor with trait:CUSTOM_HYPERTHREADING_OFF=required. > This should remove the trait:CUSTOM_HYPERTHREADING_ON and add trait:CUSTOM_HYPERTHREADING_OFF associated with the new flavor. The absence of a trait on a provider should be represented by the provider not having a trait. Just have a single trait "CUSTOM_HYPERTHREADING" that you either place on the provider or do not place on a provider. The flavor should then either request that the trait be present on a provider that the instance is scheduled to (trait:CUSTOM_HYPERTHREADING=required) or that the trait should *not* be present on a provider that the instance is scheduled to (trait:CUSTOM_HYPERTHREADING=forbidden). Best, -jay From a.settle at outlook.com Thu Jun 6 15:50:45 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Thu, 6 Jun 2019 15:50:45 +0000 Subject: [tc] Recap for Technical Committee Meeting 6 June 2019 @ 1400 UTC Message-ID: Hello all, Thanks to those who joined the TC meeting today and running through it with me at the speed of light. Gif game was impeccably strong and that's primarily what I like about this community. For a recap of the meeting, please see the eavesdrop [0] for full detailed logs and action items. All items in the agenda [1] were covered and no major concerns raised. Next meeting will be on the 8th of July 2019. Cheers, Alex [0] http://eavesdrop.openstack.org/meetings/tc/2019/tc.2019-06-06-14.00.txt [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006877.html From dsneddon at redhat.com Thu Jun 6 16:33:13 2019 From: dsneddon at redhat.com (Dan Sneddon) Date: Thu, 6 Jun 2019 09:33:13 -0700 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 On Wed, Jun 5, 2019 at 7:34 AM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. 
I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > > -- > Emilien Macchi > -- Dan Sneddon | Senior Principal Software Engineer dsneddon at redhat.com | redhat.com/cloud dsneddon:irc | @dxs:twitter -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 6 17:01:57 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 6 Jun 2019 12:01:57 -0500 Subject: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures? In-Reply-To: <8c92910f-a8a0-5f1c-41b0-784ff2c3d00a@gmail.com> References: <81d14679c181cdc1a252570529ca5c4b@bitskrieg.net> <25e24abf-aebc-2881-9981-7f9683ffc700@gmail.com> <06029fcf4648d3aa784783389e986a8d@bitskrieg.net> <26839d31-18b8-ba76-56cc-8bbe4b73fc37@gmail.com> <34763ede-45a3-2d22-37a1-c3fc75ea84d2@gmail.com> <4fbe5786f0765d97229147cc1137a6ce@bitskrieg.net> <20180809172447.GB19251@redhat.com> <16524beca00.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> <8c92910f-a8a0-5f1c-41b0-784ff2c3d00a@gmail.com> Message-ID: <23b6eca0-592b-7d32-4789-98629fe0438c@gmail.com> On 8/18/2018 10:09 PM, Matt Riedemann wrote: >> This sounds promising and there seems to be a feasible way to do this, >> but it also sounds like a decent amount of effort and would be a new >> feature in a future release rather than a bugfix - am I correct in >> that assessment? > > Yes I'd say it's a blueprint and not a bug fix - it's not something we'd > backport to stable branches upstream, for example. Just an update on this since it came up in IRC today (unrelated discussion which reminded me of this thread), Kashyap has created a nova blueprint: https://blueprints.launchpad.net/nova/+spec/pick-guest-arch-based-on-host-arch-in-libvirt-driver -- Thanks, Matt From openstack at fried.cc Thu Jun 6 17:51:28 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 6 Jun 2019 12:51:28 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> Message-ID: <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. However that's not the actual intent of the Ironic use case I explained in the email. > Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. > So I am not sure if the spec actually satisfies the use case. > I hope to get more response from the team to get more clarity. Waitwait. The VM needs to be rebooted for the BIOS change to take effect? So (non-live) resize would actually satisfy your use case just fine. But the problem is that the ironic driver doesn't support resize at all? Without digging too hard, that seems like it would be a fairly straightforward thing to add. It would be limited to only "same host" and initially you could only change this one attribute (anything else would have to fail). Nova people, thoughts? efried . 
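Pulling the pieces of this thread together, a rough sketch of what the resize-based flow could look like from the CLI, combining the two-flavor approach described earlier with Jay's suggestion of a single trait that is either required or forbidden. The flavor and server names here are hypothetical, and this assumes the trait is already reported on the relevant resource providers and mapped to an Ironic deploy step:

  $ openstack flavor set --property trait:CUSTOM_HYPERTHREADING=required bm.ht-on
  $ openstack flavor set --property trait:CUSTOM_HYPERTHREADING=forbidden bm.ht-off
  $ openstack server resize --flavor bm.ht-off my-baremetal-server
  $ openstack server resize --confirm my-baremetal-server

Whether the ironic virt driver can support (same-host) resize at all is the open question picked up below.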
From colleen at gazlene.net Thu Jun 6 18:57:19 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Thu, 06 Jun 2019 11:57:19 -0700 Subject: [dev][keystone] M-1 check-in and retrospective meeting In-Reply-To: <627ae3a7-b998-4323-8981-2d1cd7bc3085@www.fastmail.com> References: <627ae3a7-b998-4323-8981-2d1cd7bc3085@www.fastmail.com> Message-ID: I've drafted an agenda[1] for the check-in/retrospective/review/planning meeting scheduled for next week (June 11, 1500 UTC: We'll be using jitsi.org[2] (hosted OSS video conferencing tool) for the call (I've only ever used it for one-on-one calls so we'll have to see how it performs with several people on the call). We'll keep the retrospective to no more than one hour (since we just had a retrospective that shouldn't be too hard). Still, it's an ambitious agenda for a two-hour meeting. You can help us get through it quickly by: * pre-filling out your thoughts on the retrospective etherpad[3] * reviewing the Train roadmap[4] and reflecting on the stories and tasks listed there, including updating task statuses or adding tasks where needed The agenda is a draft, so feel free to edit it or let me know if you have thoughts or questions on it. Colleen [1] https://etherpad.openstack.org/p/keystone-train-M-1-review-planning-meeting [2] https://meet.jit.si/keystone-train-m-1 [3] https://etherpad.openstack.org/p/keystone-train-m-1-retrospective [4] https://trello.com/b/ClKW9C8x/keystone-train-roadmap From corey.bryant at canonical.com Thu Jun 6 19:06:37 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Thu, 6 Jun 2019 15:06:37 -0400 Subject: [goal][python3] Train unit tests weekly update (goal-14) Message-ID: This is the goal-14 weekly update for the "Update Python 3 test runtimes for Train" goal [1]. There are 14 weeks remaining for completion of Train community goals [2]. == What's the Goal? == To ensure (in the Train cycle) that all official OpenStack repositories with Python 3 unit tests are exclusively using the 'openstack-python3-train-jobs' Zuul template or one of its variants (e.g. 'openstack-python3-train-jobs-neutron') to run unit tests, and that tests are passing. This will ensure that all official projects are running py36 and py37 unit tests in Train. That is the main goal. Other work items will consist of: * Dropping py35 and old py3 zuul templates (e.g. drop 'openstack-python35-jobs', 'openstack-python36-jobs', 'openstack-python37-jobs', etc) * Updating setup.cfg classifiers (e.g. drop 'Programming Language :: Python :: 3.5' and add 'Programming Language :: Python :: 3.7') * Updating the list of default tox.ini environment (e.g. drop py35 and add py37) For complete details please see [1]. == Role of Goal Champion == Ensure patches are proposed to all affected repositories, encourage teams to help land patches, and report weekly status. == Role of Project Teams == Fix failing tests so that the proposed patches can merge. Project teams should merge the change to the Zuul config before the end of the Train cycle. == Ongoing Work == I will be the goal champion for this goal. I'm just getting organized at this point. Over the next week I plan to get scripts working to automate patch generation for all supported projects and start submitting patches. Open patches needing reviews: https://review.openstack.org/#/q/topic:python3-train+is:open == Completed Work == Merged patches: https://review.openstack.org/#/q/topic:python3-train+is:merged (Wow, look at that! Thanks Zhong Shengping.) == How can you help? 
== Please take a look at the failing patches and help fix any failing unit tests for your project(s). Python 3.7 unit tests will be self-testing in zuul. If you're interested in helping submit patches, please let me know. Failing patches: https://review.openstack.org/#/q/topic:python3-train+status:open+(+label:Verified-1+OR+label:Verified-2+) == Reference Material == [1] Goal description: https://review.opendev.org/#/c/657908 [2] Train release schedule: https://releases.openstack.org/train/schedule.html (see R-5 for "Train Community Goals Completed") Storyboard: https://storyboard.openstack.org/#!/board/ Porting to Python 3.7: https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7 Python Update Process: https://opendev.org/openstack/governance/src/branch/master/resolutions/20181024-python-update-process.rst Train runtimes: https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/train.rst Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 6 19:20:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 6 Jun 2019 14:20:33 -0500 Subject: [nova] Validation for requested host/node on server create In-Reply-To: <16b2cd9eb2e.119cf7a71136389.2518769832623783484@ghanshyammann.com> References: <78fa937a-beb6-c63d-01a0-40e6519928be@gmail.com> <16b2cd9eb2e.119cf7a71136389.2518769832623783484@ghanshyammann.com> Message-ID: <35cafbbf-6e97-e1e1-e3b6-a390aa1e03f5@gmail.com> On 6/6/2019 7:53 AM, gmann at ghanshyammann.com wrote: > +1 on option3. For more optimization, can we skip b) and c) for non-baremental case > assuming if there is Hostmapping then node also will be valid. You won't know it's a baremetal node until you get the ComputeNode object and check the hypervisor_type, and at that point you've already validated that it exists. -- Thanks, Matt From mriedemos at gmail.com Thu Jun 6 20:32:57 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 6 Jun 2019 15:32:57 -0500 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> References: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> Message-ID: <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> On 5/23/2019 7:00 AM, Nadathur, Sundar wrote: > Hi, > >     The feedback in the Nova – Cyborg interaction spec [1] is to move > the call for creating/binding accelerator requests (ARQs) from the > conductor (just before the call to build_and_run_instance, [2]) to the > compute manager (just before spawn, without holding the build sempahore > [3]). The point where the results of the bind are needed is in the virt > driver [4] – that is not changing. The reason for the move is to enable > Cyborg to notify Nova [5] instead of Nova virt driver polling Cyborg, > thus making the interaction similar to other services like Neutron. > > The binding involves device preparation by Cyborg, which may take some > time (ballpark: milliseconds to few seconds to perhaps 10s of seconds – > of course devices vary a lot). We want to overlap as much of this as > possible with other tasks, by starting the binding as early as possible > and making it asynchronous, so that bulk VM creation rate etc. are not > affected. These considerations are probably specific to Cyborg, so > trying to make it uniform with other projects deserve a closer look > before we commit to it. > > Moving the binding from [2] to [3] reduces this overlap. 
I did some > measurements of the time window from [2] to [3]: it was consistently > between 20 and 50 milliseconds, whether I launched 1 VM at a time, 2 at > a time, etc. This seems acceptable. > > But this was just in a two-node deployment. Are there situations where > this window could get much larger (thus reducing the overlap)? Such as > in larger deployments, or issues with RabbitMQ messaging, etc. Are there > larger considerations of performance or scaling for this approach? > > Thanks in advance. > > [1] https://review.opendev.org/#/c/603955/ > > [2] > https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1501 > > [3] > https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1882 > > [4] > https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3215 > > > [5] https://wiki.openstack.org/wiki/Nova/ExternalEventAPI > > Regards, > > Sundar > I'm OK with binding in the compute since that's where we trigger the callback event and want to setup something to wait for it before proceeding, like we do with port binding. What I've talked about in detail in the spec is doing the ARQ *creation* in conductor rather than compute. I realize that doing the creation in the compute service means fewer (if any) RPC API changes to get phase 1 of this code going, but I can't imagine any RPC API changes for that would be very big (it's a new parameter to the compute service methods, or something we lump into the RequestSpec). The bigger concern I have is that we've long talked about moving port (and at times volume) creation from the compute service to conductor because it's less expensive to manage external resources there if something fails, e.g. going over-quota creating volumes. The problem with failing late in the compute is we have to cleanup other things (ports and volumes) and then reschedule, which may also fail on the next alternate host. Failing fast in conductor is more efficient and also helps take some of the guesswork out of which service is managing the resources (we've had countless bugs over the years about ports and volumes being leaked because we didn't clean them up properly on failure). Take a look at any of the error handling in the server create flow in the ComputeManager and you'll see what I'm talking about. Anyway, if we're voting I vote that ARQ creation happens in conductor and binding happens in compute. -- Thanks, Matt From guoyongxhzhf at h3c.com Wed Jun 5 08:59:36 2019 From: guoyongxhzhf at h3c.com (Guoyong) Date: Wed, 5 Jun 2019 08:59:36 +0000 Subject: [airship] Is Ironic ready for Airship? Message-ID: I know Airship choose Maas as bare mental management tool. I want to know whether Maas is more suitable for Airship when it comes to under- infrastructure? If Maas is more suitable, then what feature should ironic develop? Thanks for your reply ------------------------------------------------------------------------------------------------------------------------------------- 本邮件及其附件含有新华三集团的保密信息,仅限于发送给上面地址中列出 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本 邮件! This e-mail and its attachments contain confidential information from New H3C, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. 
If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Anirudh.Gupta at hsc.com Thu Jun 6 04:59:28 2019 From: Anirudh.Gupta at hsc.com (Anirudh Gupta) Date: Thu, 6 Jun 2019 04:59:28 +0000 Subject: Unable to run ssh/iperf on StarlingX Vm Message-ID: Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Thu Jun 6 21:52:54 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 6 Jun 2019 16:52:54 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: References: Message-ID: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> Let me TL;DR this: The forbidden aggregates filter spec [1] says when we put trait metadata onto a host aggregate, we should add the same trait to the compute node RPs for all the hosts in that aggregate, so that the feature actually works when we use it. But we never talked about what to do when we *remove* a trait from such an aggregate, or trash an aggregate with traits, or remove a host from such an aggregate. Here are the alternatives, as Vrushali laid them out (letters added by me): > (a) Leave all traits alone. If they need to be removed, it would have to > be manually via a separate step. > > (b) Support a new option so the caller can dictate whether the operation > should remove the traits. (This is all-or-none.) > > (c) Define a "namespace" - a trait substring - and remove only traits in > that namespace. I'm going to -1 (b). It's too big a hammer, at too big a cost (including API changes). > If I’m not wrong, for last two approaches, we would need to change > RestFul APIs. No, (c) does not. By "define a namespace" I mean we would establish a naming convention for traits to be used with this feature. For example: CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE And when we do any of the removal things, we always and only remove any trait containing the substring _AGGREGATE_ISOLATION_ (assuming it's not asserted by other aggregates in which the host is also a member, yatta yatta). IMO (a) and (c) both suck, but (c) is a slightly better experience for the user. efried [1] http://specs.openstack.org/openstack/nova-specs/specs/train/approved/placement-req-filter-forbidden-aggregates.html P.S. 
> Disclaimer: This email and any attachments are sent in strictest > confidence for the sole use of the addressee and may contain legally > privileged, confidential, and proprietary data. If you are not the > intended recipient, please advise the sender by replying promptly to > this email and then delete and destroy this email and any attachments > without any further use, copying or forwarding. Hear that, pipermail? From smooney at redhat.com Thu Jun 6 23:31:54 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 00:31:54 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> Message-ID: <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> On Thu, 2019-06-06 at 16:52 -0500, Eric Fried wrote: > Let me TL;DR this: > > The forbidden aggregates filter spec [1] says when we put trait metadata > onto a host aggregate, we should add the same trait to the compute node > RPs for all the hosts in that aggregate, so that the feature actually > works when we use it. > > But we never talked about what to do when we *remove* a trait from such > an aggregate, or trash an aggregate with traits, or remove a host from > such an aggregate. > > Here are the alternatives, as Vrushali laid them out (letters added by me): > > > (a) Leave all traits alone. If they need to be removed, it would have to > > be manually via a separate step. > > > > (b) Support a new option so the caller can dictate whether the operation > > should remove the traits. (This is all-or-none.) > > > > (c) Define a "namespace" - a trait substring - and remove only traits in > > that namespace. > > I'm going to -1 (b). It's too big a hammer, at too big a cost (including > API changes). > > > If I’m not wrong, for last two approaches, we would need to change > > RestFul APIs. > > No, (c) does not. By "define a namespace" I mean we would establish a > naming convention for traits to be used with this feature. For example: > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE i personaly dislike c as it means we cannot use any standard traits in host aggrates. there is also an option d. when you remove a trait form a host aggregate for each host in the aggregate check if that traits exists on another aggregate the host is a member of and remove it if not found on another aggregate. > > And when we do any of the removal things, we always and only remove any > trait containing the substring _AGGREGATE_ISOLATION_ (assuming it's not > asserted by other aggregates in which the host is also a member, yatta > yatta). > > IMO (a) and (c) both suck, but (c) is a slightly better experience for > the user. c is only a good option if we are talking about a specific set of new traits for this usecase e.g. CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE but if we want to allow setting tratis genericaly on hosts via host-aggrages its not really that usefull. for example we talked about useing a hyperthreading trait in the cpu pinning spec which will not be managed by the compute driver. host aggages would be a convient way to be able to manage that trait if this was a generic feature. for c you still have to deal with the fact a host can be in multiple host aggrates too by the way so jsut because a thread is namespace d and it is removed from an aggrate does not mean its correcct to remove it from a host. 
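Option (d) is straightforward to express as set math once you have, per host, the traits asserted by each aggregate it belongs to. A minimal sketch, with illustrative data structures rather than nova's actual models:

    # Sketch of option (d): only strip a trait from a host's resource
    # provider if no other aggregate the host is in still asserts it.
    def traits_to_strip(host_aggregate_traits, source_agg, removed_traits):
        """host_aggregate_traits: {aggregate_id: set(traits)} for one host."""
        still_asserted = set()
        for agg_id, traits in host_aggregate_traits.items():
            if agg_id != source_agg:
                still_asserted |= traits
        return set(removed_traits) - still_asserted

    # Example: the host is also in agg2, which still asserts the trait,
    # so nothing gets removed from the provider.
    traits_to_strip({'agg1': {'CUSTOM_WINDOWS_LICENSE'},
                     'agg2': {'CUSTOM_WINDOWS_LICENSE'}},
                    'agg1', {'CUSTOM_WINDOWS_LICENSE'})   # -> set()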
From jrist at redhat.com Fri Jun 7 04:34:18 2019 From: jrist at redhat.com (Jason Rist) Date: Thu, 6 Jun 2019 22:34:18 -0600 Subject: Retiring TripleO-UI - no longer supported In-Reply-To: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> References: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> Message-ID: Follow-up - this work is now done. https://review.opendev.org/#/q/topic:retire_tripleo_ui+(status:open+OR+status:merged) -J Jason Rist Red Hat  jrist / knowncitizen > On May 23, 2019, at 2:35 PM, Jason Rist wrote: > > Hi everyone - I’m writing the list to announce that we are retiring TripleO-UI and it will no longer be supported. It’s already deprecated in Zuul and removed from requirements, so I’ve submitted a patch to remove all code. > > https://review.opendev.org/661113 > > Thanks, > Jason > > Jason Rist > Red Hat  > jrist / knowncitizen > ` -------------- next part -------------- An HTML attachment was scrubbed... URL: From sundar.nadathur at intel.com Fri Jun 7 05:17:30 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Fri, 7 Jun 2019 05:17:30 +0000 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> References: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> Message-ID: <1CC272501B5BC543A05DB90AA509DED527591596@fmsmsx122.amr.corp.intel.com> > -----Original Message----- > From: Matt Riedemann > Sent: Thursday, June 6, 2019 1:33 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova] [cyborg] Impact of moving bind to compute > > On 5/23/2019 7:00 AM, Nadathur, Sundar wrote: > > [....] > > Moving the binding from [2] to [3] reduces this overlap. I did some > > measurements of the time window from [2] to [3]: it was consistently > > between 20 and 50 milliseconds, whether I launched 1 VM at a time, 2 > > at a time, etc. This seems acceptable. > > > > [2] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1501 > > > > [3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1882 > > Regards, > > > > Sundar > I'm OK with binding in the compute since that's where we trigger the callback > event and want to setup something to wait for it before proceeding, like we > do with port binding. > > What I've talked about in detail in the spec is doing the ARQ *creation* in > conductor rather than compute. I realize that doing the creation in the > compute service means fewer (if any) RPC API changes to get phase 1 of this > code going, but I can't imagine any RPC API changes for that would be very big > (it's a new parameter to the compute service methods, or something we lump > into the RequestSpec). > The bigger concern I have is that we've long talked about moving port (and at > times volume) creation from the compute service to conductor because it's > less expensive to manage external resources there if something fails, e.g. > going over-quota creating volumes. The problem with failing late in the > compute is we have to cleanup other things (ports and volumes) and then > reschedule, which may also fail on the next alternate host. The ARQ creation could be done at [1], followed by the binding, before acquiring the semaphore or creating other resources. Why is that not a good option? 
[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898 > Failing fast in > conductor is more efficient and also helps take some of the guesswork out of > which service is managing the resources (we've had countless bugs over the > years about ports and volumes being leaked because we didn't clean them up > properly on failure). Take a look at any of the error handling in the server > create flow in the ComputeManager and you'll see what I'm talking about. > > Anyway, if we're voting I vote that ARQ creation happens in conductor and > binding happens in compute. > > -- > > Thanks, > > Matt Regards, Sundar From aj at suse.com Fri Jun 7 05:24:32 2019 From: aj at suse.com (Andreas Jaeger) Date: Fri, 7 Jun 2019 07:24:32 +0200 Subject: Retiring TripleO-UI - no longer supported In-Reply-To: References: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> Message-ID: <0583152c-5a85-a34d-577e-e7789cac344b@suse.com> On 07/06/2019 06.34, Jason Rist wrote: > Follow-up - this work is now done. > > https://review.opendev.org/#/q/topic:retire_tripleo_ui+(status:open+OR+status:merged) > Not yet for ansible-role-tripleo-ui - please remove the repo from project-config and governance repo, step 4 and 5 of https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project are missing. Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From madhuri.kumari at intel.com Fri Jun 7 06:53:49 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Fri, 7 Jun 2019 06:53:49 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> Hi Jay, >-----Original Message----- >From: Jay Pipes [mailto:jaypipes at gmail.com] >The absence of a trait on a provider should be represented by the provider >not having a trait. Just have a single trait "CUSTOM_HYPERTHREADING" that >you either place on the provider or do not place on a provider. > >The flavor should then either request that the trait be present on a provider >that the instance is scheduled to >(trait:CUSTOM_HYPERTHREADING=required) or that the trait should *not* >be present on a provider that the instance is scheduled to >(trait:CUSTOM_HYPERTHREADING=forbidden). > I understand that these traits are used for scheduling while server create in Nova. Whereas these traits means more to Ironic. Ironic runs multiple deploy steps matching the name of traits in flavor[1]. The use case explained in the email is about changing some BIOS configuration post server create. By changing the trait in flavor from CUSTOM_HYPERTHREADING_ON to CUSTOM_HYPERTHREADING_OFF, Ironic should run the matching deploy step to disable hyperthreading in BIOS and do a reboot. But currently there isn't a way in Nova about telling Ironic about the trait has changed in flavor, so perform the corresponding deploy steps. 
[1] https://docs.openstack.org/ironic/stein/admin/node-deployment.html#matching-deploy-templates Regards, Madhuri From balazs.gibizer at est.tech Fri Jun 7 07:38:31 2019 From: balazs.gibizer at est.tech (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Fri, 7 Jun 2019 07:38:31 +0000 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> Message-ID: <1559893108.11890.1@smtp.office365.com> On Fri, Jun 7, 2019 at 1:31 AM, Sean Mooney wrote: > On Thu, 2019-06-06 at 16:52 -0500, Eric Fried wrote: >> Let me TL;DR this: >> >> The forbidden aggregates filter spec [1] says when we put trait >> metadata >> onto a host aggregate, we should add the same trait to the compute >> node >> RPs for all the hosts in that aggregate, so that the feature >> actually >> works when we use it. >> >> But we never talked about what to do when we *remove* a trait from >> such >> an aggregate, or trash an aggregate with traits, or remove a host >> from >> such an aggregate. >> >> Here are the alternatives, as Vrushali laid them out (letters added >> by me): >> >> > (a) Leave all traits alone. If they need to be removed, it would >> have to >> > be manually via a separate step. >> > >> > (b) Support a new option so the caller can dictate whether the >> operation >> > should remove the traits. (This is all-or-none.) >> > >> > (c) Define a "namespace" - a trait substring - and remove only >> traits in >> > that namespace. >> >> I'm going to -1 (b). It's too big a hammer, at too big a cost >> (including >> API changes). >> >> > If Iʼm not wrong, for last two approaches, we would need to >> change >> > RestFul APIs. >> >> No, (c) does not. By "define a namespace" I mean we would establish >> a >> naming convention for traits to be used with this feature. For >> example: >> >> CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > i personaly dislike c as it means we cannot use any standard traits > in host > aggrates. > > there is also an option d. when you remove a trait form a host > aggregate for each host in > the aggregate check if that traits exists on another aggregate the > host is a member of and remove > it if not found on another aggregate. Besides possible performance impacts, I think this would be the logical behavior from nova to do. cheers, gibi From madhuri.kumari at intel.com Fri Jun 7 08:48:07 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Fri, 7 Jun 2019 08:48:07 +0000 Subject: '[Nova][Ironic] Reset Configurations in Baremetals Post Provisioning Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC17DEA@BGSMSX102.gar.corp.intel.com> Hi Eric, >-----Original Message----- >From: Eric Fried [mailto:openstack at fried.cc] >Waitwait. The VM needs to be rebooted for the BIOS change to take effect? >So (non-live) resize would actually satisfy your use case just fine. But the >problem is that the ironic driver doesn't support resize at all? > Yes you're right here. But the resize in Nova does migration which we don't want to do. Just apply the new traits and IronicDriver will run the matching deploy steps. >Without digging too hard, that seems like it would be a fairly straightforward >thing to add. It would be limited to only "same host" >and initially you could only change this one attribute (anything else would >have to fail). > >Nova people, thoughts? 
> >efried >. Regards, Madhuri From Chris.Winnicki at windriver.com Thu Jun 6 21:38:56 2019 From: Chris.Winnicki at windriver.com (Winnicki, Chris) Date: Thu, 6 Jun 2019 21:38:56 +0000 Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm In-Reply-To: References: Message-ID: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com> Anirudh, can you provide some details with respect to: 1) How are you pinging from one VM to the other (is it over the graphical console ? or namespace ?) 2) What VM image are you using? - Is the VM image enabled for SSH with password ? (assuming sshd is running) 3) Network topology 4) Are you trying to ssh from one VM to the other or from a different network segment? Is there a virtual router in the picture, etc.. Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 12:59 AM To: openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Anirudh.Gupta at hsc.com Fri Jun 7 02:48:45 2019 From: Anirudh.Gupta at hsc.com (Anirudh Gupta) Date: Fri, 7 Jun 2019 02:48:45 +0000 Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm In-Reply-To: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com> References: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com> Message-ID: Hi Chris, I am pinging from one VM to another over graphical console, which is successful. But when I try to run ssh or iperf, then there is no success. I am using Ubuntu 16.04 Image, which is ssh enabled and yes the service sshd is running. I have created a flat network as well as vlan network and tried doing ssh/iperf on both, but with no success. There is no virtual router. I am suspecting the issue mentioned in the below bug https://bugs.launchpad.net/starlingx/+bug/1790514 But I have no understanding as why it is happening. 
Regards Anirudh Gupta (Senior Engineer) From: Winnicki, Chris Sent: 07 June 2019 03:09 To: Anirudh Gupta ; openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: RE: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Anirudh, can you provide some details with respect to: 1) How are you pinging from one VM to the other (is it over the graphical console ? or namespace ?) 2) What VM image are you using? - Is the VM image enabled for SSH with password ? (assuming sshd is running) 3) Network topology 4) Are you trying to ssh from one VM to the other or from a different network segment? Is there a virtual router in the picture, etc.. Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 12:59 AM To: openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaypipes at gmail.com Fri Jun 7 12:07:19 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Fri, 7 Jun 2019 08:07:19 -0400 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> Message-ID: <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> On 6/7/19 2:53 AM, Kumari, Madhuri wrote: > Hi Jay, > >> -----Original Message----- >> From: Jay Pipes [mailto:jaypipes at gmail.com] > > >> The absence of a trait on a provider should be represented by the provider >> not having a trait. Just have a single trait "CUSTOM_HYPERTHREADING" that >> you either place on the provider or do not place on a provider. >> >> The flavor should then either request that the trait be present on a provider >> that the instance is scheduled to >> (trait:CUSTOM_HYPERTHREADING=required) or that the trait should *not* >> be present on a provider that the instance is scheduled to >> (trait:CUSTOM_HYPERTHREADING=forbidden). >> > > I understand that these traits are used for scheduling while server create in Nova. Whereas these traits means more to Ironic. Ironic runs multiple deploy steps matching the name of traits in flavor[1]. > > The use case explained in the email is about changing some BIOS configuration post server create. By changing the trait in flavor from CUSTOM_HYPERTHREADING_ON to CUSTOM_HYPERTHREADING_OFF, Ironic should run the matching deploy step to disable hyperthreading in BIOS and do a reboot. > But currently there isn't a way in Nova about telling Ironic about the trait has changed in flavor, so perform the corresponding deploy steps. > > [1] https://docs.openstack.org/ironic/stein/admin/node-deployment.html#matching-deploy-templates Yes, I understand that theses aren't really traits but are actually configuration information. However, what I'm saying is that if you pass the flavor information during resize (as Eric has suggested), then you don't need *two* trait strings (one for CUSTOM_HYPERTHREADING_ON and one for CUSTOM_HYPERTHREADING_OFF). You only need the single CUSTOM_HYPERTHREADING trait and the driver should simply look for the absence of that trait (or, alternately, the flavor saying "=forbid" instead of "=required". Better still, add a standardized trait to os-traits for hyperthreading support, which is what I'd recommended in the original cpu-resource-tracking spec. Best, -jay From iurygregory at gmail.com Fri Jun 7 12:14:46 2019 From: iurygregory at gmail.com (Iury Gregory) Date: Fri, 7 Jun 2019 14:14:46 +0200 Subject: [ironic] Should we add ironic-prometheus-exporter under Ironic umbrella? Message-ID: Greetings Ironicers! I would like to have your input on the matter of moving the ironic-prometheus-exporter to Ironic umbrella. *What is the ironic-prometheus-exporter? * The ironic-prometheus-exporter[1] provides a way to export hardware sensor data from Ironic project in OpenStack to Prometheus [2]. It's implemented as an oslo-messaging notification driver to get the sensor data and a Flask Application to export the metrics to Prometheus. It can not only be used in metal3-io but also in any OpenStack deployment which includes Ironic service. 
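For anyone who has not looked at the repository yet, the notification-driver half boils down to something like the sketch below. It is a heavy simplification: the payload layout, metric name and output path are assumptions for illustration, and the real driver in [1] does more.

    # Simplified illustration of the oslo.messaging notification driver
    # idea; payload fields and the output path are assumptions.
    from oslo_messaging.notify import notifier
    from prometheus_client import CollectorRegistry, Gauge, write_to_textfile

    class PrometheusFileDriver(notifier.Driver):
        """Turn ironic sensor notifications into Prometheus text format."""

        def notify(self, ctxt, message, priority, retry):
            payload = message.get('payload', {})
            node = payload.get('node_uuid', 'unknown')
            registry = CollectorRegistry()
            gauge = Gauge('baremetal_sensor_value', 'Hardware sensor reading',
                          ['node', 'sensor'], registry=registry)
            for sensor, value in payload.get('sensors', {}).items():
                gauge.labels(node=node, sensor=sensor).set(value)
            # A textfile in the Prometheus exposition format, which the
            # Flask app (or anything else) can then serve for scraping.
            write_to_textfile('/tmp/ironic-metrics.prom', registry)

As far as I understand, the real exporter registers itself as an oslo.messaging notify driver entry point, so ironic itself only needs a configuration change to emit to it.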
*How to ensure the sensor data will follow the Prometheus format?* We are using the prometheus client_python [3] to generate the file with the metrics that come trough the oslo notifier plugin. *How it will be tested on the gate?* Virtualbmc can't provide sensor data that the actual plugin supports. We would collect sample metrics from the hardware and use it in the unit tests. Maybe we should discuss this in the next ironic weekly meeting (10th June)? [1] https://github.com/metal3-io/ironic-prometheus-exporter [2] https://prometheus.io/ [3] https://github.com/prometheus/client_python -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Fri Jun 7 12:17:47 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 7 Jun 2019 13:17:47 +0100 (BST) Subject: [placement] update 19-22 Message-ID: HTML: https://anticdent.org/placement-update-19-22.html Welcome to placement update 19-22. # Most Important We are continuing to work through issues associated with the [spec for nested magic](https://review.opendev.org/662191). Unsurprisingly, there are edge cases where we need to be sure we're doing the right thing, both in terms of satisfying the use cases as well as making sure we don't violate the general model of how things are supposed to work. # What's Changed * We've had a few responses on [the thread to determine the fate of can_split](http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006726.html). The consensus at this point is to not worry about workloads that mix NUMA-aware guests with non-NUMA-aware on the same host. * Support forbidden traits (microversion 1.22) has been added to osc-placement. * Office hours will be 1500 UTC on Wednesdays. * os-traits 0.13.0 and 0.14.0 were released. * Code to optionaly run a [wsgi profiler](https://docs.openstack.org/placement/latest/contributor/testing.html#profiling) in placement has merged. * The request group mapping in allocation candidates spec has merged, more on that in themes, below. # Specs/Features * Support Consumer Types. This has some open questions that need to be addressed, but we're still go on the general idea. * Spec for nested magic 1. The easier parts of nested magic: same_subtree, resource request groups, verbose suffixes (already merged as 1.33). Recently some new discussion here. These and other features being considered can be found on the [feature worklist](https://storyboard.openstack.org/#!/worklist/594). Some non-placement specs are listed in the Other section below. # Stories/Bugs (Numbers in () are the change since the last pupdate.) There are 20 (1) stories in [the placement group](https://storyboard.openstack.org/#!/project_group/placement). 0 (0) are [untagged](https://storyboard.openstack.org/#!/worklist/580). 3 (1) are [bugs](https://storyboard.openstack.org/#!/worklist/574). 4 (0) are [cleanups](https://storyboard.openstack.org/#!/worklist/575). 11 (0) are [rfes](https://storyboard.openstack.org/#!/worklist/594). 2 (0) are [docs](https://storyboard.openstack.org/#!/worklist/637). If you're interested in helping out with placement, those stories are good places to look. * Placement related nova [bugs not yet in progress](https://goo.gl/TgiPXb) on launchpad: 15 (-1). 
* Placement related nova [in progress bugs](https://goo.gl/vzGGDQ) on launchpad: 7 (0). # osc-placement osc-placement is currently behind by 11 microversions. Pending Changes: * Add 'resource provider inventory update' command (that helps with aggregate allocation ratios). * Provide a useful message in the case of 500-error # Main Themes ## Nested Magic The overview of the features encapsulated by the term "nested magic" are in a [story](https://storyboard.openstack.org/#!/story/2005575). There is some in progress code, some of it WIPs to expose issues: * PoC: resourceless request, including some code from WIP: Allow RequestGroups without resources * Add NUMANetworkFixture for gabbits * Prepare objects for allocation request mappings. This work exposed a [bug in hash handling](https://storyboard.openstack.org/#!/story/2005822) that is [being fixed](https://review.opendev.org/663137). ## Consumer Types Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting. A [spec](https://review.opendev.org/654799) has started. There are some questions about request and response details that need to be resolved, but the overall concept is sound. ## Cleanup We continue to do cleanup work to lay in reasonable foundations for the nested work above. As a nice bonus, we keep eking out additional performance gains too. * Add olso.middleware.cors to conf generator * Modernize CORS config and setup. Ed Leafe's ongoing work with using a graph database probably needs some kind of report or update. # Other Placement Miscellaneous changes can be found in [the usual place](https://review.opendev.org/#/q/project:openstack/placement+status:open). There are several [os-traits changes](https://review.opendev.org/#/q/project:openstack/os-traits+status:open) being discussed. # Other Service Users New discoveries are added to the end. Merged stuff is removed. Anything that has had no activity in 4 weeks has been removed. * Nova: spec: support virtual persistent memory * Nova: nova-manage: heal port allocations * nova-spec: Allow compute nodes to use DISK_GB from shared storage RP * Cyborg: Placement report * Nova: Spec to pre-filter disabled computes with placement * rpm-packaging: placement service * Delete resource providers for all nodes when deleting compute service * nova fix for: Drop source node allocations if finish_resize fails * nova: WIP: Hey let's support routed networks y'all! * starlingx: Add placement chart patch to openstack-helm * helm: WIP: add placement chart * kolla-ansible: Add a explanatory note for "placement_api_port" * neutron-spec: L3 agent capacity and scheduling * Nova: Use OpenStack SDK for placement * puppet: Implement generic placement::config::placement_config * puppet: Add parameter for `scheduler/query_placement_for_image_type_support` * Nova: Spec: Provider config YAML file * Nova: single pass instance info fetch in host manager * Watcher: Add Placement helper * docs: Add Placement service to Minimal deployment for Stein * devstack: Add setting of placement microversion on tempest conf * libvirt: report pmem namespaces resources by provider tree * Nova: Defaults missing group_policy to 'none' * Nova: Remove PlacementAPIConnectFailure handling from AggregateAPI # End Making good headway. 
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From smooney at redhat.com Fri Jun 7 12:23:31 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 13:23:31 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <1559893108.11890.1@smtp.office365.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <1559893108.11890.1@smtp.office365.com> Message-ID: <249c58bce0376638f713d2165782b6f094e319b3.camel@redhat.com> On Fri, 2019-06-07 at 07:38 +0000, Balázs Gibizer wrote: > > On Fri, Jun 7, 2019 at 1:31 AM, Sean Mooney wrote: > > On Thu, 2019-06-06 at 16:52 -0500, Eric Fried wrote: > > > Let me TL;DR this: > > > > > > The forbidden aggregates filter spec [1] says when we put trait > > > metadata > > > onto a host aggregate, we should add the same trait to the compute > > > node > > > RPs for all the hosts in that aggregate, so that the feature > > > actually > > > works when we use it. > > > > > > But we never talked about what to do when we *remove* a trait from > > > such > > > an aggregate, or trash an aggregate with traits, or remove a host > > > from > > > such an aggregate. > > > > > > Here are the alternatives, as Vrushali laid them out (letters added > > > by me): > > > > > > > (a) Leave all traits alone. If they need to be removed, it would > > > have to > > > > be manually via a separate step. > > > > > > > > (b) Support a new option so the caller can dictate whether the > > > operation > > > > should remove the traits. (This is all-or-none.) > > > > > > > > (c) Define a "namespace" - a trait substring - and remove only > > > traits in > > > > that namespace. > > > > > > I'm going to -1 (b). It's too big a hammer, at too big a cost > > > (including > > > API changes). > > > > > > > If Iʼm not wrong, for last two approaches, we would need to > > > change > > > > RestFul APIs. > > > > > > No, (c) does not. By "define a namespace" I mean we would establish > > > a > > > naming convention for traits to be used with this feature. For > > > example: > > > > > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > > > > i personaly dislike c as it means we cannot use any standard traits > > in host > > aggrates. > > > > there is also an option d. when you remove a trait form a host > > aggregate for each host in > > the aggregate check if that traits exists on another aggregate the > > host is a member of and remove > > it if not found on another aggregate. > > Besides possible performance impacts, I think this would be the logical > behavior from nova to do. option d is worstcase aproxmatly O(NlogN) but is technicall between O(n) and O(NM) where N is the number of instance and M is the maxium number of aggrates a host is a memeber of. so it grows non linearly but the plroblem is not quadratic and is much closer to O(N) or O(NlogN) then it is to O(N^2) so long if we are smart about how we look up the data form the db its probably ok. we are basically asking for all the host in this aggrate, give me the hosts that dont have another aggrate with the trait i am about to remove form this aggragte. those host are the set of host we need to update in placemnt and sql is good at anserwing that type of question. if we do this in python it will make me sad. 
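Expressed from Python but letting the database do the work, the question above is a single query. The table and column names below are made up for illustration (the real aggregate metadata layout differs), so treat this as the shape of the query rather than working nova code:

    # Sketch only: hypothetical tables standing in for nova's aggregate
    # host/metadata tables; the database answers the membership question.
    from sqlalchemy import (MetaData, Table, Column, Integer, String,
                            and_, exists, not_, select)

    metadata = MetaData()
    aggregate_hosts = Table('aggregate_hosts', metadata,
                            Column('aggregate_id', Integer),
                            Column('host', String(255)))
    aggregate_traits = Table('aggregate_traits', metadata,
                             Column('aggregate_id', Integer),
                             Column('trait', String(255)))

    def hosts_needing_update(agg_id, trait):
        """Hosts in agg_id where no other aggregate still asserts trait."""
        other_aggs = select([aggregate_traits.c.aggregate_id]).where(
            and_(aggregate_traits.c.trait == trait,
                 aggregate_traits.c.aggregate_id != agg_id))
        ah2 = aggregate_hosts.alias('ah2')
        still_asserted = exists().where(
            and_(ah2.c.host == aggregate_hosts.c.host,
                 ah2.c.aggregate_id.in_(other_aggs)))
        return select([aggregate_hosts.c.host]).where(
            and_(aggregate_hosts.c.aggregate_id == agg_id,
                 not_(still_asserted)))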
> > cheers, > gibi > From zoltan.langi at namecheap.com Fri Jun 7 12:53:56 2019 From: zoltan.langi at namecheap.com (Zoltan Langi) Date: Fri, 7 Jun 2019 14:53:56 +0200 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan Message-ID: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. (I initially followed this ASAP2 guide, works well: https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf (page15) So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. The problem is only one direction of the traffic is offloaded when LAG is being used. I opened a mellanox case and they recommended to install the latest ovs version which I did: https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used. Does anyone has any experience or any idea what should I look our for or check? Thank you very much, anything is appreciated! Zoltan From openstack at fried.cc Fri Jun 7 14:05:58 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 7 Jun 2019 09:05:58 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> Message-ID: <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> >>> (a) Leave all traits alone. If they need to be removed, it would have to >>> be manually via a separate step. >>> >>> (b) Support a new option so the caller can dictate whether the operation >>> should remove the traits. (This is all-or-none.) >>> >>> (c) Define a "namespace" - a trait substring - and remove only traits in >>> that namespace. >> >> I'm going to -1 (b). It's too big a hammer, at too big a cost (including >> API changes). >> >>> If I’m not wrong, for last two approaches, we would need to change >>> RestFul APIs. >> >> No, (c) does not. By "define a namespace" I mean we would establish a >> naming convention for traits to be used with this feature. For example: >> >> CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > i personaly dislike c as it means we cannot use any standard traits in host > aggrates. Actually, it means we *can*, whereas (b) and (d) mean we *can't*. That's the whole point. 
If we want to isolate our hyperthreading hosts, we put them in an aggregate with HW_CPU_HYPERTHREADING on it. The sync here should be a no-op because those hosts should already have HW_CPU_HYPERTHREADING on them. And then if we decide to remove such a host, or destroy the aggregate, or whatever, we *don't want* HW_CPU_HYPERTHREADING to be removed from the providers, because they can still do that. (Unless you mean we can't make a standard trait that we can use for isolation that gets (conditionally) removed in these scenarios? There's nothing preventing us from creating a standard trait called COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE, which would work just the same.) > there is also an option d. when you remove a trait form a host aggregate for each host in > the aggregate check if that traits exists on another aggregate the host is a member of and remove > it if not found on another aggregate. Sorry I wasn't clear, (b) also does this ^ but with the condition that also checks for the _AGGREGATE_ISOLATION_ infix. > for c you still have to deal with the fact a host can be in multiple host aggrates too by > the way so jsut because a thread is namespace d and it is removed from an aggrate does not > mean its correcct to remove it from a host. Right - in reality, there should be one algorithm, idempotent, to sync host RP traits when anything happens to aggregates. It always goes out and does the appropriate {set} math to decide which traits should exist on which hosts and effects any necessary changes. And yes, the performance will suck in a large deployment, because we have to get all the compute RPs in all the aggregates (even the ones with no trait metadata) to do that calculation. But aggregate operations are fairly rare, aren't they? Perhaps this is where we provide a nova-manage tool to do (b)'s sync manually (which we'll surely have to do anyway as a "heal" or for upgrades). So if you're not using the feature, you don't suffer the penalty. > for example we talked about useing a hyperthreading trait in the cpu pinning spec which > will not be managed by the compute driver. host aggages would be a convient way > to be able to manage that trait if this was a generic feature. Oh, okay, yeah, I don't accept this as a use case for this feature. It will work, but we shouldn't recommend it precisely because it's asymmetrical (you can't remove the trait by removing it from the aggregate). There are other ways to add a random trait to all hosts in an aggregate (for host in `get providers in aggregate`; do openstack resource provider trait add ...; done). But for the sake of discussion, what about: (e) Fully manual. Aggregate operations never touch (add or remove) traits on host RPs. You always have to do that manually. As noted above, it's easy to do - and we could make it easier with a tiny wrapper that takes an aggregate, a list of traits, and an --add/--remove command. So initially, setting up aggregate isolation is a two-step process, and in the future we can consider making new API/CLI affordance that combines the steps. efried . 
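That wrapper really can be tiny. Below is a sketch against the placement REST API; the provider-by-aggregate and traits calls are real placement endpoints, but the session handling, the microversion pin and the lack of error handling are simplifying assumptions (a real tool would retry on generation conflicts).

    # Sketch of the "tiny wrapper": add or remove traits on every provider
    # in an aggregate. Assumes `placement` is a keystoneauth1 Adapter
    # already bound to the placement endpoint.
    HEADERS = {'OpenStack-API-Version': 'placement 1.19'}

    def set_aggregate_traits(placement, agg_uuid, traits, remove=False):
        rps = placement.get('/resource_providers?member_of=in:%s' % agg_uuid,
                            headers=HEADERS).json()['resource_providers']
        for rp in rps:
            url = '/resource_providers/%s/traits' % rp['uuid']
            current = placement.get(url, headers=HEADERS).json()
            wanted = set(current['traits'])
            wanted = wanted - set(traits) if remove else wanted | set(traits)
            placement.put(url, json={
                'resource_provider_generation':
                    current['resource_provider_generation'],
                'traits': sorted(wanted)}, headers=HEADERS)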
From openstack at fried.cc Fri Jun 7 15:23:48 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 7 Jun 2019 10:23:48 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> Message-ID: <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> > Better still, add a standardized trait to os-traits for hyperthreading > support, which is what I'd recommended in the original > cpu-resource-tracking spec. HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since 0.8.0. efried [1] https://review.opendev.org/#/c/576030/ From mriedemos at gmail.com Fri Jun 7 16:01:36 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 7 Jun 2019 11:01:36 -0500 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs In-Reply-To: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> References: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> Message-ID: <6e5aad37-75ab-a53f-39fb-7b1de68dbeb2@gmail.com> On 6/4/2019 5:45 PM, Clark Boylan wrote: > Once this is in you can push "Do Not Merge" changes to your zuul config that reparent your tests from "base" to "base-test" and that will run the jobs without the zuul-cloner shim. The nova dsvm jobs inherit from legacy-dsvm-base which doesn't have a parent (it's abstract). Given that, how would I go about testing this change on the nova legacy jobs? -- Thanks, Matt From fungi at yuggoth.org Fri Jun 7 16:42:45 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 7 Jun 2019 16:42:45 +0000 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs In-Reply-To: <6e5aad37-75ab-a53f-39fb-7b1de68dbeb2@gmail.com> References: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> <6e5aad37-75ab-a53f-39fb-7b1de68dbeb2@gmail.com> Message-ID: <20190607164245.ilaxh25xsopl3n22@yuggoth.org> On 2019-06-07 11:01:36 -0500 (-0500), Matt Riedemann wrote: [...] > The nova dsvm jobs inherit from legacy-dsvm-base which doesn't > have a parent (it's abstract). Given that, how would I go about > testing this change on the nova legacy jobs? Sorry, I meant to get shim changes up yesterday for making this easier to test. Now you can add: Depends-On: https://review.opendev.org/663996 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jaypipes at gmail.com Fri Jun 7 16:59:39 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Fri, 7 Jun 2019 12:59:39 -0400 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> Message-ID: <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> On 6/7/19 11:23 AM, Eric Fried wrote: >> Better still, add a standardized trait to os-traits for hyperthreading >> support, which is what I'd recommended in the original >> cpu-resource-tracking spec. > > HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since > 0.8.0. Excellent, I had a faint recollection of that... -jay From mnaser at vexxhost.com Fri Jun 7 17:42:50 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 7 Jun 2019 13:42:50 -0400 Subject: [ironic] Should we add ironic-prometheus-exporter under Ironic umbrella? In-Reply-To: References: Message-ID: Hi Iury, This seems pretty awesome. I threw in some comments On Fri, Jun 7, 2019 at 11:08 AM Iury Gregory wrote: > > Greetings Ironicers! > > I would like to have your input on the matter of moving the ironic-prometheus-exporter to Ironic umbrella. > > What is the ironic-prometheus-exporter? > The ironic-prometheus-exporter[1] provides a way to export hardware sensor data from > Ironic project in OpenStack to Prometheus [2]. It's implemented as an oslo-messaging notification driver to get the sensor data and a Flask Application to export the metrics to Prometheus. It can not only be used in metal3-io but also in any OpenStack deployment which includes Ironic service. This seems really neat. From my perspective, it seems like it waits for notifications, and then writes it out to a file. The flask server seems to do nothing but pretty much serve the contents at /metrics. I think we should be doing more of this inside OpenStack to be honest and this can be really useful in the perspective of operators. I don't want to complicate this more however, but I would love for this to be a pattern/framework that other projects can adopt. > How to ensure the sensor data will follow the Prometheus format? > We are using the prometheus client_python [3] to generate the file with the metrics that come trough the oslo notifier plugin. > > How it will be tested on the gate? > Virtualbmc can't provide sensor data that the actual plugin supports. We would collect sample metrics from the hardware and use it in the unit tests. > > Maybe we should discuss this in the next ironic weekly meeting (10th June)? > > [1] https://github.com/metal3-io/ironic-prometheus-exporter > [2] https://prometheus.io/ > [3] https://github.com/prometheus/client_python > > -- > Att[]'s > Iury Gregory Melo Ferreira > MSc in Computer Science at UFCG > Part of the puppet-manager-core team in OpenStack > Software Engineer at Red Hat Czech > Social: https://www.linkedin.com/in/iurygregory > E-mail: iurygregory at gmail.com -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
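Right, the serving side really is minimal. For readers who have not opened the repo, it is essentially the following (the path and port are arbitrary placeholders for the sketch, not what the exporter actually uses):

    # Illustrative reduction of "serve the contents at /metrics"; the real
    # exporter app has configuration and error handling around this.
    from flask import Flask, Response

    app = Flask(__name__)
    METRICS_FILE = '/tmp/ironic-metrics.prom'  # placeholder path

    @app.route('/metrics')
    def metrics():
        with open(METRICS_FILE) as fh:
            # Prometheus scrapes this as plain-text exposition format.
            return Response(fh.read(), mimetype='text/plain')

    if __name__ == '__main__':
        app.run(port=8000)  # arbitrary port for the sketch

Which is part of why a shared pattern sounds attractive: the only project-specific piece is mapping the notification payload to metrics, while the file-writer plus /metrics part could be common.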
http://vexxhost.com From mnaser at vexxhost.com Fri Jun 7 17:52:03 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 7 Jun 2019 13:52:03 -0400 Subject: [nova][kolla][openstack-ansible][tripleo] Cells v2 upgrades In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 5:16 AM Mark Goddard wrote: > > Hi, > > At the recent kolla virtual PTG [1] we had a good discussion about > adding support for multiple nova cells in kolla-ansible. We agreed a > key requirement is to be able to perform operations on one or more > cells without affecting the rest for damage limitation. This also > seems like it would apply to upgrades. We're seeking input on > ordering. Looking at the nova upgrade guide [2] I might propose > something like this: > > 1. DB syncs > 2. Upgrade API, super conductor > > For each cell: > 3a. Upgrade cell conductor > 3b. Upgrade cell computes > > 4. SIGHUP all services Unfortunately, this is a problem right now: https://review.opendev.org/#/c/641907/ I sat down at the PTG to settle this down, I was going to finish this patch up but I didn't get around to it. That might be an action item to be able to do this successfully. > 5. Run online migrations > > At some point in here we also need to run the upgrade check. > Presumably between steps 1 and 2? > > It would be great to get feedback both from the nova team and anyone > running cells > Thanks, > Mark > > [1] https://etherpad.openstack.org/p/kolla-train-ptg > [2] https://docs.openstack.org/nova/latest/user/upgrade.html > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From jm at artfiles.de Fri Jun 7 13:15:19 2019 From: jm at artfiles.de (Jan Marquardt) Date: Fri, 7 Jun 2019 15:15:19 +0200 Subject: Neutron with LBA and BGP-EVPN over IP fabric Message-ID: Hi, we are currently trying to build an Openstack Cloud with an IP fabric and FRR directly running on each host. Therefore each host is supposed to advertise its VNIs to the fabric. For this purpose I’d need VXLAN interfaces with the following config: 18: vx-50: mtu 1500 qdisc noqueue master br-test state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 7e:d2:e6:3c:5a:65 brd ff:ff:ff:ff:ff:ff promiscuity 1 vxlan id 50 local 10.0.0.101 srcport 0 0 dstport 8472 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.7e:d2:e6:3c:5a:65 designated_root 8000.7e:d2:e6:3c:5a:65 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0x0 group_fwd_mask_str 0x0 vlan_tunnel off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 It seems that Neutron/lba is not capable of creating VXLAN interfaces with such a config. By default lba creates them with mode multicast, but I’d need unicast. The only way to activate unicast mode seems to be setting l2pop, but then lba does not set local address. Furthermore, I don't think we really need l2pop, because this part is supposed to be done by BGP-EVPN. Is there any way to achieve such config with Neutron/lba? 
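For reference, a minimal iproute2 sketch of the kind of device described above - a unicast, no-learning VXLAN endpoint with an explicit local address, which is what a BGP-EVPN/FRR control plane expects and which the linuxbridge agent does not create today. The names and values (vx-50, br-test, VNI 50, 10.0.0.101, dstport 8472) are taken from the output above; this is a manual illustration, not something Neutron provides out of the box:

    # Create the VXLAN device with no multicast group and no source learning;
    # BGP-EVPN is expected to populate the FDB instead.
    ip link add vx-50 type vxlan id 50 local 10.0.0.101 dstport 8472 nolearning
    # Attach it to the bridge and keep bridge-level learning off as well.
    ip link set vx-50 master br-test
    bridge link set dev vx-50 learning off
    ip link set vx-50 up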
Best Regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From Chris.Winnicki at windriver.com Fri Jun 7 14:03:29 2019 From: Chris.Winnicki at windriver.com (Winnicki, Chris) Date: Fri, 7 Jun 2019 14:03:29 +0000 Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm In-Reply-To: References: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com>, Message-ID: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE67D@ALA-MBD.corp.ad.wrs.com> Anirudh: Have you tried the workaround mentioned in comment #7 in the bug report ? Are you able to ssh to localhost (ssh to VM itself from within the VM) Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 10:48 PM To: Winnicki, Chris; openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: RE: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Chris, I am pinging from one VM to another over graphical console, which is successful. But when I try to run ssh or iperf, then there is no success. I am using Ubuntu 16.04 Image, which is ssh enabled and yes the service sshd is running. I have created a flat network as well as vlan network and tried doing ssh/iperf on both, but with no success. There is no virtual router. I am suspecting the issue mentioned in the below bug https://bugs.launchpad.net/starlingx/+bug/1790514 But I have no understanding as why it is happening. Regards Anirudh Gupta (Senior Engineer) From: Winnicki, Chris Sent: 07 June 2019 03:09 To: Anirudh Gupta ; openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: RE: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Anirudh, can you provide some details with respect to: 1) How are you pinging from one VM to the other (is it over the graphical console ? or namespace ?) 2) What VM image are you using? - Is the VM image enabled for SSH with password ? (assuming sshd is running) 3) Network topology 4) Are you trying to ssh from one VM to the other or from a different network segment? Is there a virtual router in the picture, etc.. Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 12:59 AM To: openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. 
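Two quick checks are worth ruling out before digging into the bug linked above: security group rules (ICMP being allowed does not imply TCP is) and path MTU on the tenant network. This is a hedged sketch, not a fix for that specific bug; the group name "default" and the iperf3 port are assumptions, adjust them to your setup:

    # Allow SSH and iperf3 into the instances' security group.
    openstack security group rule create --protocol tcp --dst-port 22 default
    openstack security group rule create --protocol tcp --dst-port 5201 default
    # From inside one VM, check whether large unfragmented packets get through;
    # VXLAN/VLAN tenant networks often need a smaller instance MTU.
    ping -M do -s 1472 <other-vm-ip>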
Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Fri Jun 7 17:54:44 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 7 Jun 2019 13:54:44 -0400 Subject: [openstack-ansible][powervm] dropping support In-Reply-To: <2C11DAD5-1ED6-409B-9374-0CB86059E5E2@us.ibm.com> References: <2C11DAD5-1ED6-409B-9374-0CB86059E5E2@us.ibm.com> Message-ID: On Tue, Jun 4, 2019 at 8:00 AM William M Edmonds - edmondsw at us.ibm.com wrote: > > On 5/31/19, 6:46 PM, "Mohammed Naser" wrote: > > > > Hi everyone, > > > > I've pushed up a patch to propose dropping support for PowerVM support > > inside OpenStack Ansible. There has been no work done on this for a > > few years now, the configured compute driver is the incorrect one for > > ~2 years now which indicates that no one has been able to use it for > > that long. > > > > It would be nice to have this driver however given the infrastructure > > we have upstream, there would be no way for us to effectively test it > > and bring it back to functional state. I'm proposing that we remove > > the code here: > > https://review.opendev.org/#/c/662587 powervm: drop support > > > > If you're using this code and would like to contribute to fixing it > > and (somehow) adding coverage, please reach out, otherwise, we'll drop > > this code to clean things up. > > Sadly, I don't know of anyone using it or willing to maintain it at this time. > :( We've merged that commit and we've now dropped PowerVM support for now (alongside the +1 Eric has provided too). We'd be happy to support it again if/when someone steps up :) -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
http://vexxhost.com From smooney at redhat.com Fri Jun 7 18:04:44 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 19:04:44 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> Message-ID: On Fri, 2019-06-07 at 09:05 -0500, Eric Fried wrote: > > > > (a) Leave all traits alone. If they need to be removed, it would have to > > > > be manually via a separate step. > > > > > > > > (b) Support a new option so the caller can dictate whether the operation > > > > should remove the traits. (This is all-or-none.) > > > > > > > > (c) Define a "namespace" - a trait substring - and remove only traits in > > > > that namespace. > > > > > > I'm going to -1 (b). It's too big a hammer, at too big a cost (including > > > API changes). > > > > > > > If I’m not wrong, for last two approaches, we would need to change > > > > RestFul APIs. > > > > > > No, (c) does not. By "define a namespace" I mean we would establish a > > > naming convention for traits to be used with this feature. For example: > > > > > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > > > > i personaly dislike c as it means we cannot use any standard traits in host > > aggrates. > > Actually, it means we *can*, whereas (b) and (d) mean we *can't*. That's > the whole point. If we want to isolate our hyperthreading hosts, we put > them in an aggregate with HW_CPU_HYPERTHREADING on it. The sync here > should be a no-op because those hosts should already have > HW_CPU_HYPERTHREADING on them. And then if we decide to remove such a > host, or destroy the aggregate, or whatever, we *don't want* > HW_CPU_HYPERTHREADING to be removed from the providers, because they can > still do that. in the cpu pinning spec we said HW_CPU_HYPERTHREADING was not going to be managed by the virt driver so it wont be reported unless the admin manulaly adds it. https://github.com/openstack/nova-specs/blob/master/specs/train/approved/cpu-resources.rst#add-hw_cpu_hyperthreading-trait "The HW_CPU_HYPERTHREADING trait will need to be among the traits that the virt driver cannot always override, since the operator may want to indicate that a single NUMA node on a multi-NUMA-node host is meant for guests that tolerate hyperthread siblings as dedicated CPUs." so i was suggesting this was a way to enable that the operator to manage whic host report that trait although as the spec suggest we may want to report this differently per numa node which would still require you to use osc-placment or some other way to set it manually. > > (Unless you mean we can't make a standard trait that we can use for > isolation that gets (conditionally) removed in these scenarios? There's > nothing preventing us from creating a standard trait called > COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE, which would work just the > same.) im suggesting it woudl be nice to be able to use host aggates to manage statdard or custom traits on hosts that are not managed by the driver thwer ethat is a COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE trait or something else. so i was hoping to make this feature more reusable for other usecase in the future. 
for example it would be nice to be able to say this set of host has the CUSTOM_DPDK_NETWORKING trait by putting them in a host aggrage and then adding a forbindent trait to my non hugepage backed guests. > > > there is also an option d. when you remove a trait form a host aggregate for each host in > > the aggregate check if that traits exists on another aggregate the host is a member of and remove > > it if not found on another aggregate. > > Sorry I wasn't clear, (b) also does this ^ but with the condition that > also checks for the _AGGREGATE_ISOLATION_ infix. > > > for c you still have to deal with the fact a host can be in multiple > > host aggrates too by > > the way so jsut because a thread is namespace d and it is removed from > > an aggrate does not > > mean its correcct to remove it from a host. > > Right - in reality, there should be one algorithm, idempotent, to sync > host RP traits when anything happens to aggregates. It always goes out > and does the appropriate {set} math to decide which traits should exist > on which hosts and effects any necessary changes. > > And yes, the performance will suck in a large deployment, because we > have to get all the compute RPs in all the aggregates (even the ones > with no trait metadata) to do that calculation. But aggregate operations > are fairly rare, aren't they? > > Perhaps this is where we provide a nova-manage tool to do (b)'s sync > manually (which we'll surely have to do anyway as a "heal" or for > upgrades). So if you're not using the feature, you don't suffer the penalty. > > > for example we talked about useing a hyperthreading trait in the cpu pinning spec which > > will not be managed by the compute driver. host aggages would be a convient way > > to be able to manage that trait if this was a generic feature. > > Oh, okay, yeah, I don't accept this as a use case for this feature. It > will work, but we shouldn't recommend it precisely because it's > asymmetrical (you can't remove the trait by removing it from the > aggregate). why not we do not expect the virt driver to report the hypertreading trait since we said it can be extrenally managed. even if we allowed the virt drvier to conditionally report it only when it frst creates a RP it is not allowed to readd if it is remvoed by someone else. > There are other ways to add a random trait to all hosts in > an aggregate (for host in `get providers in aggregate`; do openstack > resource provider trait add ...; done). > > But for the sake of discussion, what about: > > (e) Fully manual. Aggregate operations never touch (add or remove) > traits on host RPs. You always have to do that manually. As noted above, > it's easy to do - and we could make it easier with a tiny wrapper that > takes an aggregate, a list of traits, and an --add/--remove command. So > initially, setting up aggregate isolation is a two-step process, and in > the future we can consider making new API/CLI affordance that combines > the steps. > > efried > . 
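For anyone following along, the "tiny wrapper" Eric sketches above boils down to something like the following with osc-placement. Treat it as a sketch only: the --member-of filter and the exact output columns depend on the osc-placement release, and the flavor names are made up for illustration:

    TRAIT=CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE
    AGG_UUID=<aggregate-uuid>

    for rp in $(openstack resource provider list --member-of "$AGG_UUID" -f value -c uuid); do
        # "trait set" replaces the provider's whole trait list, so re-apply the
        # existing traits and append the new one.
        existing=$(openstack resource provider trait list "$rp" -f value -c name)
        openstack resource provider trait set $(printf -- '--trait %s ' $existing $TRAIT) "$rp"
    done

    # Flavor side: require the trait for isolated workloads, forbid it elsewhere.
    openstack flavor set --property trait:$TRAIT=required  win-flavor
    openstack flavor set --property trait:$TRAIT=forbidden linux-flavor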
> From smooney at redhat.com Fri Jun 7 18:15:45 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 19:15:45 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> Message-ID: <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> On Fri, 2019-06-07 at 19:04 +0100, Sean Mooney wrote: > On Fri, 2019-06-07 at 09:05 -0500, Eric Fried wrote: > > > > > (a) Leave all traits alone. If they need to be removed, it would have to > > > > > be manually via a separate step. > > > > > > > > > > (b) Support a new option so the caller can dictate whether the operation > > > > > should remove the traits. (This is all-or-none.) > > > > > > > > > > (c) Define a "namespace" - a trait substring - and remove only traits in > > > > > that namespace. > > > > > > > > I'm going to -1 (b). It's too big a hammer, at too big a cost (including > > > > API changes). > > > > > > > > > If I’m not wrong, for last two approaches, we would need to change > > > > > RestFul APIs. > > > > > > > > No, (c) does not. By "define a namespace" I mean we would establish a > > > > naming convention for traits to be used with this feature. For example: > > > > > > > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > > > > > > i personaly dislike c as it means we cannot use any standard traits in host > > > aggrates. > > > > Actually, it means we *can*, whereas (b) and (d) mean we *can't*. That's > > the whole point. If we want to isolate our hyperthreading hosts, we put > > them in an aggregate with HW_CPU_HYPERTHREADING on it. The sync here > > should be a no-op because those hosts should already have > > HW_CPU_HYPERTHREADING on them. And then if we decide to remove such a > > host, or destroy the aggregate, or whatever, we *don't want* > > HW_CPU_HYPERTHREADING to be removed from the providers, because they can > > still do that. > > in the cpu pinning spec we said HW_CPU_HYPERTHREADING was not going to be managed > by the virt driver so it wont be reported unless the admin manulaly adds it. > > https://github.com/openstack/nova-specs/blob/master/specs/train/approved/cpu-resources.rst#add-hw_cpu_hyperthreading-trait > "The HW_CPU_HYPERTHREADING trait will need to be among the traits that the virt driver cannot always override, since > the > operator may want to indicate that a single NUMA node on a multi-NUMA-node host is meant for guests that tolerate > hyperthread siblings as dedicated CPUs." > > so i was suggesting this was a way to enable that the operator to manage whic host report that trait > although as the spec suggest we may want to report this differently per numa node which would still > require you to use osc-placment or some other way to set it manually. > > > > (Unless you mean we can't make a standard trait that we can use for > > isolation that gets (conditionally) removed in these scenarios? There's > > nothing preventing us from creating a standard trait called > > COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE, which would work just the > > same.) > > im suggesting it woudl be nice to be able to use host aggates to manage statdard or custom traits on hosts that are > not managed by the driver thwer ethat is a COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE trait or something > else. 
so i was hoping to make this feature more reusable for other usecase in the future. for example it would be nice > to be able to say this set of host has the CUSTOM_DPDK_NETWORKING trait by putting them in a host aggrage and then > adding a forbindent trait to my non hugepage backed guests. > > > > > there is also an option d. when you remove a trait form a host aggregate for each host in > > > the aggregate check if that traits exists on another aggregate the host is a member of and remove > > > it if not found on another aggregate. > > > > Sorry I wasn't clear, (b) also does this ^ but with the condition that > > also checks for the _AGGREGATE_ISOLATION_ infix. > > > > > for c you still have to deal with the fact a host can be in multiple > > > > host aggrates too by > > > the way so jsut because a thread is namespace d and it is removed from > > > > an aggrate does not > > > mean its correcct to remove it from a host. > > > > Right - in reality, there should be one algorithm, idempotent, to sync > > host RP traits when anything happens to aggregates. It always goes out > > and does the appropriate {set} math to decide which traits should exist > > on which hosts and effects any necessary changes. > > > > And yes, the performance will suck in a large deployment, because we > > have to get all the compute RPs in all the aggregates (even the ones > > with no trait metadata) to do that calculation. But aggregate operations > > are fairly rare, aren't they? > > > > Perhaps this is where we provide a nova-manage tool to do (b)'s sync > > manually (which we'll surely have to do anyway as a "heal" or for > > upgrades). So if you're not using the feature, you don't suffer the penalty. > > > > > for example we talked about useing a hyperthreading trait in the cpu pinning spec which > > > will not be managed by the compute driver. host aggages would be a convient way > > > to be able to manage that trait if this was a generic feature. > > > > Oh, okay, yeah, I don't accept this as a use case for this feature. It > > will work, but we shouldn't recommend it precisely because it's > > asymmetrical (you can't remove the trait by removing it from the > > aggregate). > > why not we do not expect the virt driver to report the hypertreading trait > since we said it can be extrenally managed. even if we allowed the virt drvier to > conditionally report it only when it frst creates a RP it is not allowed to readd > if it is remvoed by someone else. > > > There are other ways to add a random trait to all hosts in > > an aggregate (for host in `get providers in aggregate`; do openstack > > resource provider trait add ...; done). > > > > But for the sake of discussion, what about: > > > > (e) Fully manual. Aggregate operations never touch (add or remove) > > traits on host RPs. You always have to do that manually. As noted above, > > it's easy to do - and we could make it easier with a tiny wrapper that > > takes an aggregate, a list of traits, and an --add/--remove command. So > > initially, setting up aggregate isolation is a two-step process, and in > > the future we can consider making new API/CLI affordance that combines > > the steps. ya e could work too. melanie added a similar functionality to osc placment for managing the alloction ratios of specific resource classes per aggregate a few months ago https://review.opendev.org/#/c/640898/ we could proably provide somthing similar for managing traits but determining what RP to add the trait too would be a littel tricker. 
we would have to be able to filter to RP with either a specific inventory or with a specific trait or in a speicic subtree. you could have a --root or somthing to jsut say add or remove the tratit from the root RPs in an aggregate. but yes you could certely automate this in a simile cli extention. > > > > efried > > . > > > > From ssbarnea at redhat.com Fri Jun 7 18:38:57 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Fri, 7 Jun 2019 19:38:57 +0100 Subject: [tripleo][molecule] feedback on testing ansible roles with molecule Message-ID: Hi! While we do now have a POC job running molecule with tox for testing one tripleo-common role, I would like to ask for some feedback from running the same test locally, on your dev box. The report generated but openstack-tox-mol job looks like http://logs.openstack.org/36/663336/14/check/openstack-tox-mol/aa7345d/tox/reports.html https://review.opendev.org/#/c/663336/ Just download it and run: tox -e mol You will either need docker or at least to define DOCKER_HOST=ssh://somehost as an alternative. Please send the feedback back to me or directly on the the review. Over the last days I fixed few minor issues related to differences between user environments and I want to make I improve it as much as possible. Thanks Sorin Sbarnea Tripleo CI -------------- next part -------------- An HTML attachment was scrubbed... URL: From moshele at mellanox.com Fri Jun 7 19:05:46 2019 From: moshele at mellanox.com (Moshe Levi) Date: Fri, 7 Jun 2019 19:05:46 +0000 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> References: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Message-ID: Hi Zoltan, What OS and kernel are you using? -----Original Message----- From: Zoltan Langi Sent: Friday, June 7, 2019 3:54 PM To: openstack-discuss at lists.openstack.org Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. (I initially followed this ASAP2 guide, works well: https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf (page15) So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. The problem is only one direction of the traffic is offloaded when LAG is being used. I opened a mellanox case and they recommended to install the latest ovs version which I did: https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. 
The speed I am getting is way far away from the values that I get when only a single port is used. Does anyone has any experience or any idea what should I look our for or check? Thank you very much, anything is appreciated! Zoltan From zoltan.langi at namecheap.com Fri Jun 7 19:44:42 2019 From: zoltan.langi at namecheap.com (Zoltan Langi) Date: Fri, 7 Jun 2019 21:44:42 +0200 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: References: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Message-ID: Hello Moshe, OS is Ubuntu 18.04.2 LTS, Kernel is: 4.18.0-21-generic According to Mellanox this os is definitely supported. Zoltan On 07.06.19 21:05, Moshe Levi wrote: > Hi Zoltan, > > What OS and kernel are you using? > > -----Original Message----- > From: Zoltan Langi > Sent: Friday, June 7, 2019 3:54 PM > To: openstack-discuss at lists.openstack.org > Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan > > Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. > > I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox > ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. > > When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. > > (I initially followed this ASAP2 guide, works well: > https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) > > Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: > > https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf > (page15) > > So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. > > The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. > > The problem is only one direction of the traffic is offloaded when LAG is being used. > > I opened a mellanox case and they recommended to install the latest ovs version which I did: > > https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem > > After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used. > > Does anyone has any experience or any idea what should I look our for or check? > > > Thank you very much, anything is appreciated! > > Zoltan > > > > > From gagehugo at gmail.com Fri Jun 7 19:51:06 2019 From: gagehugo at gmail.com (Gage Hugo) Date: Fri, 7 Jun 2019 14:51:06 -0500 Subject: [Security SIG] Weekly Newsletter - June 06th 2019 Message-ID: #Week of: 06 June 2019 - Security SIG Meeting Info: http://eavesdrop.openstack.org/#Security_SIG_meeting - Weekly on Thursday at 1500 UTC in #openstack-meeting - Agenda: https://etherpad.openstack.org/p/security-agenda - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG #Meeting Notes - Summary: http://eavesdrop.openstack.org/meetings/security/2019/security.2019-06-06-15.01.html - This week we discussed the [openstack-security] mailing list usage. Currently it's only being used for launchpad to send automated notifications on security bugs. 
Due to this, we are considering designating the [openstack-security] mailing list to only be used for automated notifications and rewording the description for the mailing list to state this for clarity. If anyone is wanting to ask questions about security related questions, we will suggest using the -discussion mailing list. - We will discuss more on this next week and hammer out the final details. ## News - the scientific sig meeting this week featured a discussion on secure computing environments, if anyone here is interested in the transcript or wants to reach out to the participants about anything: - http://eavesdrop.openstack.org/meetings/scientific_sig/2019/scientific_sig.2019-06-05-11.00.log.html#l-93 - Image Encryption pop-up team's proposal: - https://review.opendev.org/#/c/661983/ - Storyboard work is nearly in place for a mechanism to auto-assign security teams to private stories marked "security" - https://review.opendev.org/#/q/topic:security-teams # VMT Reports - A full list of publicly marked security issues can be found here: https://bugs.launchpad.net/ossa/ - No new public security bugs this week -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Fri Jun 7 20:16:29 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 7 Jun 2019 15:16:29 -0500 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: <1CC272501B5BC543A05DB90AA509DED527591596@fmsmsx122.amr.corp.intel.com> References: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> <1CC272501B5BC543A05DB90AA509DED527591596@fmsmsx122.amr.corp.intel.com> Message-ID: <1533bd72-9f26-2873-4976-2bf25620acaa@gmail.com> On 6/7/2019 12:17 AM, Nadathur, Sundar wrote: > The ARQ creation could be done at [1], followed by the binding, before acquiring the semaphore or creating other resources. Why is that not a good option? > > [1]https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898 If we created the ARQs in compute I think we'd do it in the ComputeManager._build_resources method to be consistent with where we create volumes and ports. My bigger point is if the ARQ creation fails in compute for whatever reason, then we have to rollback any other resources we create (ports and volumes) which gets messy. Doing the ARQ creation before _build_resources in ComputeManager (what you're suggesting) would side-step that bit but then we've got inconsistencies in where the server create flow creates external resources within the compute service, which I don't love. So I think if we're going to do the ARQ creation early then we should do it in the conductor so we can fail fast and avoid a reschedule from the compute. -- Thanks, Matt From emilien at redhat.com Fri Jun 7 20:16:58 2019 From: emilien at redhat.com (Emilien Macchi) Date: Fri, 7 Jun 2019 16:16:58 -0400 Subject: [tripleo][molecule] feedback on testing ansible roles with molecule In-Reply-To: References: Message-ID: On Fri, Jun 7, 2019 at 3:50 PM Sorin Sbarnea wrote: > Hi! > > While we do now have a POC job running molecule with tox for testing one > tripleo-common role, I would like to ask for some feedback from running the > same test locally, on your dev box. 
> The report generated but openstack-tox-mol job looks like > http://logs.openstack.org/36/663336/14/check/openstack-tox-mol/aa7345d/tox/reports.html > > https://review.opendev.org/#/c/663336/ > Just download it and run: > tox -e mol > > You will either need docker or at least to define DOCKER_HOST= > ssh://somehost as an alternative. > is there a driver for podman? if yes, prefer it over docker on fedora. Otherwise, cool! Thanks for this work. It'll be useful with the forthcoming work in tripleo-ansible. -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Fri Jun 7 22:58:00 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Fri, 07 Jun 2019 15:58:00 -0700 Subject: [keystone] Keystone Team Update - Week of 3 June 2019 Message-ID: <40595d60-a8da-4656-aa60-c55a85e4c509@www.fastmail.com> # Keystone Team Update - Week of 3 June 2019 ## News ### Milestone 1 Check-in We scheduled our Milestone 1 meeting[1] and I proposed a draft agenda[2]. Looking forward to a productive meeting! [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006783.html [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006976.html ### Sphinx update and the gate This week the release of Sphinx 2.1.0 broke our documentation builds. The issue[3] is that module constants are now automatically included as members in the document, and we were using constants as shorthand for external, imported modules that had broken docstrings. We've come up with a workaround[4], but as Elod pointed out, we're now documenting constants like CONF, LOG, and others which is an unexpected change in behavior. [3] https://github.com/sphinx-doc/sphinx/issues/6447 [4] https://review.opendev.org/663373 ### Expiring users We discussed[5] the work needed[6] to allow federated users to create and use application credentials, which formerly was making application credentials refreshable and is now being reworked to move the refresh layer upwards, but we had differring recollections on whether that layer was with the user or the grant. Kristi will update the spec to mention both options and the implementation and user experience implications for each option. [5] http://eavesdrop.openstack.org/meetings/keystone/2019/keystone.2019-06-04-16.00.log.html#l-37 [6] https://review.opendev.org/604201 ## Open Specs Train specs: https://bit.ly/2uZ2tRl Ongoing specs: https://bit.ly/2OyDLTh All specs for Train should now be proposed, new specs will not be accepted for Train after this point. Please provide and respond to feedback on open specs so that we can merge them in a timely manner. ## Recently Merged Changes Search query: https://bit.ly/2pquOwT We merged 3 changes this week. ## Changes that need Attention Search query: https://bit.ly/2tymTje There are 42 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. ## Bugs This week we opened 5 new bugs and closed 2. 
Bugs opened (5)  Bug #1831918 (keystone:Medium) opened by Nathan Oyler https://bugs.launchpad.net/keystone/+bug/1831918  Bug #1831400 (keystone:Undecided) opened by Brin Zhang https://bugs.launchpad.net/keystone/+bug/1831400  Bug #1832005 (keystone:Undecided) opened by Maciej Kucia https://bugs.launchpad.net/keystone/+bug/1832005  Bug #1831791 (keystonemiddleware:Undecided) opened by Nathan Oyler https://bugs.launchpad.net/keystonemiddleware/+bug/1831791  Bug #1831406 (oslo.limit:Undecided) opened by jacky06 https://bugs.launchpad.net/oslo.limit/+bug/1831406  Bugs closed (2)  Bug #1831400 (keystone:Undecided) https://bugs.launchpad.net/keystone/+bug/1831400  Bug #1831791 (keystonemiddleware:Undecided) https://bugs.launchpad.net/keystonemiddleware/+bug/1831791 ## Milestone Outlook https://releases.openstack.org/train/schedule.html Today is the last day to submit spec proposals for Train. The spec freeze is on the Train-2 milestone next month. Focus now should be on reviewing and updating specs. It's also not too early to get started on feature implementations. ## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter From michjo at viviotech.net Sat Jun 8 00:11:07 2019 From: michjo at viviotech.net (Jordan Michaels) Date: Fri, 7 Jun 2019 17:11:07 -0700 (PDT) Subject: [Glance] Can Glance be installed on a server other than the controller? Message-ID: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> Hi Folks, First time posting here so apologies if this question is inappropriate for this list. Just a quick question to see if Glance can be installed on a server other than the controller? By following the installation docs for Rocky I can get Glance installed just fine on the controller (works great!), but following that same documentation on a separate server I cannot get it to authenticate. It's probably just something I'm doing, but I've run out of ideas on what to check next (both the controller and the separate server use the same auth and config), and I just want to make sure it's possible. It's also possible I'm losing my mind, so, there's that. =P Posted about it in detail here: https://ask.openstack.org/en/question/122501/glance-unauthorized-http-401-on-block1-but-not-controller/ Appreciate any advice! Kind regards, Jordan From mnaser at vexxhost.com Sat Jun 8 05:37:12 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 8 Jun 2019 01:37:12 -0400 Subject: [Glance] Can Glance be installed on a server other than the controller? In-Reply-To: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> References: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> Message-ID: On Fri., Jun. 7, 2019, 8:15 p.m. Jordan Michaels, wrote: > Hi Folks, > > First time posting here so apologies if this question is inappropriate for > this list. > > Just a quick question to see if Glance can be installed on a server other > than the controller? By following the installation docs for Rocky I can get > Glance installed just fine on the controller (works great!), but following > that same documentation on a separate server I cannot get it to > authenticate. It's probably just something I'm doing, but I've run out of > ideas on what to check next (both the controller and the separate server > use the same auth and config), and I just want to make sure it's possible. > It's also possible I'm losing my mind, so, there's that. 
=P > > Posted about it in detail here: > > https://ask.openstack.org/en/question/122501/glance-unauthorized-http-401-on-block1-but-not-controller/ > > Appreciate any advice! > It's possible :) Kind regards, > Jordan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Cory at Hawkless.id.au Sat Jun 8 07:03:51 2019 From: Cory at Hawkless.id.au (Cory Hawkless) Date: Sat, 8 Jun 2019 07:03:51 +0000 Subject: Neutron with LBA and BGP-EVPN over IP fabric In-Reply-To: References: Message-ID: <18C7C076CE65A443BC1DEC057949DEFE01273C86B3@CorysCloudVPS.Oblivion.local> I have come across this exact same issue while building out our Rocky deployment My solution was to make modifications to the neutron/agent/linux/ip_lib.py and neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py files then commit them to my own fork. Checkout this commit for the information https://github.com/CoryHawkless/neutron/commit/8f337b47068ad8e69aea138c43eaeb218df90dfc I'd love to see this implemented as an option as opposed to a brute force hack like ive done here. Has anyone else found another way around this problem? -----Original Message----- From: Jan Marquardt [mailto:jm at artfiles.de] Sent: Friday, 7 June 2019 10:45 PM To: openstack-discuss at lists.openstack.org Subject: Neutron with LBA and BGP-EVPN over IP fabric Hi, we are currently trying to build an Openstack Cloud with an IP fabric and FRR directly running on each host. Therefore each host is supposed to advertise its VNIs to the fabric. For this purpose I’d need VXLAN interfaces with the following config: 18: vx-50: mtu 1500 qdisc noqueue master br-test state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 7e:d2:e6:3c:5a:65 brd ff:ff:ff:ff:ff:ff promiscuity 1 vxlan id 50 local 10.0.0.101 srcport 0 0 dstport 8472 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.7e:d2:e6:3c:5a:65 designated_root 8000.7e:d2:e6:3c:5a:65 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0x0 group_fwd_mask_str 0x0 vlan_tunnel off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 It seems that Neutron/lba is not capable of creating VXLAN interfaces with such a config. By default lba creates them with mode multicast, but I’d need unicast. The only way to activate unicast mode seems to be setting l2pop, but then lba does not set local address. Furthermore, I don't think we really need l2pop, because this part is supposed to be done by BGP-EVPN. Is there any way to achieve such config with Neutron/lba? 
Best Regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 From Cory at Hawkless.id.au Sat Jun 8 07:05:36 2019 From: Cory at Hawkless.id.au (Cory Hawkless) Date: Sat, 8 Jun 2019 07:05:36 +0000 Subject: Neutron with LBA and BGP-EVPN over IP fabric References: Message-ID: <18C7C076CE65A443BC1DEC057949DEFE01273C86D4@CorysCloudVPS.Oblivion.local> Sorry, also meant to say that I then use Docker to build containers based on this modified source. We run everything in our own custom built containers including the L3Agent, DHCP agents, nova, cinder, neutron,.. the lot. -----Original Message----- From: Cory Hawkless Sent: Saturday, 8 June 2019 4:34 PM To: 'Jan Marquardt' ; openstack-discuss at lists.openstack.org Subject: RE: Neutron with LBA and BGP-EVPN over IP fabric I have come across this exact same issue while building out our Rocky deployment My solution was to make modifications to the neutron/agent/linux/ip_lib.py and neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py files then commit them to my own fork. Checkout this commit for the information https://github.com/CoryHawkless/neutron/commit/8f337b47068ad8e69aea138c43eaeb218df90dfc I'd love to see this implemented as an option as opposed to a brute force hack like ive done here. Has anyone else found another way around this problem? -----Original Message----- From: Jan Marquardt [mailto:jm at artfiles.de] Sent: Friday, 7 June 2019 10:45 PM To: openstack-discuss at lists.openstack.org Subject: Neutron with LBA and BGP-EVPN over IP fabric Hi, we are currently trying to build an Openstack Cloud with an IP fabric and FRR directly running on each host. Therefore each host is supposed to advertise its VNIs to the fabric. For this purpose I’d need VXLAN interfaces with the following config: 18: vx-50: mtu 1500 qdisc noqueue master br-test state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 7e:d2:e6:3c:5a:65 brd ff:ff:ff:ff:ff:ff promiscuity 1 vxlan id 50 local 10.0.0.101 srcport 0 0 dstport 8472 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.7e:d2:e6:3c:5a:65 designated_root 8000.7e:d2:e6:3c:5a:65 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0x0 group_fwd_mask_str 0x0 vlan_tunnel off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 It seems that Neutron/lba is not capable of creating VXLAN interfaces with such a config. By default lba creates them with mode multicast, but I’d need unicast. The only way to activate unicast mode seems to be setting l2pop, but then lba does not set local address. Furthermore, I don't think we really need l2pop, because this part is supposed to be done by BGP-EVPN. Is there any way to achieve such config with Neutron/lba? 
Best Regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 From Cory at Hawkless.id.au Sat Jun 8 07:10:40 2019 From: Cory at Hawkless.id.au (Cory Hawkless) Date: Sat, 8 Jun 2019 07:10:40 +0000 Subject: Cinder Ceph backup concurrency Message-ID: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> I'm using Rocky and Cinders built in Ceph backup driver which is working ok but I'd like to limit each instance of the backup agent to X number of concurrent backups. For example, if I(Or a tenant) trigger a backup to run on 20 volumes, the cinder-0backuip agent promptly starts the process of backup up all 20 volumes simultaneously and while this works ok it has the downside of over saturating links, causing high IO on the disks etc. Ideally I'd like to have each cinder-backup agent limited to running X(Perhaps 5) backups jobs at any one time and the remaining jobs will be 'queued' until an agent has less than X jobs remaining. Is this possible at all? Based on my understanding the Cinder scheduler services handles the allocation and distribution of the backup tasks, is that correct? Thanks in advance Cory -------------- next part -------------- An HTML attachment was scrubbed... URL: From ssbarnea at redhat.com Sat Jun 8 07:20:51 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Sat, 8 Jun 2019 08:20:51 +0100 Subject: [tripleo][molecule] feedback on testing ansible roles with molecule In-Reply-To: References: Message-ID: There is no podman driver (provider) yet, but it will be. Mainly we are waiting for Ansible modules and one done it will be easy to add one. My goal is to find a way to use both, probably based on detection and fallback. This could provide a better user experience as it would allow use of whatever you have available on your environment. -- sorin On 7 Jun 2019, 21:17 +0100, Emilien Macchi , wrote: > > > > On Fri, Jun 7, 2019 at 3:50 PM Sorin Sbarnea wrote: > > > Hi! > > > > > > While we do now have a POC job running molecule with tox for testing one tripleo-common role, I would like to ask for some feedback from running the same test locally, on your dev box. > > > The report generated but openstack-tox-mol job looks like http://logs.openstack.org/36/663336/14/check/openstack-tox-mol/aa7345d/tox/reports.html > > > > > > https://review.opendev.org/#/c/663336/ > > > Just download it and run: > > > tox -e mol > > > > > > You will either need docker or at least to define DOCKER_HOST=ssh://somehost as an alternative. > > > > is there a driver for podman? if yes, prefer it over docker on fedora. > > > Otherwise, cool! Thanks for this work. It'll be useful with the forthcoming work in tripleo-ansible. > -- > Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From pfb29 at cam.ac.uk Sat Jun 8 14:23:39 2019 From: pfb29 at cam.ac.uk (Paul Browne) Date: Sat, 8 Jun 2019 15:23:39 +0100 Subject: Cinder Ceph backup concurrency In-Reply-To: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> References: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> Message-ID: Hello all, I'd also be very interested in anyone's knowledge or experience of this topic, extending also to cinder-volume operation concurrency. 
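Coming back to Cory's original question about capping concurrent backups: cinder-backup has no per-tenant queue limit as such, but two cinder.conf options are worth checking against your release's configuration reference (verify the names and availability for Rocky, this is a sketch), and a blunt client-side throttle is always possible:

    # cinder.conf knobs to investigate (verify for your release):
    #   [DEFAULT]
    #   backup_workers = 1                    # number of cinder-backup processes
    #   backup_native_threads_pool_size = 20  # threads used by the Ceph backup driver
    #
    # Client-side alternative: keep at most 5 backups in flight at a time.
    for vol in $(openstack volume list -f value -c ID); do
        openstack volume backup create "$vol"
        while [ "$(openstack volume backup list -f value -c Status | grep -c creating)" -ge 5 ]; do
            sleep 30
        done
    done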
We see similar behaviour in that many cinder-volume conversion operations started simultaneously can impact the cloud overall. Thanks, Paul On Sat, 8 Jun 2019, 08:15 Cory Hawkless, wrote: > I’m using Rocky and Cinders built in Ceph backup driver which is working > ok but I’d like to limit each instance of the backup agent to X number of > concurrent backups. > > For example, if I(Or a tenant) trigger a backup to run on 20 volumes, the > cinder-0backuip agent promptly starts the process of backup up all 20 > volumes simultaneously and while this works ok it has the downside of over > saturating links, causing high IO on the disks etc. > > > > Ideally I’d like to have each cinder-backup agent limited to running > X(Perhaps 5) backups jobs at any one time and the remaining jobs will be > ‘queued’ until an agent has less than X jobs remaining. > > > > Is this possible at all? > > Based on my understanding the Cinder scheduler services handles the > allocation and distribution of the backup tasks, is that correct? > > > > Thanks in advance > > Cory > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moshele at mellanox.com Sat Jun 8 17:14:18 2019 From: moshele at mellanox.com (Moshe Levi) Date: Sat, 8 Jun 2019 17:14:18 +0000 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: References: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Message-ID: If you are using OFED please use OFED 4.6 If you are using inbox driver I know for sure that it works with Kernel 5.0. Thanks, Moshe -----Original Message----- From: Zoltan Langi Sent: Friday, June 7, 2019 10:45 PM To: Moshe Levi ; openstack-discuss at lists.openstack.org Subject: Re: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan Hello Moshe, OS is Ubuntu 18.04.2 LTS, Kernel is: 4.18.0-21-generic According to Mellanox this os is definitely supported. Zoltan On 07.06.19 21:05, Moshe Levi wrote: > Hi Zoltan, > > What OS and kernel are you using? > > -----Original Message----- > From: Zoltan Langi > Sent: Friday, June 7, 2019 3:54 PM > To: openstack-discuss at lists.openstack.org > Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan > > Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. > > I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox > ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. > > When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. > > (I initially followed this ASAP2 guide, works well: > https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) > > Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: > > https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf > (page15) > > So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. > > The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. > > The problem is only one direction of the traffic is offloaded when LAG is being used. 
> > I opened a mellanox case and they recommended to install the latest ovs version which I did: > > https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem > > After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used. > > Does anyone has any experience or any idea what should I look our for or check? > > > Thank you very much, anything is appreciated! > > Zoltan > > > > > From mnaser at vexxhost.com Sat Jun 8 17:31:41 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 8 Jun 2019 13:31:41 -0400 Subject: [openstack-ansible] suse support for stable/queens Message-ID: Hi everyone, The most recent set of automatic proposal patches are all failing for opensuse-42 due to the fact that it seems the operating system is now shipping with LXC 3 instead of LXC 2. The patch that added support to LXC 3 for our newer branches was mainly done there to add support for Bionic, which means that we can't really backport it all the way to Queens. We have two options right now: 1. Someone can volunteer to implement LXC 3 support in stable/queens in order to get opensuse-42 working again 2. We move the opensuse-42 jobs to non-voting for 1/2 weeks and if no one fixes them, we drop them (because they're a waste of CI resources). I'd like to hear what the community has to say about this to be able to move forward. Thanks, Mohammed -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From zoltan.langi at namecheap.com Sat Jun 8 17:55:57 2019 From: zoltan.langi at namecheap.com (zoltan.langi) Date: Sat, 08 Jun 2019 19:55:57 +0200 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: Message-ID: Hi Moshe,Yes I am using the latest OFED driver from mellanox.Any ideas where am I missing something?Thank you!ZoltanVon meinem Samsung Galaxy Smartphone gesendet. -------- Ursprüngliche Nachricht --------Von: Moshe Levi Datum: 08.06.19 19:14 (GMT+01:00) An: Zoltan Langi , openstack-discuss at lists.openstack.org Betreff: RE: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan If you are using OFED please use OFED 4.6If you are using inbox driver I know for sure that it works with  Kernel 5.0. 
Thanks,Moshe-----Original Message-----From: Zoltan Langi Sent: Friday, June 7, 2019 10:45 PMTo: Moshe Levi ; openstack-discuss at lists.openstack.orgSubject: Re: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlanHello Moshe,OS is Ubuntu 18.04.2 LTS, Kernel is: 4.18.0-21-genericAccording to Mellanox this os is definitely supported.ZoltanOn 07.06.19 21:05, Moshe Levi wrote:> Hi Zoltan,>> What OS and kernel are you using?>> -----Original Message-----> From: Zoltan Langi > Sent: Friday, June 7, 2019 3:54 PM> To: openstack-discuss at lists.openstack.org> Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan>> Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now.>> I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox> ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release.>> When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s.>> (I initially followed this ASAP2 guide, works well:> https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2)>> Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox:>> https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf> (page15)>> So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs.>> The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm.>> The problem is only one direction of the traffic is offloaded when LAG is being used.>> I opened a mellanox case and they recommended to install the latest ovs version which I did:>> https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem>> After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used.>> Does anyone has any experience or any idea what should I look our for or check?>>> Thank you very much, anything is appreciated!>> Zoltan>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Sun Jun 9 09:53:15 2019 From: mark at stackhpc.com (Mark Goddard) Date: Sun, 9 Jun 2019 10:53:15 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> Message-ID: On Fri, 7 Jun 2019, 18:02 Jay Pipes, wrote: > On 6/7/19 11:23 AM, Eric Fried wrote: > >> Better still, add a standardized trait to os-traits for hyperthreading > >> support, which is what I'd recommended in the original > >> cpu-resource-tracking spec. > > > > HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since > > 0.8.0. 
> I think we need a tri-state here. There are three options: 1. Give me a node with hyperthreading enabled 2. Give me a node with hyperthreading disabled 3. I don't care For me, the lack of a trait is 3 - I wouldn't want existing flavours without this trait to cause hyperthreading to be disabled. The ironic deploy templates feature wasn't designed to support forbidden traits - I don't think they were implemented at the time. The example use cases so far have involved encoding values into a trait name, e.g. CUSTOM_HYPERTHREADING_ON. Forbidden traits could be made to work in this case, but it doesn't really extend to non Boolean things such as RAID levels. I'm not trying to shoot down new ideas, just explaining how we got here. > Excellent, I had a faint recollection of that... > > -jay > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cheng1.li at intel.com Sun Jun 9 05:58:36 2019 From: cheng1.li at intel.com (Li, Cheng1) Date: Sun, 9 Jun 2019 05:58:36 +0000 Subject: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment In-Reply-To: References: Message-ID: Finally, I have been able to deploy airsloop on virtual env. I created two VMs(libvirt/kvm driven), one for genesis and the other for compute node. These two VMs were on the same host. As the compute node VM is supposed to be provisioned by maas via ipmi/pxe. So I used virtualbmc to simulate the ipmi. I authored the site by following these two guides[1][2]. It’s the mix of guide[1] and guide[2]. The commands I used are all these ones[3]. After fixing several issue, I have deployed the virtual airsloop env. I list here some issues I met: 1. Node identify failed. At the beginning of step ‘prepare_and_deploy_nodes’, the drydock power on the compute node VM via ipmi. Once the compute VM starts up via pxe boot, it runs script to detect local network interfaces and sends the info back to drycok. So the drydock can identify the node based on the received info. But the compute VM doesn’t have real ILO interface, so the drydock can’t identify it. What I did to workaround this was to manually fill the ipmi info on maas web page. 2. My host doesn’t have enough CPU cores, neither the VMs. So I had to increase --pods-per-core in kubelet.yaml. 3. The disk name in compute VM is vda, instead of sda. Drydock can’t map the alias device name to vda, so I had to used the fixed alias name ‘vda’ which is the same as it’s real device name.(it was ‘bootdisk’) 4. My host doesn’t have enough resource(CPU, memory), so I removed some resource consuming components(logging, monitoring). Besides, I disabled the neutron rally test. As it failed with timeout error because of the resource limits. I also paste my site changes[4] for reference. 
[1] https://airship-treasuremap.readthedocs.io/en/latest/authoring_and_deployment.html [2] https://airship-treasuremap.readthedocs.io/en/latest/airsloop.html [3] https://airship-treasuremap.readthedocs.io/en/latest/airsloop.html#getting-started [4] https://github.com/cheng1li/treasuremap/commit/7a8287720dacc6dc1921948aaddec96b8cf2645e Thanks, Cheng From: Anirudh Gupta [mailto:Anirudh.Gupta at hsc.com] Sent: Thursday, May 30, 2019 7:29 PM To: Li, Cheng1 ; airship-discuss at lists.airshipit.org; airship-announce at lists.airshipit.org; openstack-dev at lists.openstack.org; openstack at lists.openstack.org Subject: RE: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment Hi Team, I am trying to create Airship-Seaworthy from the link https://airship-treasuremap.readthedocs.io/en/latest/seaworthy.html It requires 6 DELL R720xd bare-metal servers: 3 control, and 3 compute nodes to be configured, but there is no documentation of how to install and getting started with Airship-Seaworthy. Do we need to follow the “Getting Started” section mentioned in Airsloop or will there be any difference in case of Seaworthy. https://airship-treasuremap.readthedocs.io/en/latest/airsloop.html#getting-started Also what all configurations need to be run from the 3 controller nodes and what needs to be run from 3 computes? Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) From: Li, Cheng1 > Sent: 30 May 2019 08:29 To: Anirudh Gupta >; airship-discuss at lists.airshipit.org; airship-announce at lists.airshipit.org; openstack-dev at lists.openstack.org; openstack at lists.openstack.org Subject: RE: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment I have the same question. I haven’t seen any docs which guides how to deploy airsloop/air-seaworthy in virtual env. I am trying to deploy airsloop on libvirt/kvm driven virtual env. Two VMs, one for genesis, the other for compute. Virtualbmc for ipmi simulation. The genesis.sh scripts has been run on genesis node without error. But deploy_site fails at prepare_and_deploy_nodes task(action ‘set_node_boot’ timeout). I am still investigating this issue. It will be great if we have official document for this scenario. Thanks, Cheng From: Anirudh Gupta [mailto:Anirudh.Gupta at hsc.com] Sent: Wednesday, May 29, 2019 3:31 PM To: airship-discuss at lists.airshipit.org; airship-announce at lists.airshipit.org; openstack-dev at lists.openstack.org; openstack at lists.openstack.org Subject: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment Hi Team, We want to test Production Ready Airship-Seaworthy in our virtual environment The link followed is https://airship-treasuremap.readthedocs.io/en/latest/seaworthy.html As per the document we need 6 DELL R720xd bare-metal servers: 3 control, and 3 compute nodes. But we need to deploy our setup on Virtual Environment. Does Airship-Seaworthy support Installation on Virtual Environment? We have 2 Rack Servers with Dual-CPU Intel® Xeon® E5 26xx with 16 cores each and 128 GB RAM. Is it possible that we can create Virtual Machines on them and set up the complete environment. In that case, what possible infrastructure do we require for setting up the complete setup. Looking forward for your response. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. 
From zigo at debian.org Sun Jun 9 20:06:06 2019 From: zigo at debian.org (Thomas Goirand) Date: Sun, 9 Jun 2019 22:06:06 +0200 Subject: [Glance] Can Glance be installed on a server other than the controller? In-Reply-To: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> References: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> Message-ID: On 6/8/19 2:11 AM, Jordan Michaels wrote: > Hi Folks, > > First time posting here so apologies if this question is inappropriate for this list. > > Just a quick question to see if Glance can be installed on a server other than the controller? By following the installation docs for Rocky I can get Glance installed just fine on the controller (works great!), but following that same documentation on a separate server I cannot get it to authenticate. It's probably just something I'm doing, but I've run out of ideas on what to check next (both the controller and the separate server use the same auth and config), and I just want to make sure it's possible. It's also possible I'm losing my mind, so, there's that. =P > > Posted about it in detail here: > https://ask.openstack.org/en/question/122501/glance-unauthorized-http-401-on-block1-but-not-controller/ > > Appreciate any advice! > > Kind regards, > Jordan Jordan, There's no such thing in the OpenStack code as a "controller". This thing only lives in the docs and in how people decide to deploy things. Users are free to install any component anywhere. Indeed, you must be doing something wrong. Cheers, Thomas Goirand (zigo) From anlin.kong at gmail.com Mon Jun 10 04:19:47 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Mon, 10 Jun 2019 16:19:47 +1200 Subject: [requirements] SQLAlchemy 1.3.4 backward compatible? Message-ID: Trove Jenkins jobs failed because of the SQLAlchemy upgrade from 1.2.19 to 1.3.4 in https://github.com/openstack/requirements/commit/4f3252cbd7c63fd1c60e9bd09748e39dc2d9f8fa#diff-0bdd949ed8a7fdd4f95240bd951779c8 yesterday.
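Regarding the textual-SQL failures quoted below: SQLAlchemy 1.3 requires raw SQL strings to be declared explicitly with text(). A minimal sketch of the kind of change involved (the filter string is taken from the error message; the surrounding query is an assumption for illustration, not the actual Trove migration code):

    from sqlalchemy import text
    from sqlalchemy.orm import Session

    def visible_or_auto_apply(session: Session, model):
        # SQLAlchemy <= 1.2 accepted a bare string clause:
        #     session.query(model).filter("visible=0 or auto_apply=1")
        # SQLAlchemy 1.3 raises ArgumentError unless it is wrapped in text():
        return session.query(model).filter(text("visible=0 or auto_apply=1"))

The duplicate-column error is a separate issue in the migration scripts and is not addressed by this change.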
A lot of error messages like the following: sqlalchemy.exc.ArgumentError: Textual SQL expression 'visible=0 or auto_apply=1...' should be explicitly declared as text('visible=0 or auto_apply=1...') sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) duplicate column name: priority_apply I'm wondering who else is also affected? Any hints for the workaround? Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Mon Jun 10 04:21:46 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Mon, 10 Jun 2019 16:21:46 +1200 Subject: [requirements] SQLAlchemy 1.3.4 backward compatible? In-Reply-To: References: Message-ID: BTW, the error message came from Trove db migration script, and Trove is using sqlalchemy-migrate lib rather than alembic. Best regards, Lingxian Kong Catalyst Cloud On Mon, Jun 10, 2019 at 4:19 PM Lingxian Kong wrote: > Trove Jenkins jobs failed because of the SQLAlchemy upgrade from 1.2.19 to > 1.3.4 in > https://github.com/openstack/requirements/commit/4f3252cbd7c63fd1c60e9bd09748e39dc2d9f8fa#diff-0bdd949ed8a7fdd4f95240bd951779c8 > yesterday. > > A lot of error messages like the following: > > sqlalchemy.exc.ArgumentError: Textual SQL expression 'visible=0 or > auto_apply=1...' should be explicitly declared as text('visible=0 or > auto_apply=1...') > > sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) duplicate > column name: priority_apply > > I'm wondering who else is also affected? Any hints for the workaround? > > Best regards, > Lingxian Kong > Catalyst Cloud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Mon Jun 10 05:17:44 2019 From: soulxu at gmail.com (Alex Xu) Date: Mon, 10 Jun 2019 13:17:44 +0800 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: Eric Fried 于2019年6月7日周五 上午1:59写道: > > Looking at the specs, it seems it's mostly talking about changing VMs > resources without rebooting. However that's not the actual intent of the > Ironic use case I explained in the email. > > Yes, it requires a reboot to reflect the BIOS changes. This reboot can > be either be done by Nova IronicDriver or Ironic deploy step can also do it. > > So I am not sure if the spec actually satisfies the use case. > > I hope to get more response from the team to get more clarity. > > Waitwait. The VM needs to be rebooted for the BIOS change to take > effect? So (non-live) resize would actually satisfy your use case just > fine. But the problem is that the ironic driver doesn't support resize > at all? > > Without digging too hard, that seems like it would be a fairly > straightforward thing to add. It would be limited to only "same host" > and initially you could only change this one attribute (anything else > would have to fail). > > Nova people, thoughts? > > Contribute another idea. So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and CUSTOM_HYPERTHREADING_OFF are configuration. Those configuration isn't used for scheduling. Actually, Traits is designed for scheduling. So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this trait is used for indicating the host support HT. 
About whether enable it in the instance is configuration info. That is also pain for change the configuration in the flavor. The flavor is the spec of instance's virtual resource, not the configuration. So another way is we should store the configuration into another place. Like the server's metadata. So for the HT case. We only fill the CUSTOM_HYPERTHREADING trait in the flavor, and fill a server metadata 'hyperthreading_config=on' in server metadata. The nova will find out a BM node support HT. And ironic based on the server metadata 'hyperthreading_config=on' to enable the HT. When change the configuration of HT to off, the user can update the server's metadata. Currently, the nova will send a rpc call to the compute node and calling a virt driver interface when the server metadata is updated. In the ironic virt driver, it can trigger a hyper-threading configuration deploy step to turn the HT off, and do a reboot of the instance. (The reboot is a step inside deploy-step, not part of ironic virt driver flow) But yes, this changes some design to the original deploy-steps and deploy-templates. And we fill something into the server's metadata which I'm not sure nova people like it. Anyway, just put my idea at here. efried > . > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iurygregory at gmail.com Mon Jun 10 07:44:20 2019 From: iurygregory at gmail.com (Iury Gregory) Date: Mon, 10 Jun 2019 09:44:20 +0200 Subject: [ironic] Should we add ironic-prometheus-exporter under Ironic umbrella? In-Reply-To: References: Message-ID: Hi Mohammed, Thanks for your feedback =). Em sex, 7 de jun de 2019 às 19:43, Mohammed Naser escreveu: > Hi Iury, > > This seems pretty awesome. I threw in some comments > > On Fri, Jun 7, 2019 at 11:08 AM Iury Gregory > wrote: > > > > Greetings Ironicers! > > > > I would like to have your input on the matter of moving the > ironic-prometheus-exporter to Ironic umbrella. > > > > What is the ironic-prometheus-exporter? > > The ironic-prometheus-exporter[1] provides a way to export hardware > sensor data from > > Ironic project in OpenStack to Prometheus [2]. It's implemented as an > oslo-messaging notification driver to get the sensor data and a Flask > Application to export the metrics to Prometheus. It can not only be used in > metal3-io but also in any OpenStack deployment which includes Ironic > service. > > This seems really neat. From my perspective, it seems like it waits > for notifications, and then writes it out to a file. The flask server > seems to do nothing but pretty much serve the contents at /metrics. I > think we should be doing more of this inside OpenStack to be honest > and this can be really useful in the perspective of operators. > The notifications are the sensor data of each baremetal node, each node will have a file with the sensor data as metrics in the Prometheus format. Since Prometheus is pull-based the Flask application wlil merge the content of all files to provide to the Prometheus when necessary. > I don't want to complicate this more however, but I would love for > this to be a pattern/framework that other projects can adopt. > Agree, maybe we should talk in the IRC how the pattern/framework would look like and this can be done before moving the project or something to be done trough reviews after the project is moved. > > How to ensure the sensor data will follow the Prometheus format? 
> > We are using the prometheus client_python [3] to generate the file with > the metrics that come trough the oslo notifier plugin. > > > > How it will be tested on the gate? > > Virtualbmc can't provide sensor data that the actual plugin supports. We > would collect sample metrics from the hardware and use it in the unit tests. > > > > Maybe we should discuss this in the next ironic weekly meeting (10th > June)? > > > > [1] https://github.com/metal3-io/ironic-prometheus-exporter > > [2] https://prometheus.io/ > > [3] https://github.com/prometheus/client_python > > > > -- > > Att[]'s > > Iury Gregory Melo Ferreira > > MSc in Computer Science at UFCG > > Part of the puppet-manager-core team in OpenStack > > Software Engineer at Red Hat Czech > > Social: https://www.linkedin.com/in/iurygregory > > E-mail: iurygregory at gmail.com > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Mon Jun 10 07:56:30 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 10 Jun 2019 08:56:30 +0100 Subject: [kolla] Feedback request: removing OracleLinux support In-Reply-To: <81438050-7699-c7a7-b883-a707cc3f53db@linaro.org> References: <81438050-7699-c7a7-b883-a707cc3f53db@linaro.org> Message-ID: Received a reluctant go-ahead from Oracle. Proposed patches to disable CI jobs: https://review.opendev.org/664217 https://review.opendev.org/664216 Mark On Wed, 5 Jun 2019 at 10:43, Marcin Juszkiewicz wrote: > > W dniu 05.06.2019 o 11:10, Mark Goddard pisze: > > > We propose dropping support for OracleLinux in the Train cycle. If > > this will affect you and you would like to help maintain it, please > > get in touch. > > First we drop it from CI. > > Then (IMHO) it will be removed once we move to CentOS 8. > From mark at stackhpc.com Mon Jun 10 07:57:54 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 10 Jun 2019 08:57:54 +0100 Subject: [kolla] Feedback request: removing kolla-cli In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 10:10, Mark Goddard wrote: > > Hi, > > We discussed during the kolla virtual PTG [1] the option of removing > support for the kolla-cli deliverable [2], as a way to improve the > long term sustainability of the project. kolla-cli was a project > started by Oracle, and accepted as a kolla deliverable. While it looks > interesting and potentially useful, it never gained much traction (as > far as I'm aware) and the maintainers left the community. We have > never released it and CI has been failing for some time. > > We propose dropping support for kolla-cli in the Train cycle. If this > will affect you and you would like to help maintain it, please get in > touch. Received a reluctant go-ahead from the kolla-cli contributors. I'll work through the process of retiring the project. 
> > Thanks, > Mark > > [1] https://etherpad.openstack.org/p/kolla-train-ptg > [2] https://github.com/openstack/kolla-cli From geguileo at redhat.com Mon Jun 10 10:39:09 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 10 Jun 2019 12:39:09 +0200 Subject: Cinder Ceph backup concurrency In-Reply-To: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> References: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> Message-ID: <20190610103909.iz72uejuuksb4yhx@localhost> On 08/06, Cory Hawkless wrote: > I'm using Rocky and Cinders built in Ceph backup driver which is working ok but I'd like to limit each instance of the backup agent to X number of concurrent backups. > For example, if I(Or a tenant) trigger a backup to run on 20 volumes, the cinder-0backuip agent promptly starts the process of backup up all 20 volumes simultaneously and while this works ok it has the downside of over saturating links, causing high IO on the disks etc. > > Ideally I'd like to have each cinder-backup agent limited to running X(Perhaps 5) backups jobs at any one time and the remaining jobs will be 'queued' until an agent has less than X jobs remaining. > > Is this possible at all? > Based on my understanding the Cinder scheduler services handles the allocation and distribution of the backup tasks, is that correct? > > Thanks in advance > Cory Hi Cory, Cinder doesn't have any kind of throttling mechanism specific for "heavy" operations. This also includes the cinder-backup service that doesn't make use of the cinder-scheduler service. I think there may be ways to do throttling for the case you describe, though I haven't tried them: Defining "executor_thread_pool_size" (defaults to 64) to reduce the number of concurrent operations that will be executed on the cinder-backup service (backup listings and such will not be affected, as they are executed by cinder-api). Some of the remaining requests will remain on the oslo messaging queue, and the rest in RabbitMQ message queue. For the RBD backend you could also limit the size of the native threads with "backup_native_threads_pool_size", which will limit the number of concurrent RBD calls (since they use native threads instead of green threads). Also, don't forget to ensure that "backup_workers" is set to 1, otherwise you will be running multiple processes, each with the previously defined limitations, resulting in N times what you wanted to have. I hope this helps. Cheers, Gorka. From gael.therond at gmail.com Mon Jun 10 13:14:06 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Mon, 10 Jun 2019 15:14:06 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi guys, Just a quick question regarding this bug, someone told me that it have been patched within stable/rocky, BUT, were you talking about the openstack/octavia repositoy or the openstack/kolla repository? Many Thanks! Le mar. 4 juin 2019 à 15:19, Gaël THEROND a écrit : > Oh, that's perfect so, I'll just update my image and my platform as we're > using kolla-ansible and that's super easy. > > You guys rocks!! (Pun intended ;-)). > > Many many thanks to all of you, that will real back me a lot regarding the > Octavia solidity and Kolla flexibility actually ^^. > > Le mar. 
4 juin 2019 à 15:17, Carlos Goncalves a > écrit : > >> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND >> wrote: >> > >> > Hi Lingxian Kong, >> > >> > That’s actually very interesting as I’ve come to the same conclusion >> this morning during my investigation and was starting to think about a fix, >> which it seems you already made! >> > >> > Is there a reason why it didn’t was backported to rocky? >> >> The patch was merged in master branch during Rocky development cycle, >> hence included in stable/rocky as well. >> >> > >> > Very helpful, many many thanks to you you clearly spare me hours of >> works! I’ll get a review of your patch and test it on our lab. >> > >> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a >> écrit : >> >> >> >> Hi Felix, >> >> >> >> « Glad » you had the same issue before, and yes of course I looked at >> the HM logs which is were I actually found out that this event was >> triggered by octavia (Beside the DB data that validated that) here is my >> log trace related to this event, It doesn't really shows major issue IMHO. >> >> >> >> Here is the stacktrace that our octavia service archived for our both >> controllers servers, with the initial loadbalancer creation trace >> (Worker.log) and both controllers triggered task (Health-Manager.log). >> >> >> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >> >> >> >> I well may have miss something in it, but I don't see something >> strange on from my point of view. >> >> Feel free to tell me if you spot something weird. >> >> >> >> >> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner >> a écrit : >> >>> >> >>> Hi Gael, >> >>> >> >>> >> >>> >> >>> we had a similar issue in the past. >> >>> >> >>> You could check the octiava healthmanager log (should be on the same >> node where the worker is running). >> >>> >> >>> This component monitors the status of the Amphorae and restarts them >> if they don’t trigger a callback after a specific time. This might also >> happen if there is some connection issue between the two components. >> >>> >> >>> >> >>> >> >>> But normally it should at least restart the LB with new Amphorae… >> >>> >> >>> >> >>> >> >>> Hope that helps >> >>> >> >>> >> >>> >> >>> Felix >> >>> >> >>> >> >>> >> >>> From: Gaël THEROND >> >>> Sent: Tuesday, June 4, 2019 9:44 AM >> >>> To: Openstack >> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly >> deleted by octavia >> >>> >> >>> >> >>> >> >>> Hi guys, >> >>> >> >>> >> >>> >> >>> I’ve a weird situation here. >> >>> >> >>> >> >>> >> >>> I smoothly operate a large scale multi-region Octavia service using >> the default amphora driver which imply the use of nova instances as >> loadbalancers. >> >>> >> >>> >> >>> >> >>> Everything is running really well and our customers (K8s and >> traditional users) are really happy with the solution so far. >> >>> >> >>> >> >>> >> >>> However, yesterday one of those customers using the loadbalancer in >> front of their ElasticSearch cluster poked me because this loadbalancer >> suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were >> no longer available but yet the anchor/member/pool and listeners settings >> were still existing. >> >>> >> >>> >> >>> >> >>> So I investigated and found out that the loadbalancer amphoras have >> been destroyed by the octavia user. >> >>> >> >>> >> >>> >> >>> The weird part is, both the master and the backup instance have been >> destroyed at the same moment by the octavia service user. 
>> >>> >> >>> >> >>> >> >>> Is there specific circumstances where the octavia service could >> decide to delete the instances but not the anchor/members/pool ? >> >>> >> >>> >> >>> >> >>> It’s worrying me a bit as there is no clear way to trace why does >> Octavia did take this action. >> >>> >> >>> >> >>> >> >>> I digged within the nova and Octavia DB in order to correlate the >> action but except than validating my investigation it doesn’t really help >> as there are no clue of why the octavia service did trigger the deletion. >> >>> >> >>> >> >>> >> >>> If someone have any clue or tips to give me I’ll be more than happy >> to discuss this situation. >> >>> >> >>> >> >>> >> >>> Cheers guys! >> >>> >> >>> Hinweise zum Datenschutz finden Sie hier. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ed at leafe.com Mon Jun 10 13:46:09 2019 From: ed at leafe.com (Ed Leafe) Date: Mon, 10 Jun 2019 08:46:09 -0500 Subject: [placement] update 19-22 In-Reply-To: References: Message-ID: On Jun 7, 2019, at 7:17 AM, Chris Dent wrote: > > Ed Leafe's ongoing work with using a graph database probably needs > some kind of report or update. Sure, be happy to. A few weeks ago I completed the code changes to remove sqlalchemy from the objects, and replace it with code to talk to the graph DB (Neo4j). One big issue is that there isn’t a 1:1 relationship between what is needed in the two database approaches. For example, there is no need to use the two-ID design that (IMO) overly complicates placement data. There is also no need to store the IDs of parents and root providers, but so much of the code depends on these values, I left them in there for now. One other twist is that an Allocation cannot exist without a Consumer, so all the code to handle the early microversions that support that was removed. I then moved on to getting the functional tests passing. Some early runs revealed holes in my understanding of what the code was supposed to be doing, so I fixed those. Most of the failures were in the tests/functional/db directory. I mentioned that to Chris in a side conversation, and he agreed that those tests would not be relevant, as the system had a completely different database, so I removed those. I tried to integrate Neo4j’s transaction model into the transaction framework of oslo_db and sqla, and while it works for the most part, it fails when running tox. I get “Invalid transaction” messages, which you would expect when one process closes another process’s transaction. Since the Python adapter I’m using (py2neo) creates a pool for connections, I suspect that the way tox runs is causing py2neo to reuse live connections. I haven’t had time to dig into this yet, but it is my current focus. When I run the tests individually, they pass without a problem. I am also planning on doing some performance measurement, so I guess I’ll look into the perfload stuff to see if it can work for this. One thing that is clear to me from all this work is that the way Placement is coded is very much a result of its relational DB roots. There were so many places I encountered when converting the code where I needed to do what felt like unnecessary steps to make the objects continue to work the way that are currently designed. Had this been a greenfield effort, the code for implementing Placement with a graph DB would have been much more direct and understandable. But the converted objects are working, and working well. 
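To make the transaction handling described above concrete, an explicit py2neo transaction looks roughly like the sketch below (connection details, node label, and property name are placeholders for illustration, not the schema of the actual branch):

    from py2neo import Graph

    graph = Graph("bolt://localhost:7687", auth=("neo4j", "secret"))
    tx = graph.begin()
    # Purely illustrative query; "ResourceProvider" and "uuid" are assumed names.
    for record in tx.run("MATCH (rp:ResourceProvider) RETURN rp.uuid LIMIT 5"):
        print(record["rp.uuid"])
    tx.commit()

The "Invalid transaction" failures under tox are consistent with such transactions being begun and committed against pooled connections shared across test processes.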
-- Ed Leafe From juliaashleykreger at gmail.com Mon Jun 10 14:01:28 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 10 Jun 2019 07:01:28 -0700 Subject: [ironic] To not have meetings? Message-ID: Last week the discussion came up of splitting the ironic meeting to alternate time zones as we have increasing numbers of contributors in the Asia/Pacific areas of the world[0]. With that discussion, an additional interesting question came up posing the question of shifting to the mailing list instead of our present IRC meeting[1]? It is definitely an interesting idea, one that I'm personally keen on because of time zones and daylight savings time. I think before we do this, we should collect thoughts and also try to determine how we would pull this off so we don't forget the weekly checkpoint that the meeting serves. I think we need to do something, so I guess now is a good time to provide input into what everyone thinks would be best for the project and facilitating the weekly check-in. What I think might work: By EOD UTC Monday: * Listed primary effort participants will be expected to update the whiteboard[2] weekly before EOD Monday UTC * Contributors propose patches to the whiteboard that they believe would be important for reviewers to examine this coming week. * PTL or designee sends weekly email to the mailing list to start an update thread shortly after EOD Monday UTC or early Tuesday UTC. ** Additional updates, questions, and topical discussion (new features, RFEs) would ideally be wrapped up by EOD UTC Tuesday. With that, I think we would also need to go ahead and begin having "office hours" as during the week we generally know some ironic contributors will be in IRC and able to respond to questions. I think this would initially consist of our meeting time and perhaps the other time that seems to be most friendly to the contributors int he Asia/Pacific area[3]. Thoughts/ideas/suggestions welcome! -Julia [0]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:31:33 [1]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:43:16 [2]: https://etherpad.openstack.org/p/IronicWhiteBoard [3]: https://doodle.com/poll/bv9a4qyqy44wiq92 From lbragstad at gmail.com Mon Jun 10 14:36:22 2019 From: lbragstad at gmail.com (Lance Bragstad) Date: Mon, 10 Jun 2019 09:36:22 -0500 Subject: [tc][all] Train Community Goals Message-ID: Hi all, The goals for the Train development cycle have merged. Both are available on governance.openstack.org [0][1]. Please have a look if you haven't already. Goal champions are asettle and gmann. As always, if you have any comments, questions, or concerns, please don't hesitate to reach out on the mailing list or in #openstack-tc. Thanks, Lance [0] https://governance.openstack.org/tc/goals/train/pdf-doc-generation.html [1] https://governance.openstack.org/tc/goals/train/ipv6-support-and-testing.html -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From haleyb.dev at gmail.com Mon Jun 10 14:36:30 2019 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 10 Jun 2019 10:36:30 -0400 Subject: [neutron] Bug deputy report for week of June 3rd Message-ID: Hi, I was Neutron bug deputy last week. 
Below is a short summary about reported bugs. -Brian Critical bugs ------------- * https://bugs.launchpad.net/neutron/+bug/1832225 - Neutron-vpnaas unit tests broken - Broken by https://review.opendev.org/#/c/653903/ - https://review.opendev.org/#/c/664257/ High bugs --------- * https://bugs.launchpad.net/neutron/+bug/1831534 - [l3][dvr] with openflow security group east-west traffic between different vlan networks is broken - https://review.opendev.org/#/c/662925/ - https://review.opendev.org/#/c/663008/ * https://bugs.launchpad.net/neutron/+bug/1831575 - br-tun gets a wrong arp drop rule when dvr is connected to a network but not used as gateway - https://review.opendev.org/#/c/662999/ - https://review.opendev.org/#/c/663000/ * https://bugs.launchpad.net/neutron/+bug/1831647 - Creation of existing resource takes too much time or fails - https://review.opendev.org/#/c/663749/ * https://bugs.launchpad.net/neutron/+bug/1831919 - Impossible to change a list of static routes defined for subnet because of InvalidRequestError with Cisco ACI integration - https://review.opendev.org/#/c/663714/ - https://review.opendev.org/#/c/663713/ - https://review.opendev.org/#/c/663712/ Medium bugs ----------- * https://bugs.launchpad.net/neutron/+bug/1831404 - rarp packet will be dropped in flows cause vm connectivity broken after live-migration - Yang Li took ownership * https://bugs.launchpad.net/neutron/+bug/1831613 - SRIOV: agent may not register VFs - https://review.opendev.org/#/c/663031/ * https://bugs.launchpad.net/neutron/+bug/1831706 - [DVR] Modify `in_port` field of packets which from remote qr-* port - Possibly related to https://review.opendev.org/#/c/639009 and https://bugs.launchpad.net/neutron/+bug/1732067 * https://bugs.launchpad.net/neutron/+bug/1831811 - Unable to filter using same cidr value as used for subnet create - https://review.opendev.org/#/c/663464/ * https://bugs.launchpad.net/ubuntu/+source/neutron-fwaas/+bug/1832210 - incorrect decode of log prefix under python 3 - https://review.opendev.org/#/c/664234/ Low bugs -------- * https://bugs.launchpad.net/neutron/+bug/1831916 - BGP dynamic routing in neutron - https://review.opendev.org/#/c/663711/ Wishlist bugs ------------- None Invalid bugs ------------ * https://bugs.launchpad.net/bugs/1831613 - moved to lbaas storyboard Further triage required ----------------------- * https://bugs.launchpad.net/neutron/+bug/1831726 - neutron-cli port-update ipv6 fixed_ips Covering previous - Fixed IP getting replaced when using neutronclient, as API ref says will happen. - Looks like the openstackclient is doing things properly by appending the new fixed IP to the existing. - Might just be a documentation issue. * https://bugs.launchpad.net/neutron/+bug/1832021 - Checksum drop of metadata traffic on isolated provider networks - Related to recent revert of TCP checksum-fill iptables rule, https://review.opendev.org/#/c/654645/ - but since that was an invalid rule there is probably another issue here. From lbragstad at gmail.com Mon Jun 10 16:26:34 2019 From: lbragstad at gmail.com (Lance Bragstad) Date: Mon, 10 Jun 2019 11:26:34 -0500 Subject: [tc][all] Train Community Goals In-Reply-To: References: Message-ID: <15b7fbcb-7731-9598-c23e-fb3f6fb48487@gmail.com> I apologize, I missed a goal. coreycb is championing an effort to implement python runtimes for Train, which is being tracked in a community goal [0]. Updates are available on the mailing list if you haven't seen them already [1]. 
[0] https://governance.openstack.org/tc/goals/train/python3-updates.html [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006977.html On 6/10/19 9:36 AM, Lance Bragstad wrote: > Hi all, > > The goals for the Train development cycle have merged. Both are > available on governance.openstack.org [0][1]. Please have a look if > you haven't already. Goal champions are asettle and gmann. > > As always, if you have any comments, questions, or concerns, please > don't hesitate to reach out on the mailing list or in #openstack-tc. > > Thanks, > > Lance > > [0] > https://governance.openstack.org/tc/goals/train/pdf-doc-generation.html > [1] > https://governance.openstack.org/tc/goals/train/ipv6-support-and-testing.html -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From kennelson11 at gmail.com Mon Jun 10 16:35:12 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 10 Jun 2019 09:35:12 -0700 Subject: [all] [TC] Call for Election Officials Message-ID: Hello, The upcoming round for OpenStack's technical elections (both PTL and TC) is set to happen in September this year[1]. We would like to encourage anyone interested to volunteer to help administer the elections, especially as some of our long-service volunteers periodically become ineligible by way of being nominated for election themselves. While elections have historically been handled by a small number of volunteers, the intention was for this to NOT be a closed process. The election process is detailed in this document [2], with tooling managed in gerrit [3], and uses StoryBoard [4] to keep track of various election activities. We are happy to mentor individuals and share knowledge about the elections process in an effort to get more of the community involved. Involvement can be on an ongoing basis or simply volunteering to help with some small part of one election just to learn how the process works. The election officials team is explicitly delegated by the Technical Committee, who have historically consistently expressed interest in more volunteers to assist. Please let us know if you would like to volunteer! -Kendall Nelson & The Election Officials [1] https://review.opendev.org/#/c/661673/2 [2] https://governance.openstack.org/election/process.html [3] https://opendev.org/openstack/election [4] https://storyboard.openstack.org/#!/project/openstack/election -------------- next part -------------- An HTML attachment was scrubbed... URL: From jp.methot at planethoster.info Mon Jun 10 16:52:56 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Mon, 10 Jun 2019 12:52:56 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: Hi, Can you give me an idea of how you split the API from the server part? I’m guessing it has to do with pointing the API endpoint to a specific server, but keeping the neutron info in config files pointing to the controller? Contrary to what I said on this thread last week, we’ve been plagued with this issue every 24 hours or so, needing to restart the controller nodes to restore stability. 
We did implement several of the tweaks that were suggested in this thread’s previous emails, but we are only now considering splitting the API from the main servers, as you did. Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 5 juin 2019 à 15:31, Mathieu Gagné a écrit : > > Hi Jean-Philippe, > > On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot > wrote: >> >> We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : >> >> =ERROR REPORT==== 5-Jun-2019::18:50:08 === >> closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): >> missed heartbeats from client, timeout: 60s >> >> The neutron-server logs show this error: >> >> 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer >> 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: >> >> The relevant service version numbers are as follow: >> rabbitmq-server-3.6.5-1.el7.noarch >> openstack-neutron-12.0.6-1.el7.noarch >> python2-oslo-messaging-5.35.4-1.el7.noarch >> >> Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. >> >> I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. >> > > We had a very similar issue after upgrading to Neutron Queens. In > fact, all Neutron agents were "down" according to status API and > messages weren't getting through. IIRC, this only happened in regions > which had more load than the others. > > We applied a bunch of fixes which I suspect are only a bunch of bandaids. > > Here are the changes we made: > * Split neutron-api from neutron-server. Create a whole new controller > running neutron-api with mod_wsgi. > * Increase [database]/max_overflow = 200 > * Disable RabbitMQ heartbeat in oslo.messaging: > [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 > * Increase [agent]/report_interval = 120 > * Increase [DEFAULT]/agent_down_time = 600 > > We also have those sysctl configs due to firewall dropping sessions. > But those have been on the server forever: > net.ipv4.tcp_keepalive_time = 30 > net.ipv4.tcp_keepalive_intvl = 1 > net.ipv4.tcp_keepalive_probes = 5 > > We never figured out why a service that was working before the upgrade > but no longer is. > This is kind of frustrating as it caused us all short of intermittent > issues and stress during our upgrade. > > Hope this helps. 
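For quick reference, the tweaks quoted above translate into a neutron.conf fragment along these lines (values exactly as listed by Mathieu; treat it as a starting point for your own deployment rather than a recommendation):

    [DEFAULT]
    agent_down_time = 600

    [database]
    max_overflow = 200

    [oslo_messaging_rabbit]
    heartbeat_timeout_threshold = 0

    [agent]
    report_interval = 120

Note that report_interval is read by the agents and agent_down_time by neutron-server, so both sides of the deployment need the matching change.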
> > -- > Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgagne at calavera.ca Mon Jun 10 17:25:43 2019 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Mon, 10 Jun 2019 13:25:43 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: Hi, On Mon, Jun 10, 2019 at 12:53 PM Jean-Philippe Méthot wrote: > > Hi, > > Can you give me an idea of how you split the API from the server part? I’m guessing it has to do with pointing the API endpoint to a specific server, but keeping the neutron info in config files pointing to the controller? > > Contrary to what I said on this thread last week, we’ve been plagued with this issue every 24 hours or so, needing to restart the controller nodes to restore stability. We did implement several of the tweaks that were suggested in this thread’s previous emails, but we are only now considering splitting the API from the main servers, as you did. > I followed this procedure to use mod_wsgi and updated DNS to point to the new machine/IP: https://docs.openstack.org/neutron/rocky/admin/config-wsgi.html#neutron-api-behind-mod-wsgi You can run neutron-rpc-server if you want to remove the API part from neutron-server. Mathieu > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > Le 5 juin 2019 à 15:31, Mathieu Gagné a écrit : > > Hi Jean-Philippe, > > On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot > wrote: > > > We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === > closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: > > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. 
I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. > > > We had a very similar issue after upgrading to Neutron Queens. In > fact, all Neutron agents were "down" according to status API and > messages weren't getting through. IIRC, this only happened in regions > which had more load than the others. > > We applied a bunch of fixes which I suspect are only a bunch of bandaids. > > Here are the changes we made: > * Split neutron-api from neutron-server. Create a whole new controller > running neutron-api with mod_wsgi. > * Increase [database]/max_overflow = 200 > * Disable RabbitMQ heartbeat in oslo.messaging: > [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 > * Increase [agent]/report_interval = 120 > * Increase [DEFAULT]/agent_down_time = 600 > > We also have those sysctl configs due to firewall dropping sessions. > But those have been on the server forever: > net.ipv4.tcp_keepalive_time = 30 > net.ipv4.tcp_keepalive_intvl = 1 > net.ipv4.tcp_keepalive_probes = 5 > > We never figured out why a service that was working before the upgrade > but no longer is. > This is kind of frustrating as it caused us all short of intermittent > issues and stress during our upgrade. > > Hope this helps. > > -- > Mathieu > > From rfolco at redhat.com Mon Jun 10 20:33:17 2019 From: rfolco at redhat.com (Rafael Folco) Date: Mon, 10 Jun 2019 17:33:17 -0300 Subject: [tripleo] TripleO CI Summary: Sprint 31 Message-ID: Greetings, The TripleO CI team has just completed Sprint 31 / Unified Sprint 10 (May 16 thru Jun 05). The following is a summary of completed work during this sprint cycle: - Created image and container build jobs for RDO on RHEL 7 in the internal instance of Software Factory. - Completed the bootstrapping of OSP 15 standalone job on RHEL8 running in the internal Software Factory. - Promotion status: green on all branches at most of the sprint. The planned work for the next sprint [1] are: - Complete RDO on RHEL7 work by having an independent pipeline running container and image build, standalone and ovb featureset001 jobs. This includes fixing ovb job and start consuming rhel containers from the standalone jobs. - Replicate RHEL7 jobs created in the last sprint for RHEL8 running in the internal Software Factory. Expected outcome is to have a preliminary job producing logs with successes or failures at the end of the sprint. - Create a design document for a staging environment to test changes in the promoter server. This will benefit CI team with less breakages in the promoter server and also prepare the grounds for the multi-arch builds. The Ruck and Rover for this sprint are Sorin Sbarnea (zbr) and Ronelle Landy (rlandy). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes are being tracked in etherpad [2]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-11 [2] https://etherpad.openstack.org/p/ruckroversprint11 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kecarter at redhat.com Mon Jun 10 21:01:00 2019 From: kecarter at redhat.com (Kevin Carter) Date: Mon, 10 Jun 2019 16:01:00 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: With the now merged structural changes it is time to organize an official meeting to get things moving. 
So without further ado: * When should we schedule our meetings (day, hour, frequency)? * Should the meeting take place in the main #tripleo channel or in one of the dedicated meeting rooms (openstack-meeting-{1,2,3,4}, etc)? * How long should our meetings last? * Any volunteers to chair meetings? To capture some of our thoughts, questions, hopes, dreams, and aspirations I've created an etherpad which I'd like interested folks to throw ideas at: [ https://etherpad.openstack.org/p/tripleo-ansible-agenda ]. I'd like to see if we can get a confirmed list of folks who want to meet and, potentially, a generally good timezone. I'd also like to see if we can nail down some ideas for a plan of attack. While I have ideas and would be happy to talk at length about them (I wrote a few things down in the etherpad), I don't want to be the only voice given I'm new to the TripleO community (I could be, and likely I am, missing a lot of context). Assuming we can get something flowing, I'd like to shoot for an official meeting sometime next week (the week of 17 June, 2019). In the meantime, I'll look forward to chatting with folks in the #tripleo channel. -- Kevin Carter IRC: cloudnull On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: > Hey everyone, > > For the upcoming work on focusing on more Ansible automation and testing, > I have created a dedicated #tripleo-transformation channel for our new > squad. Feel free to join if you are interested in joining and helping out! > > +1 to removing repositories we don't use, especially if they have no > working code. I'd like to see the consolidation of TripleO specific things > into the tripleo-ansible repository and then using upstream Ansible roles > for all of the different services (nova, glance, cinder, etc.). > > Sincerely, > > Luke Short, RHCE > Software Engineer, OpenStack Deployment Framework > Red Hat, Inc. > > > On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: > >> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >> >>> So the questions at hand are: what, if anything, should we do with >>> these repositories? Should we retire them or just ignore them? Is there >>> anyone using any of the roles? >>> >> >> My initial reaction was to suggest we just ignore them, but on second >> thought I'm wondering if there is anything negative if we leave them lying >> around. Unless we're going to benefit from them in the future if we start >> actively working in these repos, they represent obfuscation and debt, so it >> might be best to retire / dispose of them. >> >> David >> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Mon Jun 10 22:38:36 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 10 Jun 2019 17:38:36 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> Message-ID: >>> (e) Fully manual. Aggregate operations never touch (add or remove) >>> traits on host RPs. You always have to do that manually. I'm going to come down in favor of this option. It's the shortest path to getting something viable working, in a way that is simple to understand, despite lacking magical DWIM-ness. 
>>> As noted above, >>> it's easy to do - and we could make it easier with a tiny wrapper that >>> takes an aggregate, a list of traits, and an --add/--remove command. So >>> initially, setting up aggregate isolation is a two-step process, and in >>> the future we can consider making new API/CLI affordance that combines >>> the steps. > ya e could work too. > melanie added a similar functionality to osc placment for managing the alloction ratios > of specific resource classes per aggregate a few months ago > https://review.opendev.org/#/c/640898/ > > we could proably provide somthing similar for managing traits but determining what RP to > add the trait too would be a littel tricker. we would have to be able to filter to RP with either a > specific inventory or with a specific trait or in a speicic subtree. We (Placement team) are still trying to figure out how to manage concepts like "resourceless request groups" and "traits/aggregates flow down". But for now, Nova is still always modeling VCPU/MEMORY_MB and traits on the root provider, so let's simply hit the providers in the aggregate (i.e. the root compute host RPs). I'm putting this on the agenda for Thursday's nova meeting [1] to hopefully get some more Nova opinions on it. efried [1] https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting From smooney at redhat.com Tue Jun 11 00:13:51 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 11 Jun 2019 01:13:51 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> Message-ID: <91efe32e80b7c24b0bfe5875ecd053513b7fd443.camel@redhat.com> On Mon, 2019-06-10 at 17:38 -0500, Eric Fried wrote: > > > > (e) Fully manual. Aggregate operations never touch (add or remove) > > > > traits on host RPs. You always have to do that manually. > > I'm going to come down in favor of this option. It's the shortest path > to getting something viable working, in a way that is simple to > understand, despite lacking magical DWIM-ness. > > > > > As noted above, > > > > it's easy to do - and we could make it easier with a tiny wrapper that > > > > takes an aggregate, a list of traits, and an --add/--remove command. So > > > > initially, setting up aggregate isolation is a two-step process, and in > > > > the future we can consider making new API/CLI affordance that combines > > > > the steps. > > > > ya e could work too. > > melanie added a similar functionality to osc placment for managing the alloction ratios > > of specific resource classes per aggregate a few months ago > > https://review.opendev.org/#/c/640898/ > > > > we could proably provide somthing similar for managing traits but determining what RP to > > add the trait too would be a littel tricker. we would have to be able to filter to RP with either a > > specific inventory or with a specific trait or in a speicic subtree. > > We (Placement team) are still trying to figure out how to manage > concepts like "resourceless request groups" and "traits/aggregates flow > down". But for now, Nova is still always modeling VCPU/MEMORY_MB and > traits on the root provider, so let's simply hit the providers in the > aggregate (i.e. the root compute host RPs). 
> > I'm putting this on the agenda for Thursday's nova meeting [1] to > hopefully get some more Nova opinions on it. for what its worth for the host-aggate case teh aplity to add or remvoe a trait form all root providers is likely enough, so that would make a cli much simpeler to create. for the generic case of being able to add/remove a trait on an rp that could be anyhere in a nested tree for all trees in an aggaget, that is a much harder problem but we also do not need it to solve the usecase we have today so we can defer that until we actully need it and if we never need it we can defer it forever. so +1 for keeping it simple and just updating the root RPs. > > efried > > [1] https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting > From missile0407 at gmail.com Tue Jun 11 03:25:49 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Tue, 11 Jun 2019 11:25:49 +0800 Subject: [Kolla] Stuck at bootstrap_gnocchi during deployment using Ubuntu binary on Rocky release. Message-ID: Hi Our env needs Gnocchi because Ceilometer, and we using Kolla to deploy Ceph as the env storage backend. But I found that it always stuck at bootstrap_gnocchi. The error log shows below when check with docker logs: 2019-06-11 10:59:17,707 [19] ERROR gnocchi.utils: Unable to initialize storage driver Traceback (most recent call last): File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 333, in call result = fn(*args, **kwargs) File "/usr/lib/python3/dist-packages/gnocchi/storage/__init__.py", line 102, in get_driver conf.storage) File "/usr/lib/python3/dist-packages/gnocchi/storage/ceph.py", line 52, in __init__ self.rados, self.ioctx = ceph.create_rados_connection(conf) File "/usr/lib/python3/dist-packages/gnocchi/common/ceph.py", line 51, in create_rados_connection raise ImportError("No module named 'rados' nor 'cradox'") ImportError: No module named 'rados' nor 'cradox' This error occurred not only the image from Docker Hub, but also build by Kolla-build in Ubuntu binary based. Strange is, no error occur if turn into use Ubuntu source based images. And I guess it only happen when enabled Ceph. Does anyone have idea about this? Many thanks, Eddie. -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Tue Jun 11 06:12:11 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 11 Jun 2019 08:12:11 +0200 Subject: [Kolla] Stuck at bootstrap_gnocchi during deployment using Ubuntu binary on Rocky release. In-Reply-To: References: Message-ID: Hello Eddie, this classifies as a bug. Please file one at: https://bugs.launchpad.net/kolla with details on the used settings. Thank you. Kind regards, Radosław Piliszek wt., 11 cze 2019 o 05:29 Eddie Yen napisał(a): > Hi > > Our env needs Gnocchi because Ceilometer, and we using Kolla to deploy > Ceph as the env storage backend. > > But I found that it always stuck at bootstrap_gnocchi. 
The error log shows > below when check with docker logs: > > 2019-06-11 10:59:17,707 [19] ERROR gnocchi.utils: Unable to initialize > storage driver > Traceback (most recent call last): > File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 333, in > call > result = fn(*args, **kwargs) > File "/usr/lib/python3/dist-packages/gnocchi/storage/__init__.py", line > 102, in get_driver > conf.storage) > File "/usr/lib/python3/dist-packages/gnocchi/storage/ceph.py", line 52, > in __init__ > self.rados, self.ioctx = ceph.create_rados_connection(conf) > File "/usr/lib/python3/dist-packages/gnocchi/common/ceph.py", line 51, > in create_rados_connection > raise ImportError("No module named 'rados' nor 'cradox'") > ImportError: No module named 'rados' nor 'cradox' > > This error occurred not only the image from Docker Hub, but also build by > Kolla-build in Ubuntu binary based. > Strange is, no error occur if turn into use Ubuntu source based images. > And I guess it only happen when enabled Ceph. > > Does anyone have idea about this? > > > Many thanks, > Eddie. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From missile0407 at gmail.com Tue Jun 11 06:53:51 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Tue, 11 Jun 2019 14:53:51 +0800 Subject: [Kolla] Stuck at bootstrap_gnocchi during deployment using Ubuntu binary on Rocky release. In-Reply-To: References: Message-ID: Hi This issue just reported on the Launchpad. Many thanks, Eddie. Radosław Piliszek 於 2019年6月11日 週二 下午2:12寫道: > Hello Eddie, > > this classifies as a bug. > Please file one at: https://bugs.launchpad.net/kolla > with details on the used settings. > > Thank you. > > Kind regards, > Radosław Piliszek > > wt., 11 cze 2019 o 05:29 Eddie Yen napisał(a): > >> Hi >> >> Our env needs Gnocchi because Ceilometer, and we using Kolla to deploy >> Ceph as the env storage backend. >> >> But I found that it always stuck at bootstrap_gnocchi. The error log >> shows below when check with docker logs: >> >> 2019-06-11 10:59:17,707 [19] ERROR gnocchi.utils: Unable to initialize >> storage driver >> Traceback (most recent call last): >> File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 333, >> in call >> result = fn(*args, **kwargs) >> File "/usr/lib/python3/dist-packages/gnocchi/storage/__init__.py", line >> 102, in get_driver >> conf.storage) >> File "/usr/lib/python3/dist-packages/gnocchi/storage/ceph.py", line 52, >> in __init__ >> self.rados, self.ioctx = ceph.create_rados_connection(conf) >> File "/usr/lib/python3/dist-packages/gnocchi/common/ceph.py", line 51, >> in create_rados_connection >> raise ImportError("No module named 'rados' nor 'cradox'") >> ImportError: No module named 'rados' nor 'cradox' >> >> This error occurred not only the image from Docker Hub, but also build by >> Kolla-build in Ubuntu binary based. >> Strange is, no error occur if turn into use Ubuntu source based images. >> And I guess it only happen when enabled Ceph. >> >> Does anyone have idea about this? >> >> >> Many thanks, >> Eddie. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dirk at dmllr.de Tue Jun 11 08:29:17 2019 From: dirk at dmllr.de (=?UTF-8?B?RGlyayBNw7xsbGVy?=) Date: Tue, 11 Jun 2019 10:29:17 +0200 Subject: [openstack-ansible] suse support for stable/queens In-Reply-To: References: Message-ID: Hi Mohammed, Am Sa., 8. Juni 2019 um 19:32 Uhr schrieb Mohammed Naser : > 1. 
Someone can volunteer to implement LXC 3 support in stable/queens > in order to get opensuse-42 working again > 2. We move the opensuse-42 jobs to non-voting for 1/2 weeks and if no > one fixes them, we drop them (because they're a waste of CI > resources). I suggest to stop caring about opensuse 42.x on stable/queens and older as we'd like to deprecate 42.x (it is going to be end of life and falling out of security support in the next few days) and focus on leap 15.x only. Greetings, Dirk From jean-philippe at evrard.me Tue Jun 11 08:56:06 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Tue, 11 Jun 2019 10:56:06 +0200 Subject: [openstack-ansible] suse support for stable/queens In-Reply-To: References: Message-ID: <0f7755d6-1b85-406c-a8db-bed40f07c195@www.fastmail.com> > I suggest to stop caring about opensuse 42.x on stable/queens and > older as we'd like to deprecate > 42.x (it is going to be end of life and falling out of security > support in the next few days) and focus on leap 15.x only. Agreed. Maybe I should clarify the whole story too: 1. Focus on bare metal deploys for ALL roles. See also [1]. 2. Focus on deploys using distro packages for ALL roles. See also [1], column F. 3. Making sure efforts 1 and 2 apply to lower branches. [1]: https://docs.google.com/spreadsheets/d/1coiPHGqaIKNgCGYsNhEgzswqwp4wedm2XoBCN9WMosY/edit#gid=752070695 Regards, JP From cgoncalves at redhat.com Tue Jun 11 10:59:33 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Tue, 11 Jun 2019 12:59:33 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND wrote: > > Hi guys, > > Just a quick question regarding this bug, someone told me that it have been patched within stable/rocky, BUT, were you talking about the openstack/octavia repositoy or the openstack/kolla repository? Octavia. https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 > > Many Thanks! > > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a écrit : >> >> Oh, that's perfect so, I'll just update my image and my platform as we're using kolla-ansible and that's super easy. >> >> You guys rocks!! (Pun intended ;-)). >> >> Many many thanks to all of you, that will real back me a lot regarding the Octavia solidity and Kolla flexibility actually ^^. >> >> Le mar. 4 juin 2019 à 15:17, Carlos Goncalves a écrit : >>> >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND wrote: >>> > >>> > Hi Lingxian Kong, >>> > >>> > That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! >>> > >>> > Is there a reason why it didn’t was backported to rocky? >>> >>> The patch was merged in master branch during Rocky development cycle, >>> hence included in stable/rocky as well. >>> >>> > >>> > Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. >>> > >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : >>> >> >>> >> Hi Felix, >>> >> >>> >> « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. 
>>> >> >>> >> Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). >>> >> >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >>> >> >>> >> I well may have miss something in it, but I don't see something strange on from my point of view. >>> >> Feel free to tell me if you spot something weird. >>> >> >>> >> >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : >>> >>> >>> >>> Hi Gael, >>> >>> >>> >>> >>> >>> >>> >>> we had a similar issue in the past. >>> >>> >>> >>> You could check the octiava healthmanager log (should be on the same node where the worker is running). >>> >>> >>> >>> This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. >>> >>> >>> >>> >>> >>> >>> >>> But normally it should at least restart the LB with new Amphorae… >>> >>> >>> >>> >>> >>> >>> >>> Hope that helps >>> >>> >>> >>> >>> >>> >>> >>> Felix >>> >>> >>> >>> >>> >>> >>> >>> From: Gaël THEROND >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM >>> >>> To: Openstack >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia >>> >>> >>> >>> >>> >>> >>> >>> Hi guys, >>> >>> >>> >>> >>> >>> >>> >>> I’ve a weird situation here. >>> >>> >>> >>> >>> >>> >>> >>> I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. >>> >>> >>> >>> >>> >>> >>> >>> Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. >>> >>> >>> >>> >>> >>> >>> >>> However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. >>> >>> >>> >>> >>> >>> >>> >>> So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. >>> >>> >>> >>> >>> >>> >>> >>> The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. >>> >>> >>> >>> >>> >>> >>> >>> Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? >>> >>> >>> >>> >>> >>> >>> >>> It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. >>> >>> >>> >>> >>> >>> >>> >>> I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. >>> >>> >>> >>> >>> >>> >>> >>> If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. >>> >>> >>> >>> >>> >>> >>> >>> Cheers guys! >>> >>> >>> >>> Hinweise zum Datenschutz finden Sie hier. 
From madhuri.kumari at intel.com Tue Jun 11 11:22:13 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Tue, 11 Jun 2019 11:22:13 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC1BB6C@BGSMSX102.gar.corp.intel.com> Hi All, Thank you for your responses. I agree with Mark here, the example stated here fits the Boolean case(feature enable/disable). However many other BIOS feature doesn’t fits the case. For example enabling Intel Speed Select also needs 3 configuration or traits: CUSTOM_ISS_CONFIG_BASE – 00 CUSTOM_ISS_CONFIG_1 – 01 CUSTOM_ISS_CONFIG_2 - 02 Each configuration/trait here represents different profiles to be set on the baremetal server. Does resize help with such use case? Regards, Madhuri From: Mark Goddard [mailto:mark at stackhpc.com] Sent: Sunday, June 9, 2019 3:23 PM To: Jay Pipes Cc: openstack-discuss Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning On Fri, 7 Jun 2019, 18:02 Jay Pipes, > wrote: On 6/7/19 11:23 AM, Eric Fried wrote: >> Better still, add a standardized trait to os-traits for hyperthreading >> support, which is what I'd recommended in the original >> cpu-resource-tracking spec. > > HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since > 0.8.0. I think we need a tri-state here. There are three options: 1. Give me a node with hyperthreading enabled 2. Give me a node with hyperthreading disabled 3. I don't care For me, the lack of a trait is 3 - I wouldn't want existing flavours without this trait to cause hyperthreading to be disabled. The ironic deploy templates feature wasn't designed to support forbidden traits - I don't think they were implemented at the time. The example use cases so far have involved encoding values into a trait name, e.g. CUSTOM_HYPERTHREADING_ON. Forbidden traits could be made to work in this case, but it doesn't really extend to non Boolean things such as RAID levels. I'm not trying to shoot down new ideas, just explaining how we got here. Excellent, I had a faint recollection of that... -jay -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-philippe at evrard.me Tue Jun 11 11:55:37 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Tue, 11 Jun 2019 13:55:37 +0200 Subject: [uc][tc][ops] reviving osops- repos In-Reply-To: References: <20190530205552.falsvxcegehtyuge@yuggoth.org> <20190531123501.tawgvqgsw6yle2nu@csail.mit.edu> <20190531164102.5lwt2jyxk24u3vdz@yuggoth.org> Message-ID: > Alternatively, I feel like a SIG (be it the Ops Docs SIG or a new > "Operational tooling" SIG) would totally be a good idea to revive this. > In that case we'd define the repository in [4]. > > My personal preference would be for a new SIG, but whoever is signing up > to work on this should definitely have the final say. Agreed on having it inside OpenStack namespace, and code handled by a team/SIG/WG (with my preference being a SIG -- existing or not). When this team/SIG/WG retires, the repo would with it. 
It provides clean ownership, and clear cleanup when disbanding. Regards, Jean-Philippe Evrard (evrardjp) From gael.therond at gmail.com Tue Jun 11 12:09:35 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 11 Jun 2019 14:09:35 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Ok nice, do you have the commit hash? I would look at it and validate that it have been committed to Stein too so I could bump my service to stein using Kolla. Thanks! Le mar. 11 juin 2019 à 12:59, Carlos Goncalves a écrit : > On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND > wrote: > > > > Hi guys, > > > > Just a quick question regarding this bug, someone told me that it have > been patched within stable/rocky, BUT, were you talking about the > openstack/octavia repositoy or the openstack/kolla repository? > > Octavia. > > https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 > > > > > Many Thanks! > > > > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a > écrit : > >> > >> Oh, that's perfect so, I'll just update my image and my platform as > we're using kolla-ansible and that's super easy. > >> > >> You guys rocks!! (Pun intended ;-)). > >> > >> Many many thanks to all of you, that will real back me a lot regarding > the Octavia solidity and Kolla flexibility actually ^^. > >> > >> Le mar. 4 juin 2019 à 15:17, Carlos Goncalves > a écrit : > >>> > >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND > wrote: > >>> > > >>> > Hi Lingxian Kong, > >>> > > >>> > That’s actually very interesting as I’ve come to the same conclusion > this morning during my investigation and was starting to think about a fix, > which it seems you already made! > >>> > > >>> > Is there a reason why it didn’t was backported to rocky? > >>> > >>> The patch was merged in master branch during Rocky development cycle, > >>> hence included in stable/rocky as well. > >>> > >>> > > >>> > Very helpful, many many thanks to you you clearly spare me hours of > works! I’ll get a review of your patch and test it on our lab. > >>> > > >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND > a écrit : > >>> >> > >>> >> Hi Felix, > >>> >> > >>> >> « Glad » you had the same issue before, and yes of course I looked > at the HM logs which is were I actually found out that this event was > triggered by octavia (Beside the DB data that validated that) here is my > log trace related to this event, It doesn't really shows major issue IMHO. > >>> >> > >>> >> Here is the stacktrace that our octavia service archived for our > both controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > >>> >> > >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > >>> >> > >>> >> I well may have miss something in it, but I don't see something > strange on from my point of view. > >>> >> Feel free to tell me if you spot something weird. > >>> >> > >>> >> > >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >>> >>> > >>> >>> Hi Gael, > >>> >>> > >>> >>> > >>> >>> > >>> >>> we had a similar issue in the past. > >>> >>> > >>> >>> You could check the octiava healthmanager log (should be on the > same node where the worker is running). > >>> >>> > >>> >>> This component monitors the status of the Amphorae and restarts > them if they don’t trigger a callback after a specific time. This might > also happen if there is some connection issue between the two components. 
> >>> >>> > >>> >>> > >>> >>> > >>> >>> But normally it should at least restart the LB with new Amphorae… > >>> >>> > >>> >>> > >>> >>> > >>> >>> Hope that helps > >>> >>> > >>> >>> > >>> >>> > >>> >>> Felix > >>> >>> > >>> >>> > >>> >>> > >>> >>> From: Gaël THEROND > >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM > >>> >>> To: Openstack > >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly > deleted by octavia > >>> >>> > >>> >>> > >>> >>> > >>> >>> Hi guys, > >>> >>> > >>> >>> > >>> >>> > >>> >>> I’ve a weird situation here. > >>> >>> > >>> >>> > >>> >>> > >>> >>> I smoothly operate a large scale multi-region Octavia service > using the default amphora driver which imply the use of nova instances as > loadbalancers. > >>> >>> > >>> >>> > >>> >>> > >>> >>> Everything is running really well and our customers (K8s and > traditional users) are really happy with the solution so far. > >>> >>> > >>> >>> > >>> >>> > >>> >>> However, yesterday one of those customers using the loadbalancer > in front of their ElasticSearch cluster poked me because this loadbalancer > suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were > no longer available but yet the anchor/member/pool and listeners settings > were still existing. > >>> >>> > >>> >>> > >>> >>> > >>> >>> So I investigated and found out that the loadbalancer amphoras > have been destroyed by the octavia user. > >>> >>> > >>> >>> > >>> >>> > >>> >>> The weird part is, both the master and the backup instance have > been destroyed at the same moment by the octavia service user. > >>> >>> > >>> >>> > >>> >>> > >>> >>> Is there specific circumstances where the octavia service could > decide to delete the instances but not the anchor/members/pool ? > >>> >>> > >>> >>> > >>> >>> > >>> >>> It’s worrying me a bit as there is no clear way to trace why does > Octavia did take this action. > >>> >>> > >>> >>> > >>> >>> > >>> >>> I digged within the nova and Octavia DB in order to correlate the > action but except than validating my investigation it doesn’t really help > as there are no clue of why the octavia service did trigger the deletion. > >>> >>> > >>> >>> > >>> >>> > >>> >>> If someone have any clue or tips to give me I’ll be more than > happy to discuss this situation. > >>> >>> > >>> >>> > >>> >>> > >>> >>> Cheers guys! > >>> >>> > >>> >>> Hinweise zum Datenschutz finden Sie hier. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgoncalves at redhat.com Tue Jun 11 12:13:28 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Tue, 11 Jun 2019 14:13:28 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: You can find the commit hash from the link I provided. The patch is available from Queens so it is also available in Stein. On Tue, Jun 11, 2019 at 2:10 PM Gaël THEROND wrote: > > Ok nice, do you have the commit hash? I would look at it and validate that it have been committed to Stein too so I could bump my service to stein using Kolla. > > Thanks! > > Le mar. 11 juin 2019 à 12:59, Carlos Goncalves a écrit : >> >> On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND wrote: >> > >> > Hi guys, >> > >> > Just a quick question regarding this bug, someone told me that it have been patched within stable/rocky, BUT, were you talking about the openstack/octavia repositoy or the openstack/kolla repository? >> >> Octavia. 
>> >> https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 >> >> > >> > Many Thanks! >> > >> > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a écrit : >> >> >> >> Oh, that's perfect so, I'll just update my image and my platform as we're using kolla-ansible and that's super easy. >> >> >> >> You guys rocks!! (Pun intended ;-)). >> >> >> >> Many many thanks to all of you, that will real back me a lot regarding the Octavia solidity and Kolla flexibility actually ^^. >> >> >> >> Le mar. 4 juin 2019 à 15:17, Carlos Goncalves a écrit : >> >>> >> >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND wrote: >> >>> > >> >>> > Hi Lingxian Kong, >> >>> > >> >>> > That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! >> >>> > >> >>> > Is there a reason why it didn’t was backported to rocky? >> >>> >> >>> The patch was merged in master branch during Rocky development cycle, >> >>> hence included in stable/rocky as well. >> >>> >> >>> > >> >>> > Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. >> >>> > >> >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : >> >>> >> >> >>> >> Hi Felix, >> >>> >> >> >>> >> « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. >> >>> >> >> >>> >> Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). >> >>> >> >> >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >> >>> >> >> >>> >> I well may have miss something in it, but I don't see something strange on from my point of view. >> >>> >> Feel free to tell me if you spot something weird. >> >>> >> >> >>> >> >> >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : >> >>> >>> >> >>> >>> Hi Gael, >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> we had a similar issue in the past. >> >>> >>> >> >>> >>> You could check the octiava healthmanager log (should be on the same node where the worker is running). >> >>> >>> >> >>> >>> This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> But normally it should at least restart the LB with new Amphorae… >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Hope that helps >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Felix >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> From: Gaël THEROND >> >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM >> >>> >>> To: Openstack >> >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Hi guys, >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> I’ve a weird situation here. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. 
>> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Cheers guys! >> >>> >>> >> >>> >>> Hinweise zum Datenschutz finden Sie hier. From gael.therond at gmail.com Tue Jun 11 12:15:46 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 11 Jun 2019 14:15:46 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Oh, really sorry, I was looking at your answer from my mobile mailing app and it didn't shows, sorry ^^ Many thanks for your help! Le mar. 11 juin 2019 à 14:13, Carlos Goncalves a écrit : > You can find the commit hash from the link I provided. The patch is > available from Queens so it is also available in Stein. > > On Tue, Jun 11, 2019 at 2:10 PM Gaël THEROND > wrote: > > > > Ok nice, do you have the commit hash? I would look at it and validate > that it have been committed to Stein too so I could bump my service to > stein using Kolla. > > > > Thanks! > > > > Le mar. 11 juin 2019 à 12:59, Carlos Goncalves > a écrit : > >> > >> On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND > wrote: > >> > > >> > Hi guys, > >> > > >> > Just a quick question regarding this bug, someone told me that it > have been patched within stable/rocky, BUT, were you talking about the > openstack/octavia repositoy or the openstack/kolla repository? > >> > >> Octavia. > >> > >> > https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 > >> > >> > > >> > Many Thanks! > >> > > >> > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a > écrit : > >> >> > >> >> Oh, that's perfect so, I'll just update my image and my platform as > we're using kolla-ansible and that's super easy. > >> >> > >> >> You guys rocks!! (Pun intended ;-)). > >> >> > >> >> Many many thanks to all of you, that will real back me a lot > regarding the Octavia solidity and Kolla flexibility actually ^^. > >> >> > >> >> Le mar. 
4 juin 2019 à 15:17, Carlos Goncalves > a écrit : > >> >>> > >> >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND > wrote: > >> >>> > > >> >>> > Hi Lingxian Kong, > >> >>> > > >> >>> > That’s actually very interesting as I’ve come to the same > conclusion this morning during my investigation and was starting to think > about a fix, which it seems you already made! > >> >>> > > >> >>> > Is there a reason why it didn’t was backported to rocky? > >> >>> > >> >>> The patch was merged in master branch during Rocky development > cycle, > >> >>> hence included in stable/rocky as well. > >> >>> > >> >>> > > >> >>> > Very helpful, many many thanks to you you clearly spare me hours > of works! I’ll get a review of your patch and test it on our lab. > >> >>> > > >> >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND > a écrit : > >> >>> >> > >> >>> >> Hi Felix, > >> >>> >> > >> >>> >> « Glad » you had the same issue before, and yes of course I > looked at the HM logs which is were I actually found out that this event > was triggered by octavia (Beside the DB data that validated that) here is > my log trace related to this event, It doesn't really shows major issue > IMHO. > >> >>> >> > >> >>> >> Here is the stacktrace that our octavia service archived for our > both controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > >> >>> >> > >> >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > >> >>> >> > >> >>> >> I well may have miss something in it, but I don't see something > strange on from my point of view. > >> >>> >> Feel free to tell me if you spot something weird. > >> >>> >> > >> >>> >> > >> >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >> >>> >>> > >> >>> >>> Hi Gael, > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> we had a similar issue in the past. > >> >>> >>> > >> >>> >>> You could check the octiava healthmanager log (should be on the > same node where the worker is running). > >> >>> >>> > >> >>> >>> This component monitors the status of the Amphorae and restarts > them if they don’t trigger a callback after a specific time. This might > also happen if there is some connection issue between the two components. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> But normally it should at least restart the LB with new > Amphorae… > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Hope that helps > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Felix > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> From: Gaël THEROND > >> >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM > >> >>> >>> To: Openstack > >> >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances > unexpectedly deleted by octavia > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Hi guys, > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> I’ve a weird situation here. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> I smoothly operate a large scale multi-region Octavia service > using the default amphora driver which imply the use of nova instances as > loadbalancers. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Everything is running really well and our customers (K8s and > traditional users) are really happy with the solution so far. 
> >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> However, yesterday one of those customers using the > loadbalancer in front of their ElasticSearch cluster poked me because this > loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the > amphoras were no longer available but yet the anchor/member/pool and > listeners settings were still existing. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> So I investigated and found out that the loadbalancer amphoras > have been destroyed by the octavia user. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> The weird part is, both the master and the backup instance have > been destroyed at the same moment by the octavia service user. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Is there specific circumstances where the octavia service could > decide to delete the instances but not the anchor/members/pool ? > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> It’s worrying me a bit as there is no clear way to trace why > does Octavia did take this action. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> I digged within the nova and Octavia DB in order to correlate > the action but except than validating my investigation it doesn’t really > help as there are no clue of why the octavia service did trigger the > deletion. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> If someone have any clue or tips to give me I’ll be more than > happy to discuss this situation. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Cheers guys! > >> >>> >>> > >> >>> >>> Hinweise zum Datenschutz finden Sie hier. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Tue Jun 11 13:54:15 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 11 Jun 2019 09:54:15 -0400 Subject: [openstack-ansible] suse support for stable/queens In-Reply-To: References: Message-ID: On Tue, Jun 11, 2019 at 4:29 AM Dirk Müller wrote: > > Hi Mohammed, > > Am Sa., 8. Juni 2019 um 19:32 Uhr schrieb Mohammed Naser : > > > 1. Someone can volunteer to implement LXC 3 support in stable/queens > > in order to get opensuse-42 working again > > 2. We move the opensuse-42 jobs to non-voting for 1/2 weeks and if no > > one fixes them, we drop them (because they're a waste of CI > > resources). > > I suggest to stop caring about opensuse 42.x on stable/queens and > older as we'd like to deprecate > 42.x (it is going to be end of life and falling out of security > support in the next few days) and focus on leap 15.x only. https://review.opendev.org/#/c/664599/ done > Greetings, > Dirk -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From pierre at stackhpc.com Tue Jun 11 14:12:20 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Tue, 11 Jun 2019 15:12:20 +0100 Subject: [requirements] paramiko 2.5.0 causing ImportError: cannot import name py31compat Message-ID: Hello, paramiko 2.5.0 was released yesterday [1]. It appears to trigger failures in the Kayobe molecule job with the following error [2]: ImportError: cannot import name py31compat It's not clear yet why this is happening, since py31compat lives in setuptools. paramiko 2.5.0 includes changes to paramiko/py3compat.py which could be related. For now, we're capping paramiko [3] as it is blocking our gate. I thought I would share with the list, in case other projects experience similar errors. 
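If you need an immediate workaround, a simple upper bound on the requirement keeps the previous release; the pin below is illustrative only and not necessarily the exact change proposed in [3]:

    paramiko<2.5.0  # temporary cap until the py31compat ImportError is understood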
Cheers, Pierre [1] https://pypi.org/project/paramiko/#history [2] http://logs.openstack.org/17/664417/1/check/kayobe-tox-molecule/0370fdd/job-output.txt.gz [3] https://review.opendev.org/#/c/664533/ From aschultz at redhat.com Tue Jun 11 14:37:15 2019 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 11 Jun 2019 08:37:15 -0600 Subject: [tripleo] Outstanding specs & blueprints for the Train cycle Message-ID: Hey folks, I wanted to send a note about a last call for specs for the train cycle. In a previous mail back in May[0], I had mentioned that the plan was to try and have all the blueprints and specs finalized by Train milestone 1. Since milestone 1 was last week, this is your final call for specs & blueprints. Please let me know if there are any outstanding items by next week's IRC meeting on June 18, 2019. I will be applying a -2 to any outstanding specs that have not merged or been spoken for. Thanks, -Alex [0] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006223.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Tue Jun 11 15:33:49 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Tue, 11 Jun 2019 15:33:49 +0000 Subject: [ironic][neutron] Security groups on bare metal instances Message-ID: Hi all, We've been scratching our heads for a while, trying to figure out how security groups for bare metal instances are supposed to work. The configuration guide for Networking[1] implies that using the 'iptables_hybrid' firewall driver should work. We are using Neutron tenant networks[2] with Ironic. My understanding is that the iptables_hybrid driver creates a new OVS port (with prefix qvo), logically connects that to the integration bridge, and then creates a veth pair inside a new network namespace, and that veth device then gets some iptables rules to handle the security group rules. It is not clear to me how or when that qvo "hybrid" port is even created; I've combed through the Neutron code base for a while looking for clues. We had tried using the "pure" OVS firewall solution, where security group rules are expessed using OpenFlow flows. However, this doesn't work, as there is not OVS port for a bare metal instance (at least, not in our setup.) We are using networking-generic-switch[3], which provisions ports on a physical switch with a VLAN tag on the provider network. From OVS' perspective, the traffic exits OVS with that VLAN tag and that's that; OVS in this situation is only responsible for handling routing between provider networks and performing NAT for egress and ingress via Floating IP assignments. So, I'm wondering if others have had success getting security groups to work in a bare metal environment, and have any clues we could follow to get this working nicely. I'm beginning to suspect our problems have to do with the fact that we're doing VLAN isolation predominately via configuring physical switches, and as such there isn't a clear point where security groups can be inserted. The problem we are trying to solve is limiting ingress traffic on a Floating IP, so we only allow SSH from a given host, or only allow ports X and Y to be open externally, etc. Thanks in advance, as usual, for any insights! 
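For concreteness, the rules we would like to enforce are ordinary security group rules, along these lines (the group name, addresses and ports are placeholders):

    # allow SSH only from one management host, plus one public service port
    openstack security group rule create --ingress --protocol tcp \
        --dst-port 22 --remote-ip 203.0.113.10/32 my-baremetal-secgroup
    openstack security group rule create --ingress --protocol tcp \
        --dst-port 443 --remote-ip 0.0.0.0/0 my-baremetal-secgroup

The open question is where rules like these can actually be enforced when the instance port is a physical switch port rather than an OVS/veth port.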
/Jason [1]: https://docs.openstack.org/ironic/latest/install/configure-networking.html [2]: https://docs.openstack.org/ironic/latest/install/configure-tenant-networks.html [3]: https://docs.openstack.org/networking-generic-switch/latest/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Tue Jun 11 15:43:02 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Tue, 11 Jun 2019 15:43:02 +0000 Subject: [nova][ironic] Lock-related performance issue with update_resources periodic job References: Message-ID: Hi Surya, On 5/13/19 3:15 PM, Surya Seetharaman wrote: We faced the same problem at CERN when we upgraded to rocky (we have ~2300 nodes on a single compute) like Eric said, and we set the [compute]resource_provider_association_refresh to a large value (this definitely helps by stopping the syncing of traits/aggregates and provider tree cache info stuff in terms of chattiness with placement) and inspite of that it doesn't scale that well for us. We still find the periodic task taking too much of time which causes the locking to hold up the claim for instances in BUILD state (the exact same problem you described). While one way to tackle this like you said is to set the "update_resources_interval" to a higher value - we were not sure how much out of sync things would get with placement, so it will be interesting to see how this spans out for you - another way out would be to use multiple computes and spread the nodes around (though this is also a pain to maintain IMHO) which is what we are looking into presently. I wanted to let you know that we've been running this way in production for a few weeks now and it's had a noticeable improvement: instances are no longer sticking in the "Build" stage, pre-networking, for ages. We were able to track the improvement by comparing the Nova conductor logs ("Took {seconds} to build the instance" vs "Took {seconds} to spawn the instance on the hypervisor"; the delta should be as small as possible and in our case went from ~30 minutes to ~1 minute.) There have been a few cases where a resource provider claim got "stuck", but in practice it has been so infrequent that it potentially has other causes. As such, I can recommend increasing the interval time significantly. Currently we have it set to 6 hours. I have not yet looked in to bringing in the other Nova patches used at CERN (and available in Stein). I did take a look at updating the locking mechanism, but do not have work to show for this yet. Cheers, /Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Jun 11 16:00:38 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 11 Jun 2019 17:00:38 +0100 Subject: [ironic][neutron] Security groups on bare metal instances In-Reply-To: References: Message-ID: On Tue, 2019-06-11 at 15:33 +0000, Jason Anderson wrote: > Hi all, > > We've been scratching our heads for a while, trying to figure out how security groups for bare metal instances are > supposed to work. The configuration guide for Networking< > https://docs.openstack.org/ironic/latest/install/configure-networking.html>[1] implies that using the > 'iptables_hybrid' firewall driver should work. We are using Neutron tenant networks< > https://docs.openstack.org/ironic/latest/install/configure-tenant-networks.html>[2] with Ironic. 
My understanding is > that the iptables_hybrid driver creates a new OVS port (with prefix qvo), logically connects that to the integration > bridge, and then creates a veth pair inside a new network namespace, and that veth device then gets some iptables > rules to handle the security group rules. It is not clear to me how or when that qvo "hybrid" port is even created; > I've combed through the Neutron code base for a while looking for clues. > > We had tried using the "pure" OVS firewall solution, where security group rules are expessed using OpenFlow flows. > However, this doesn't work, as there is not OVS port for a bare metal instance (at least, not in our setup.) We are > using networking-generic-switch[3], which provisions > ports on a physical switch with a VLAN tag on the provider network. From OVS' perspective, the traffic exits OVS with > that VLAN tag and that's that; OVS in this situation is only responsible for handling routing between provider > networks and performing NAT for egress and ingress via Floating IP assignments. > > So, I'm wondering if others have had success getting security groups to work in a bare metal environment, and have any > clues we could follow to get this working nicely. in a baremetal enviornment the only way to implement security groups for the baremetal instance is to rely on an ml2 driver that supports implementing security groups at the top of rack switch. the iptables and and openvswtich firewall dirvers can only be used in a vm deployment. > I'm beginning to suspect our problems have to do with the fact that we're doing VLAN isolation predominately via > configuring physical switches, and as such there isn't a clear point where security groups can be inserted. some switch vendors can implement security gorups directly in the TOR i belive either arrista or cisco support this in there top of rack swtich driver. e.g. https://github.com/openstack/networking-arista/blob/master/networking_arista/ml2/security_groups/arista_security_groups.py > The problem we are trying to solve is limiting ingress traffic on a Floating IP, so we only allow SSH from a given > host, or only allow ports X and Y to be open externally, etc. as an alternitive you migth be able to use the firewall as a service api to implemtn traffic filtering in the neutorn routers rather than at the port level. > > Thanks in advance, as usual, for any insights! > > /Jason > > [1]: https://docs.openstack.org/ironic/latest/install/configure-networking.html > [2]: https://docs.openstack.org/ironic/latest/install/configure-tenant-networks.html > [3]: https://docs.openstack.org/networking-generic-switch/latest/ From ekultails at gmail.com Tue Jun 11 16:05:48 2019 From: ekultails at gmail.com (Luke Short) Date: Tue, 11 Jun 2019 12:05:48 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: Hey Kevin/all, I propose we have the meeting at 14:00 UTC weekly on Thursdays for about 30 minutes. This is similar to the normal #tripleo meeting except instead of Tuesday it is on Thursday. I believe this time will accommodate the most amount of people. Let's keep it on #tripleo instead of the OpenStack meeting rooms to avoid the concerns others have had about missing information from the TripleO community. Everyone will be kept in the loop and the IRC logs will be easy to find since it's consolidated on TripleO. I would be happy to help lead the meetings and I have also added some thoughts to the Etherpad. 
How does everyone feel about having our first meeting on June 20th? Sincerely, Luke Short On Mon, Jun 10, 2019 at 5:02 PM Kevin Carter wrote: > With the now merged structural changes it is time to organize an official > meeting to get things moving. > > So without further ado: > * When should we schedule our meetings (day, hour, frequency)? > * Should the meeting take place in the main #tripleo channel or in one of > the dedicated meeting rooms (openstack-meeting-{1,2,3,4}, etc)? > * How long should our meetings last? > * Any volunteers to chair meetings? > > To capture some of our thoughts, questions, hopes, dreams, and aspirations > I've created an etherpad which I'd like interested folks to throw ideas at: > [ https://etherpad.openstack.org/p/tripleo-ansible-agenda ]. I'd like to > see if we can get a confirmed list of folks who want to meet and, > potentially, a generally good timezone. I'd also like to see if we can nail > down some ideas for a plan of attack. While I have ideas and would be happy > to talk at length about them (I wrote a few things down in the etherpad), I > don't want to be the only voice given I'm new to the TripleO community (I > could be, and likely I am, missing a lot of context). > > Assuming we can get something flowing, I'd like to shoot for an official > meeting sometime next week (the week of 17 June, 2019). In the meantime, > I'll look forward to chatting with folks in the #tripleo channel. > > -- > > Kevin Carter > IRC: cloudnull > > > On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: > >> Hey everyone, >> >> For the upcoming work on focusing on more Ansible automation and testing, >> I have created a dedicated #tripleo-transformation channel for our new >> squad. Feel free to join if you are interested in joining and helping out! >> >> +1 to removing repositories we don't use, especially if they have no >> working code. I'd like to see the consolidation of TripleO specific things >> into the tripleo-ansible repository and then using upstream Ansible roles >> for all of the different services (nova, glance, cinder, etc.). >> >> Sincerely, >> >> Luke Short, RHCE >> Software Engineer, OpenStack Deployment Framework >> Red Hat, Inc. >> >> >> On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: >> >>> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >>> >>>> So the questions at hand are: what, if anything, should we do with >>>> these repositories? Should we retire them or just ignore them? Is there >>>> anyone using any of the roles? >>>> >>> >>> My initial reaction was to suggest we just ignore them, but on second >>> thought I'm wondering if there is anything negative if we leave them lying >>> around. Unless we're going to benefit from them in the future if we start >>> actively working in these repos, they represent obfuscation and debt, so it >>> might be best to retire / dispose of them. >>> >>> David >>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Tue Jun 11 17:39:31 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 11 Jun 2019 18:39:31 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: On Mon, 10 Jun 2019 at 06:18, Alex Xu wrote: > > > > Eric Fried 于2019年6月7日周五 上午1:59写道: >> >> > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. However that's not the actual intent of the Ironic use case I explained in the email. >> > Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. >> > So I am not sure if the spec actually satisfies the use case. >> > I hope to get more response from the team to get more clarity. >> >> Waitwait. The VM needs to be rebooted for the BIOS change to take >> effect? So (non-live) resize would actually satisfy your use case just >> fine. But the problem is that the ironic driver doesn't support resize >> at all? >> >> Without digging too hard, that seems like it would be a fairly >> straightforward thing to add. It would be limited to only "same host" >> and initially you could only change this one attribute (anything else >> would have to fail). >> >> Nova people, thoughts? >> > > Contribute another idea. > > So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and CUSTOM_HYPERTHREADING_OFF are configuration. Those > configuration isn't used for scheduling. Actually, Traits is designed for scheduling. > > So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this trait is used for indicating the host support HT. About whether enable it in the instance is configuration info. > > That is also pain for change the configuration in the flavor. The flavor is the spec of instance's virtual resource, not the configuration. > > So another way is we should store the configuration into another place. Like the server's metadata. > > So for the HT case. We only fill the CUSTOM_HYPERTHREADING trait in the flavor, and fill a server metadata 'hyperthreading_config=on' in server metadata. The nova will find out a BM node support HT. And ironic based on the server metadata 'hyperthreading_config=on' to enable the HT. > > When change the configuration of HT to off, the user can update the server's metadata. Currently, the nova will send a rpc call to the compute node and calling a virt driver interface when the server metadata is updated. In the ironic virt driver, it can trigger a hyper-threading configuration deploy step to turn the HT off, and do a reboot of the instance. (The reboot is a step inside deploy-step, not part of ironic virt driver flow) > > But yes, this changes some design to the original deploy-steps and deploy-templates. And we fill something into the server's metadata which I'm not sure nova people like it. > > Anyway, just put my idea at here. We did consider using metadata. The problem is that it is user-defined, so there is no way for an operator to restrict what can be done by a user. Flavors are operator-defined and so allow for selection from a 'menu' of types and configurations. 
What might be nice is if we could use a flavor extra spec like this: deploy-config:hyperthreading=enabled The nova ironic virt driver could pass this to ironic, like it does with traits. Then in the ironic deploy template, have fields like this: name: Hyperthreading enabled config-type: hyperthreading config-value: enabled steps: Ironic would then match on the config-type and config-value to find a suitable deploy template. As an extension, the deploy template could define a trait (or list of traits) that must be supported by a node in order for the template to be applied. Perhaps this would even be a standard relationship between config-type and traits? Haven't thought this through completely, I'm sure it has holes. > >> efried >> . >> From saikrishna.ura at cloudseals.com Tue Jun 11 15:17:55 2019 From: saikrishna.ura at cloudseals.com (Saikrishna Ura) Date: Tue, 11 Jun 2019 15:17:55 +0000 Subject: getting issues while configuring the Trove Message-ID: Hi, I installed Openstack in Ubuntu 18.04 by cloning the devstack repository with this url "git clone https://git.openstack.org/openstack-dev/devstack", but i can't able create or access with the trove, I'm getting issues with the installation. Can anyone help on this issue please. If any reference document or any guidance much appreciated. Thanks, Saikrishna U. -------------- next part -------------- An HTML attachment was scrubbed... URL: From grayudu at opentext.com Tue Jun 11 17:23:31 2019 From: grayudu at opentext.com (Garaga Rayudu) Date: Tue, 11 Jun 2019 17:23:31 +0000 Subject: Barbican support for Window Message-ID: Hi Team, Is it supported for window OS. If Yes, please let me know more details about installation. Also let me know should I integrate with our product freely to support key management. Since it look like open source product. Thanks, Rayudu -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Tue Jun 11 17:55:48 2019 From: johfulto at redhat.com (John Fulton) Date: Tue, 11 Jun 2019 13:55:48 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Tue, Jun 11, 2019 at 12:12 PM Luke Short wrote: > > Hey Kevin/all, > > I propose we have the meeting at 14:00 UTC weekly on Thursdays for about 30 minutes. This is similar to the normal #tripleo meeting except instead of Tuesday it is on Thursday. I believe this time will accommodate the most amount of people. Let's keep it on #tripleo instead of the OpenStack meeting rooms to avoid the concerns others have had about missing information from the TripleO community. Everyone will be kept in the loop and the IRC logs will be easy to find since it's consolidated on TripleO. I would be happy to help lead the meetings and I have also added some thoughts to the Etherpad. > > How does everyone feel about having our first meeting on June 20th? I've updated line 12 Etherpad with possible days/times including the one you suggested (FWIW I have a recurring conflict at that time). Maybe people who are interested can update the etherpad and we announce the winning date at the end of the week? https://etherpad.openstack.org/p/tripleo-ansible-agenda John > > Sincerely, > Luke Short > > On Mon, Jun 10, 2019 at 5:02 PM Kevin Carter wrote: >> >> With the now merged structural changes it is time to organize an official meeting to get things moving. >> >> So without further ado: >> * When should we schedule our meetings (day, hour, frequency)? 
>> * Should the meeting take place in the main #tripleo channel or in one of the dedicated meeting rooms (openstack-meeting-{1,2,3,4}, etc)? >> * How long should our meetings last? >> * Any volunteers to chair meetings? >> >> To capture some of our thoughts, questions, hopes, dreams, and aspirations I've created an etherpad which I'd like interested folks to throw ideas at: [ https://etherpad.openstack.org/p/tripleo-ansible-agenda ]. I'd like to see if we can get a confirmed list of folks who want to meet and, potentially, a generally good timezone. I'd also like to see if we can nail down some ideas for a plan of attack. While I have ideas and would be happy to talk at length about them (I wrote a few things down in the etherpad), I don't want to be the only voice given I'm new to the TripleO community (I could be, and likely I am, missing a lot of context). >> >> Assuming we can get something flowing, I'd like to shoot for an official meeting sometime next week (the week of 17 June, 2019). In the meantime, I'll look forward to chatting with folks in the #tripleo channel. >> >> -- >> >> Kevin Carter >> IRC: cloudnull >> >> >> On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: >>> >>> Hey everyone, >>> >>> For the upcoming work on focusing on more Ansible automation and testing, I have created a dedicated #tripleo-transformation channel for our new squad. Feel free to join if you are interested in joining and helping out! >>> >>> +1 to removing repositories we don't use, especially if they have no working code. I'd like to see the consolidation of TripleO specific things into the tripleo-ansible repository and then using upstream Ansible roles for all of the different services (nova, glance, cinder, etc.). >>> >>> Sincerely, >>> >>> Luke Short, RHCE >>> Software Engineer, OpenStack Deployment Framework >>> Red Hat, Inc. >>> >>> >>> On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: >>>> >>>> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >>>>> >>>>> So the questions at hand are: what, if anything, should we do with these repositories? Should we retire them or just ignore them? Is there anyone using any of the roles? >>>> >>>> >>>> My initial reaction was to suggest we just ignore them, but on second thought I'm wondering if there is anything negative if we leave them lying around. Unless we're going to benefit from them in the future if we start actively working in these repos, they represent obfuscation and debt, so it might be best to retire / dispose of them. >>>> >>>> David From cboylan at sapwetik.org Tue Jun 11 18:02:14 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 11 Jun 2019 11:02:14 -0700 Subject: getting issues while configuring the Trove In-Reply-To: References: Message-ID: On Tue, Jun 11, 2019, at 10:50 AM, Saikrishna Ura wrote: > Hi, > > I installed Openstack in Ubuntu 18.04 by cloning the devstack > repository with this url "git clone > https://git.openstack.org/openstack-dev/devstack", but i can't able > create or access with the trove, I'm getting issues with the > installation. > > Can anyone help on this issue please. If any reference document or any > guidance much appreciated. Here are Trove's docs on using the Trove devstack plugin: https://opendev.org/openstack/trove/src/branch/master/devstack/README.rst If you haven't seen those yet I would start there. 
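For what it's worth, the usual devstack plugin pattern looks roughly like this in local.conf (an untested sketch; the README linked above has the exact, current settings):

    [[local|localrc]]
    ADMIN_PASSWORD=secret
    DATABASE_PASSWORD=$ADMIN_PASSWORD
    RABBIT_PASSWORD=$ADMIN_PASSWORD
    SERVICE_PASSWORD=$ADMIN_PASSWORD
    # pull in the Trove devstack plugin from the repository linked above
    enable_plugin trove https://opendev.org/openstack/trove

and then re-run stack.sh.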
Clark From ildiko.vancsa at gmail.com Tue Jun 11 18:43:52 2019 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Tue, 11 Jun 2019 20:43:52 +0200 Subject: [edge][ironic][neutron][starlingx] Open Infrastructure Summit and PTG Edge overview and next steps Message-ID: Hi, There were a lot of interesting discussions about edge computing at the Open Infrastructure Summit[1] and PTG in Denver. Hereby I would like to use the opportunity to share overviews and some progress and next steps the community has taken since. You can find a summary of the Forum discussions here: https://superuser.openstack.org/articles/edge-and-5g-not-just-the-future-but-the-present/ Check the following blog post for a recap on the PTG sessions: https://superuser.openstack.org/articles/edge-computing-takeaways-from-the-project-teams-gathering/ The Edge Computing Group is working towards testing the minimal reference architectures for which we are putting together hardware requirements. You can catch up and chime in on the discussion on this mail thread: http://lists.openstack.org/pipermail/edge-computing/2019-June/000597.html For Ironic related conversations since the event check these threads: * http://lists.openstack.org/pipermail/edge-computing/2019-May/000582.html * http://lists.openstack.org/pipermail/edge-computing/2019-May/000588.html We are also in progress to write up an RFE for Neutron to improve segment range management for edge use cases: http://lists.openstack.org/pipermail/edge-computing/2019-May/000589.html If you have any questions or comments to any of the above topics you can respond to this thread, chime in on the mail above threads, reach out on the edge-computing mailing[2] list or join the weekly edge group calls[3]. If you would like to get involved with StarlingX you can find pointers on the website[4]. Thanks, Ildikó (IRC: ildikov on Freenode) [1] https://www.openstack.org/videos/summits/denver-2019 [2] http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing [3] https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings [4] https://www.starlingx.io/community/ From emilien at redhat.com Tue Jun 11 19:23:04 2019 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 11 Jun 2019 15:23:04 -0400 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: Kamil, you're now core. Thanks again for your work! On Wed, Jun 5, 2019 at 10:31 AM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue Jun 11 19:40:44 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 11 Jun 2019 20:40:44 +0100 Subject: [kolla] meeting tomorrow Message-ID: Hi, I'm unable to chair the IRC meeting tomorrow. If someone else can stand in that would be great, otherwise we'll cancel. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at fried.cc Tue Jun 11 19:46:10 2019 From: openstack at fried.cc (Eric Fried) Date: Tue, 11 Jun 2019 14:46:10 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> > What might be nice is if we could use a flavor extra spec like this: > > deploy-config:hyperthreading=enabled > > The nova ironic virt driver could pass this to ironic, like it does with traits. > > Then in the ironic deploy template, have fields like this: > > name: Hyperthreading enabled > config-type: hyperthreading > config-value: enabled > steps: > > Ironic would then match on the config-type and config-value to find a > suitable deploy template. > > As an extension, the deploy template could define a trait (or list of > traits) that must be supported by a node in order for the template to > be applied. Perhaps this would even be a standard relationship between > config-type and traits? This. As rubber has hit road for traits-related-to-config, the pattern that has emerged as (IMO) most sensible has looked a lot like the above. To get a bit more specific: - HW_CPU_HYPERTHREADING is a trait indicating that a node is *capable* of switching hyperthreading on. There is no trait, ever, anywhere, that indicates that is is on or off on a particular node. - The ironic virt driver tags the node RP with the trait when it detects that the node is capable. - The flavor (or image) indicates a desire to enable hyperthreading as Mark says: via a (non-Placement-ese) property that conveys information in a way that ironic can understand. - A request filter [1] interprets the non-Placement-ese property and adds HW_CPU_HYPERTHREADING as a required trait to the request if it's `enabled`, so the scheduler will ensure we land on a node that can handle it. - During spawn, the ironic virt driver communicates whatever/however to ironic based on the (non-Placement-ese) property in the flavor/image. Getting back to the original issue of this thread, this still means we need to implement some limited subset of `resize` for ironic to allow us to turn this thing on or off on an established instance. That resize should still go through the scheduler so that, for example, the above process will punt if you try to switch on hyperthreading on a node that isn't capable (doesn't have the HW_CPU_HYPERTHREADING trait). efried [1] https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/request_filter.py From mriedemos at gmail.com Tue Jun 11 20:07:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 11 Jun 2019 15:07:20 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> Message-ID: <8ed93d57-9e61-8927-449f-6bab082df88b@gmail.com> On 6/11/2019 2:46 PM, Eric Fried wrote: >> What might be nice is if we could use a flavor extra spec like this: >> >> deploy-config:hyperthreading=enabled >> >> The nova ironic virt driver could pass this to ironic, like it does with traits. 
>> >> Then in the ironic deploy template, have fields like this: >> >> name: Hyperthreading enabled >> config-type: hyperthreading >> config-value: enabled >> steps: >> >> Ironic would then match on the config-type and config-value to find a >> suitable deploy template. >> >> As an extension, the deploy template could define a trait (or list of >> traits) that must be supported by a node in order for the template to >> be applied. Perhaps this would even be a standard relationship between >> config-type and traits? > This. > > As rubber has hit road for traits-related-to-config, the pattern that > has emerged as (IMO) most sensible has looked a lot like the above. > > To get a bit more specific: > - HW_CPU_HYPERTHREADING is a trait indicating that a node is*capable* > of switching hyperthreading on. There is no trait, ever, anywhere, that > indicates that is is on or off on a particular node. > - The ironic virt driver tags the node RP with the trait when it detects > that the node is capable. > - The flavor (or image) indicates a desire to enable hyperthreading as > Mark says: via a (non-Placement-ese) property that conveys information > in a way that ironic can understand. > - A request filter [1] interprets the non-Placement-ese property and > adds HW_CPU_HYPERTHREADING as a required trait to the request if it's > `enabled`, so the scheduler will ensure we land on a node that can > handle it. > - During spawn, the ironic virt driver communicates whatever/however to > ironic based on the (non-Placement-ese) property in the flavor/image. > > Getting back to the original issue of this thread, this still means we > need to implement some limited subset of `resize` for ironic to allow us > to turn this thing on or off on an established instance. That resize > should still go through the scheduler so that, for example, the above > process will punt if you try to switch on hyperthreading on a node that > isn't capable (doesn't have the HW_CPU_HYPERTHREADING trait). This sounds similar to the ARQ device profile stuff from the nova/cyborg spec [1] - is it? Also, I'm reminded of the glare/artifactory discussion for baremetal node config we talked about at the PTG in Dublin [2] - how does this compare/contrast? [1] https://review.opendev.org/#/c/603955/ [2] https://etherpad.openstack.org/p/nova-ptg-rocky (~L250) -- Thanks, Matt From smooney at redhat.com Tue Jun 11 20:09:22 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 11 Jun 2019 21:09:22 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> Message-ID: <558bddb0926cc8c211d9b699c9850bf523c2f22a.camel@redhat.com> On Tue, 2019-06-11 at 14:46 -0500, Eric Fried wrote: > > What might be nice is if we could use a flavor extra spec like this: > > > > deploy-config:hyperthreading=enabled > > > > The nova ironic virt driver could pass this to ironic, like it does with traits. > > > > Then in the ironic deploy template, have fields like this: > > > > name: Hyperthreading enabled > > config-type: hyperthreading > > config-value: enabled > > steps: > > > > Ironic would then match on the config-type and config-value to find a > > suitable deploy template. 
> > > > As an extension, the deploy template could define a trait (or list of > > traits) that must be supported by a node in order for the template to > > be applied. Perhaps this would even be a standard relationship between > > config-type and traits? > > This. > > As rubber has hit road for traits-related-to-config, the pattern that > has emerged as (IMO) most sensible has looked a lot like the above. > > To get a bit more specific: > - HW_CPU_HYPERTHREADING is a trait indicating that a node is *capable* > of switching hyperthreading on. There is no trait, ever, anywhere, that > indicates that is is on or off on a particular node. > - The ironic virt driver tags the node RP with the trait when it detects > that the node is capable. > - The flavor (or image) indicates a desire to enable hyperthreading as > Mark says: via a (non-Placement-ese) property that conveys information > in a way that ironic can understand. > - A request filter [1] interprets the non-Placement-ese property and > adds HW_CPU_HYPERTHREADING as a required trait to the request if it's > `enabled`, so the scheduler will ensure we land on a node that can > handle it. just an fyi we are adding a request filter to do ^ as part of the pcpu in placment spec. if you set hw:cpu_thread_polciy=require or hw:cpu_thread_policy=isolate that will be converteded to a required or forbiden trait. in the libvirt driver already uses this to influcne how we pin vms to host cores requing that they land on hyperthreads or requiing the vm uses dedicated cores. ironic could add support for this existing extaspec and the corresponding image property to enable or disabel hyperthreading or SMT to use the generic term. > - During spawn, the ironic virt driver communicates whatever/however to > ironic based on the (non-Placement-ese) property in the flavor/image. > > Getting back to the original issue of this thread, this still means we > need to implement some limited subset of `resize` or rebuild in the image metadata case > for ironic to allow us > to turn this thing on or off on an established instance. That resize > should still go through the scheduler so that, for example, the above > process will punt if you try to switch on hyperthreading on a node that > isn't capable (doesn't have the HW_CPU_HYPERTHREADING trait). > > efried > > [1] > https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/request_filter.py > From colleen at gazlene.net Tue Jun 11 20:10:11 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 11 Jun 2019 13:10:11 -0700 Subject: [dev][keystone] M-1 check-in and retrospective meeting In-Reply-To: References: <627ae3a7-b998-4323-8981-2d1cd7bc3085@www.fastmail.com> Message-ID: <19164694-e3ed-4ac0-82d4-813abb0ecc59@www.fastmail.com> Thanks to everyone who attended this check-in today. I hope it felt worthwhile and helps us accomplish our goal of keeping up momentum through the cycle. The recording is available here: https://www.dropbox.com/s/7yx596ei2uazpib/keystone-train-m-1%20on%202019-06-11%2017%3A04.mp4 We also recorded some notes in the agenda etherpad: https://etherpad.openstack.org/p/keystone-train-M-1-review-planning-meeting This meeting didn't cover any in-depth technical discussion, rather we mainly focused on revisiting our past decisions: https://trello.com/b/VCCcnCGd/keystone-stein-retrospective and realigning and refining our roadmap: https://trello.com/b/ClKW9C8x/keystone-train-roadmap If you have any feedback about the format of this meeting, please let me know. 
Colleen From kennelson11 at gmail.com Tue Jun 11 22:56:35 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 11 Jun 2019 15:56:35 -0700 Subject: [ptl] Shanghai PTG Changes Message-ID: Hello All, After Denver we were able to take time to reflect on the improvements we can make now that the PTG will occur immediately following the summit for the near future. While Shanghai will have its own set of variables, it's still good to reevaluate how we allocate time for groups and how we structure the week overall. tldr; - Onboarding is moving into the PTG for this round (updates stay a part of the Summit) - You can still do regular PTG stuff (or both onboarding and regular PTG stuff) - PTG slots can be as short as 1/4 of a day - More shared space at the Shanghai venue, less dedicated space - New breakdown: 1.5 days of Forum and 3.5 days of PTG - Survey will be out in a few weeks for requesting PTG space We'll have our traditional project team meetings at the PTG in Shanghai as the default format, that won't change. However, we know many of you don't expect to have all your regulars attend the PTG in Shanghai. To combat this and still help project teams make use of the PTG in the most effective way possible we are encouraging teams that want to meet but might not have all the people they need to have technical discussions to meet anyway and instead focus on a more thorough onboarding of our Chinese contributors. Project teams could also do a combination of the two, spend an hour and a half on onboarding (or however much time you see fit) and then have your regular technical discussions after. Project Updates will still be a part of the Summit like normal, its just the onboardings that will be compacted into the PTG for Shanghai. We are making PTG days more granular as well and will have the option to request 1/4 day slots in an effort to leave less empty space in the schedule. So if you are only doing onboarding, you probably only need 1/4 to 1/2 of a day. If you are doing just your regular technical discussions and still need three days, thats fine too. The venue itself (similar to Denver) will have a few large rooms for bigger teams to meet, however, most teams will meet in shared space. For those teams meeting to have only technical discussions and for teams that have larger groups, we will try to prioritize giving them their own dedicated space. For the shared spaces, we will add to the PTGbot more clearly defined locations within the shared space so its easier to find teams meeting there. I regret to inform you that, again, projection will be a very limited commodity. Yeah.. please don't shoot the messenger. Due to using mainly shared space, projection is just something we are not able to offer. The other change I haven't already mentioned is that we are going to have the PTG start a half day early. Instead of only being 3 days like in Denver, we are going to add more time to the PTG and subtract a half day from the Forum. Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit overlapping the first two days. I will be sending the PTG survey out to PTLs/Project Leads in a couple weeks with a few changes. -Kendall Nelson (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kennelson11 at gmail.com Tue Jun 11 22:56:37 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 11 Jun 2019 15:56:37 -0700 Subject: [SIG][WG] Shanghai PTG Changes Message-ID: Hello All, After Denver we were able to take time to reflect on the improvements we can make now that the PTG will occur immediately following the summit for the near future. While Shanghai will have its own set of variables, it's still good to reevaluate how we allocate time for groups and how we structure the week overall. tldr; - No BoFs, try to focus more specific discussions into topics for the Forum and follow up with PTG slot for more general conversations - PTG slots can be as short as 1/4 of a day - More shared space at the Shanghai venue, less dedicated space - New breakdown: 1.5 days of Forum and 3.5 days of PTG - Survey will be out in a few weeks for requesting PTG space For many of you (and myself as FC SIG Chair) there were a lot of different ways to get time to talk about topics. There was the forum, there were BoF sessions you could request, and there was also the option of having PTG sessions. Using the FC SIG as an example, we had two forum sessions (I think?), a BoF, and a half day at the PTG. This was WAY too much time for us. We didn't realize it when we were asking for space all the different ways, but we ended up with a lot of redundant discussions and time in which we didn't do much but just chat (which was great, but not the best use of the time/space since we could have done that in a hallway and not a dedicated room). To account for this duplication, we are going to get rid of the BoF mechanism for asking for space since largely the topics discussed there could be more cleanly divided into feedback focused Forum sessions and PTG team discussion time. The tentative plan is to try to condense as many of the SIG/WGs PTG slots towards the start of the PTG as we can so that they will more or less immediately follow the forum so that you can focus on making action items out of the conversations had and the feedback received at the Forum. We will also offer a smaller granularity of time that you can request at the PTG. Previously, a half day slot was as small as you could request; this time we will be offering 1/4 day slots (we found with more than one SIG/WG that even at a half day they were done in an hour and a half with all that they needed to talk about). The venue itself (similar to Denver) will have a few large rooms for bigger teams to meet, however, most teams will meet in shared space. That being said, we will add to the PTGbot more clearly defined locations in the shared space so its easier to find groups in shared spaces. I regret to inform you that, again, projection will be a very limited commodity. Yeah.. please don't shoot the messenger. Due to using mainly shared space, projection is just something we are not able to offer. The other change I haven't already mentioned is that we are going to have the PTG start a half day early. Instead of only being 3 days like in Denver, we are going to add more time to the PTG and subtract a half day from the Forum. Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit overlapping the first two days. I will be sending the PTG survey out to SIG Chairs/ WG Leads in a couple weeks with a few changes. -Kendall Nelson (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From anlin.kong at gmail.com Tue Jun 11 22:57:30 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 12 Jun 2019 10:57:30 +1200 Subject: getting issues while configuring the Trove In-Reply-To: References: Message-ID: Hi Saikrishna, Here is a local.conf file for Trove installation in DevStack i usually use http://dpaste.com/14DW815.txt Best regards, Lingxian Kong Catalyst Cloud On Wed, Jun 12, 2019 at 5:58 AM Saikrishna Ura < saikrishna.ura at cloudseals.com> wrote: > Hi, > > I installed Openstack in Ubuntu 18.04 by cloning the devstack repository > with this url "git clone https://git.openstack.org/openstack-dev/devstack", > but i can't able create or access with the trove, I'm getting issues with > the installation. > > Can anyone help on this issue please. If any reference document or any > guidance much appreciated. > > Thanks, > > Saikrishna U. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Wed Jun 12 07:47:59 2019 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Wed, 12 Jun 2019 09:47:59 +0200 Subject: [SIG][WG] Shanghai PTG Changes In-Reply-To: References: Message-ID: Hi, Thank you for the update! Could you please clarify how many days the whole event will take in the end, 5 or 6? Dmitry On 6/12/19 12:56 AM, Kendall Nelson wrote: > Hello All, > > After Denver we were able to take time to reflect on the improvements we can > make now that the PTG will occur immediately following the summit for the near > future.While Shanghai will have its own set of variables, it's still good to > reevaluate how we allocate time for groups and how we structure the week overall. > > tldr; > > - No BoFs, try to focus more specific discussions into topics for the Forum and > follow up with PTG slot for more general conversations > - PTG slots can be as short as 1/4 of a day > - More shared space at the Shanghai venue, less dedicated space > - New breakdown: 1.5 days of Forum and 3.5 days of PTG > - Survey will be out in a few weeks for requesting PTG space > > For many of you (and myself as FC SIG Chair) there were a lot of different ways > to get time to talk about topics. There was the forum, there were BoF sessions > you could request, and there was also the option of having PTG sessions. Using > the FC SIG as an example, we had two forum sessions (I think?), a BoF, and a > half day at the PTG. This was WAY too much time for us. We didn't realize it > when we were asking for space all the different ways, but we ended up with a lot > of redundant discussions and time in which we didn't do much but just chat > (which was great, but not the best use of the time/space since we could have > done thatin a hallway and not a dedicated room). > > To account for thisduplication, we are going to get rid of the BoF mechanism for > asking for space since largely the topics discussed there could be more cleanly > divided into feedback focused Forum sessions and PTG team discussion time. The > tentative plan is to try to condense as many of the SIG/WGs PTG slots towards > the start of the PTG as we can so that theywill more or less immediately follow > the forum so that you can focus on making action items out of the conversations > had and the feedback received at the Forum. > > We will also offer a smaller granularity of time that you can request at the > PTG. 
Previously, a half day slot was as small as you could request; this time we > will be offering 1/4 day slots (we found with more than one SIG/WG that even at > a half day they were done in an hour and a half with all that they needed to > talk about). > > The venue itself (similar to Denver) will have a few large rooms for bigger > teams to meet, however, most teams will meet in shared space. That being said, > we willadd to the PTGbot more clearly defined locations in the shared space so > its easier to find groups in shared spaces. > > I regret to inform you that, again, projection will be a very limited commodity. > Yeah.. please don't shoot the messenger. Due to using mainly shared space, > projectionis just something we are not able to offer. > > The other change I haven't already mentioned is that we are going to have the > PTG start a half day early. Instead of only being 3 days like in Denver, we are > going to add more time to the PTG and subtract a half day from the Forum. > Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit > overlapping the first two days. > > I will be sending the PTG survey out to SIG Chairs/ WG Leads in a couple weeks > with a few changes. > > -Kendall Nelson (diablo_rojo) > > From ssbarnea at redhat.com Wed Jun 12 07:59:03 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Wed, 12 Jun 2019 08:59:03 +0100 Subject: [requirements] paramiko 2.5.0 causing ImportError: cannot import name py31compat In-Reply-To: References: Message-ID: <1b33bf6b-5472-44bb-9fc0-53bc564daaa6@Spark> I used the new paramiko succesfully with ansible-molecule, so if you spot a bug in it, please include a link to that bug, so we can follow it. Paramiko lacked a new release for a very long time and someone even ended up with a fork paramiko-ng due to that. Hopefully this is about to change and new releases will be more often... the. cryptography deprecation warnings were very annoying. -- sorin On 11 Jun 2019, 15:17 +0100, Pierre Riteau , wrote: > Hello, > > paramiko 2.5.0 was released yesterday [1]. It appears to trigger > failures in the Kayobe molecule job with the following error [2]: > > ImportError: cannot import name py31compat > > It's not clear yet why this is happening, since py31compat lives in > setuptools. paramiko 2.5.0 includes changes to paramiko/py3compat.py > which could be related. > For now, we're capping paramiko [3] as it is blocking our gate. > > I thought I would share with the list, in case other projects > experience similar errors. > > Cheers, > Pierre > > [1] https://pypi.org/project/paramiko/#history > [2] http://logs.openstack.org/17/664417/1/check/kayobe-tox-molecule/0370fdd/job-output.txt.gz > [3] https://review.opendev.org/#/c/664533/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Wed Jun 12 09:10:04 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Wed, 12 Jun 2019 09:10:04 +0000 Subject: [nova] Spec: Standardize CPU resource tracking Message-ID: Hi All, Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. 
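As a purely illustrative example (the numbers below are made up, they are not taken from the spec): for a flavor with 4 vCPUs, the pre-filter would roughly turn

    hw:cpu_policy=dedicated

into a placement request equivalent to

    resources:PCPU=4

while a flavor without the dedicated policy would instead be requested as resources:VCPU=4.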
So when a user creates a new instance, or performs instance actions like shelve, unshelve, resize, evacuate and migration post-upgrade, the request will go through the scheduler pre-filter, which will set the alias for `hw:cpu_policy` in the request_spec flavor ``extra specs`` and image metadata properties. In the particular case below, this won't work. For example, I have two compute nodes, say A and B: On Stein: Compute node A configuration: vcpu_pin_set=0-3 (used as dedicated CPUs; this host is added to an aggregate which has “pinned” metadata) Compute node B configuration: vcpu_pin_set=0-3 (used as dedicated CPUs; this host is added to an aggregate which has “pinned” metadata) On Train, two possible scenarios: Compute node A configuration: (assume the new cpu pinning implementation is merged into Train) vcpu_pin_set=0-3 (keep the same settings as in Stein) Compute node B configuration: (assume the new cpu pinning implementation is merged into Train) cpu_dedicated_set=0-3 (changed to the new config option) 1. Consider that one instance, say `test`, is created using a flavor with the old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") in the Stein release, and Nova is then upgraded to Train with the above configuration. 2. Now, when the user performs an instance action such as shelve/unshelve, the scheduler pre-filter will change the request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$``, which will ultimately return only compute node B from the placement service. Here we expect it to have returned both compute node A and compute node B. 3. If the user creates a new instance using the old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") on the Train release with the above configuration, then it will return only compute node B from the placement service, whereas it should have returned both compute nodes A and B. Problem: As compute node A is still configured to boot instances with dedicated CPUs, with the same behavior as in Stein, it will not be returned by the placement service due to the changes in the scheduler pre-filter logic. Proposed changes: Earlier in the spec [2], an online data migration was proposed to change the flavor extra specs and image metadata properties of the request_spec and instance objects. Based on the instance host, we can get the NumaTopology of the host, which will contain the new configuration options set on the compute host. Based on the NumaTopology of the host, we can change the instance and request_spec flavor extra specs: 1. Remove cpu_policy from the extra specs 2. Add “resources:PCPU=” in the extra specs We can also change the flavor extra specs and image metadata properties of the instance and request_spec objects using the reshape functionality. Please give us your feedback on the proposed solution so that we can update the spec accordingly. [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst Thanks and Regards, -Bhagyashri Shewale- Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From a.settle at outlook.com Wed Jun 12 13:14:17 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Wed, 12 Jun 2019 13:14:17 +0000 Subject: [tc][ptls][all] Change to the health check process Message-ID: Hi all, The TC have made the decision to stop health tracking team projects. During discussions at the Train cycle PTG, the OpenStack TC concluded that its experiments in formally tracking problems within individual project teams was not providing enough value for the investment of effort it required. The wiki has subsequently been closed down [0]. The component of it which is still deemed valuable is having specific TC members officially assigned as liaisons to each project team, so in future that will continue but will be documented in the openstack/governance Git repository's project metadata and on the project pages linked from the OpenStack Project Teams page [1]. This was discussed at the most recent TC meeting [2] and it was agreed upon that SIGs are to be included in the liaison roster for team health checks. Please keep your eyes out for changes coming up in the governance repo and PTLS - also please keep your inboxes open for TC members to reach out and introduce themselves as your liaison. In the mean time - if you have any concerns, please do not hesitate to reach out to any one of the TC members. Cheers, Alex [0] https://wiki.openstack.org/wiki/OpenStack_health_tracker [1] https://governance.openstack.org/tc/reference/projects/ [2] http://eavesdrop.openstack.org/meetings/tc/2019/tc.2019-06-06-14.00.log.html#l-23 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kecarter at redhat.com Wed Jun 12 13:41:10 2019 From: kecarter at redhat.com (Kevin Carter) Date: Wed, 12 Jun 2019 08:41:10 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: I've submitted reviews under the topic "retire-role" to truncate all of the ansible-role-tripleo-* repos, that set can be seen here [0]. When folks get a chance, I'd greatly appreciate folks have a look at these reviews. [0] - https://review.opendev.org/#/q/topic:retire-role+status:open -- Kevin Carter IRC: kecarter On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: > Hey everyone, > > For the upcoming work on focusing on more Ansible automation and testing, > I have created a dedicated #tripleo-transformation channel for our new > squad. Feel free to join if you are interested in joining and helping out! > > +1 to removing repositories we don't use, especially if they have no > working code. I'd like to see the consolidation of TripleO specific things > into the tripleo-ansible repository and then using upstream Ansible roles > for all of the different services (nova, glance, cinder, etc.). > > Sincerely, > > Luke Short, RHCE > Software Engineer, OpenStack Deployment Framework > Red Hat, Inc. > > > On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: > >> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >> >>> So the questions at hand are: what, if anything, should we do with >>> these repositories? Should we retire them or just ignore them? Is there >>> anyone using any of the roles? >>> >> >> My initial reaction was to suggest we just ignore them, but on second >> thought I'm wondering if there is anything negative if we leave them lying >> around. Unless we're going to benefit from them in the future if we start >> actively working in these repos, they represent obfuscation and debt, so it >> might be best to retire / dispose of them. 
>> >> David >> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kashwinkumar10 at gmail.com Wed Jun 12 10:37:35 2019 From: kashwinkumar10 at gmail.com (Ashwinkumar Kandasami) Date: Wed, 12 Jun 2019 16:07:35 +0530 Subject: Openstack Octavia Configuration Message-ID: <5d00d5ee.1c69fb81.34121.41bd@mx.google.com> Hi, I am a graduate student trying to deploy OpenStack in my own environment. I want to configure OpenStack Octavia with my existing OpenStack cloud. I did the OpenStack deployment using the RDO project. I tried to configure OpenStack with the OVN Neutron L2 agent together with Octavia, but I get an alert saying that Octavia cannot be used with the OVN type of Neutron agent. How can I use it? Sent from Mail for Windows 10 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kashwinkumar10 at gmail.com Wed Jun 12 10:41:04 2019 From: kashwinkumar10 at gmail.com (Ashwinkumar Kandasamy) Date: Wed, 12 Jun 2019 16:11:04 +0530 Subject: Openstack - Octavia Message-ID: Hi, I am a private cloud engineer in India. I deployed OpenStack in my own environment. I did that through the OpenStack RDO project with an OpenStack network of type 1 (provider network). I want to configure OpenStack LBaaS (Octavia) in this existing OpenStack deployment. How can I do that? Please help me with this. -- Thank You, *Ashwinkumar K* *Software Associate,* *ZippyOPS Consulting Services LLP,* *Chennai.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 12 15:01:23 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 12 Jun 2019 16:01:23 +0100 Subject: [kolla] meeting tomorrow In-Reply-To: References: Message-ID: On Tue, 11 Jun 2019 at 20:40, Mark Goddard wrote: > > Hi, > > I'm unable to chair the IRC meeting tomorrow. If someone else can stand in that would be great, otherwise we'll cancel. Since no one was available to chair, this week's meeting is cancelled. We'll meet again next week. > > Thanks, > Mark From sean.mcginnis at gmail.com Wed Jun 12 15:50:43 2019 From: sean.mcginnis at gmail.com (Sean McGinnis) Date: Wed, 12 Jun 2019 10:50:43 -0500 Subject: [Release-job-failures] Pre-release of openstack/horizon failed In-Reply-To: References: Message-ID: This appears to have been a network issue that prevented the installation of one of the requirements. Fungi was able to reenqueue the job and it passed the second time through. Everything looks good now, but if anything unusual is noticed later, please let us know in the #openstack-release channel. Sean On Wed, Jun 12, 2019 at 9:48 AM wrote: > Build failed. > > - release-openstack-python > http://logs.openstack.org/e3/e30d8258f5993736dc8982e280ae43fe1ed22395/pre-release/release-openstack-python/4f35eb5/ > : FAILURE in 3m 06s > - announce-release announce-release : SKIPPED > - propose-update-constraints propose-update-constraints : SKIPPED > > _______________________________________________ > Release-job-failures mailing list > Release-job-failures at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/release-job-failures > -------------- next part -------------- An HTML attachment was scrubbed...
Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Wed, Jun 12, 2019 at 6:52 PM Sean McGinnis wrote: > This appears to have been a network issue that prevented the installation > of one of the requirements. > > Fungi was able to reenqueue the job and it passed the second time through. > Everything looks good now, > but if anything unusual is noticed later, please let us know in the > #openstack-release channel. > > Sean > > On Wed, Jun 12, 2019 at 9:48 AM wrote: > >> Build failed. >> >> - release-openstack-python >> http://logs.openstack.org/e3/e30d8258f5993736dc8982e280ae43fe1ed22395/pre-release/release-openstack-python/4f35eb5/ >> : FAILURE in 3m 06s >> - announce-release announce-release : SKIPPED >> - propose-update-constraints propose-update-constraints : SKIPPED >> >> _______________________________________________ >> Release-job-failures mailing list >> Release-job-failures at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/release-job-failures >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michjo at viviotech.net Wed Jun 12 17:23:57 2019 From: michjo at viviotech.net (Jordan Michaels) Date: Wed, 12 Jun 2019 10:23:57 -0700 (PDT) Subject: [Glance] Can Glance be installed on a server other than the controller? Message-ID: <1277790298.106982.1560360237286.JavaMail.zimbra@viviotech.net> For anyone who's interested, this issue turned out to be caused by the system times being different on the separate server. I had set up Chrony according to the docs but never verified it was actually working. While reviewing the logs I noticed the time stamps were different on each server and that is what pointed me in the right direction. Just wanted to post the solution for posterity. Hopefully this helps someone in the future. -Jordan From johnsomor at gmail.com Wed Jun 12 17:53:11 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 12 Jun 2019 10:53:11 -0700 Subject: Openstack Octavia Configuration In-Reply-To: <5d00d5ee.1c69fb81.34121.41bd@mx.google.com> References: <5d00d5ee.1c69fb81.34121.41bd@mx.google.com> Message-ID: Hi Ashwinkumar, Welcome to using OpenStack and Octavia. I can talk to Octavia, but I do not yet have much experience with RDO deployments. RDO has a page for LBaaS (though it is using the old neutron-lbaas with Octavia) here: https://www.rdoproject.org/networking/lbaas/ They also have a users mailing list that might provide more help for deploying with RDO: http://rdoproject.org/contribute/mailing-lists/ RDO also has an IRC channel on Freenode called #rdo. As for Octavia, Octavia integrates with neutron for networking. Any of the supported ML2 drivers for neutron should work fine with Octavia. If you would like to chat about Octavia, the team has a channel on Freenode IRC called #openstack-lbaas. We would be happy to help you get started. Michael On Wed, Jun 12, 2019 at 7:54 AM Ashwinkumar Kandasami wrote: > > Hi, > > I am a graduate student trying to deploy openstack in my own environment. I waana configure openstack octavia with my existing openstack cloud. I done the openstack deployment using RDO project. I tried configure openstack with ovn neutron l2 agent with octavia but i getting alert like we can’t able to use octavia for ovn type neutron agent, then how can i use it?? 
> > > > Sent from Mail for Windows 10 > > From kennelson11 at gmail.com Wed Jun 12 18:03:55 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 12 Jun 2019 11:03:55 -0700 Subject: [SIG][WG] Shanghai PTG Changes In-Reply-To: References: Message-ID: The whole event will be 5 days. Monday to Friday. -Kendall (diablo_rojo) On Wed, Jun 12, 2019 at 12:50 AM Dmitry Tantsur wrote: > Hi, > > Thank you for the update! > > Could you please clarify how many days the whole event will take in the > end, 5 or 6? > > Dmitry > > On 6/12/19 12:56 AM, Kendall Nelson wrote: > > Hello All, > > > > After Denver we were able to take time to reflect on the improvements we > can > > make now that the PTG will occur immediately following the summit for > the near > > future.While Shanghai will have its own set of variables, it's still > good to > > reevaluate how we allocate time for groups and how we structure the week > overall. > > > > tldr; > > > > - No BoFs, try to focus more specific discussions into topics for the > Forum and > > follow up with PTG slot for more general conversations > > - PTG slots can be as short as 1/4 of a day > > - More shared space at the Shanghai venue, less dedicated space > > - New breakdown: 1.5 days of Forum and 3.5 days of PTG > > - Survey will be out in a few weeks for requesting PTG space > > > > For many of you (and myself as FC SIG Chair) there were a lot of > different ways > > to get time to talk about topics. There was the forum, there were BoF > sessions > > you could request, and there was also the option of having PTG sessions. > Using > > the FC SIG as an example, we had two forum sessions (I think?), a BoF, > and a > > half day at the PTG. This was WAY too much time for us. We didn't > realize it > > when we were asking for space all the different ways, but we ended up > with a lot > > of redundant discussions and time in which we didn't do much but just > chat > > (which was great, but not the best use of the time/space since we could > have > > done thatin a hallway and not a dedicated room). > > > > To account for thisduplication, we are going to get rid of the BoF > mechanism for > > asking for space since largely the topics discussed there could be more > cleanly > > divided into feedback focused Forum sessions and PTG team discussion > time. The > > tentative plan is to try to condense as many of the SIG/WGs PTG slots > towards > > the start of the PTG as we can so that theywill more or less immediately > follow > > the forum so that you can focus on making action items out of the > conversations > > had and the feedback received at the Forum. > > > > We will also offer a smaller granularity of time that you can request at > the > > PTG. Previously, a half day slot was as small as you could request; this > time we > > will be offering 1/4 day slots (we found with more than one SIG/WG that > even at > > a half day they were done in an hour and a half with all that they > needed to > > talk about). > > > > The venue itself (similar to Denver) will have a few large rooms for > bigger > > teams to meet, however, most teams will meet in shared space. That being > said, > > we willadd to the PTGbot more clearly defined locations in the shared > space so > > its easier to find groups in shared spaces. > > > > I regret to inform you that, again, projection will be a very limited > commodity. > > Yeah.. please don't shoot the messenger. Due to using mainly shared > space, > > projectionis just something we are not able to offer. 
> > > > The other change I haven't already mentioned is that we are going to > have the > > PTG start a half day early. Instead of only being 3 days like in Denver, > we are > > going to add more time to the PTG and subtract a half day from the > Forum. > > Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit > > overlapping the first two days. > > > > I will be sending the PTG survey out to SIG Chairs/ WG Leads in a couple > weeks > > with a few changes. > > > > -Kendall Nelson (diablo_rojo) > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 12 18:49:24 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 12 Jun 2019 20:49:24 +0200 Subject: [Kolla] ansible compatibility Message-ID: Hello All, I'd like to know if there is a Matrix for kolla-ansible and ansible version ...in other words, which ansible version must be used for a kolla-ansible version. For example ocata used kolla-ansible 4.0.5 but I do not know which version of ansible must be used. Installing kolla-ansible with pip it does not install ansible. Reverse Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 12 18:52:24 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 12 Jun 2019 20:52:24 +0200 Subject: [Kolla] ansible compatibility In-Reply-To: References: Message-ID: ---------- Forwarded message --------- Da: Ignazio Cassano Date: Mer 12 Giu 2019 20:49 Subject: [Kolla] ansible compatibility To: Hello All, I'd like to know if there is a Matrix for kolla-ansible and ansible version ...in other words, which ansible version must be used for a kolla-ansible version. For example ocata used kolla-ansible 4.0.5 but I do not know which version of ansible must be used. Installing kolla-ansible with pip it does not install ansible. Reverse Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Jun 12 20:38:30 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 12 Jun 2019 15:38:30 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? Message-ID: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Before [1] when deleting a compute service in the API we did not check to see if the compute service was hosting any instances and just blindly deleted the service and related compute_node(s) records which orphaned the resource provider(s) for those nodes. With [2] we built on that and would cleanup the (first [3]) compute node resource provider by first deleting any allocations for instances still on that host - which because of the check in [1] should be none - and then deleted the resource provider itself. [2] forgot about ironic where a single compute service can be managing multiple (hundreds or even thousands) of baremetal compute nodes so I wrote [3] to delete *all* resource providers for compute nodes tied to the service - again barring there being any instances running on the service because of the check added in [1]. What we've failed to realize until recently is that there are cases where deleting the resource provider can still fail because there are allocations we haven't cleaned up, namely: 1. Residual allocations for evacuated instances from a source host. 2. Allocations held by a migration record for an unconfirmed (or not yet complete) migration. 
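As an aside, an operator can check whether a provider still has such allocations hanging off of it with osc-placement, along these lines (a sketch only; the exact commands and options available depend on the osc-placement release installed):

    # find the compute node's resource provider UUID
    openstack resource provider list --name <hypervisor hostname>
    # see which consumers still hold allocations against it
    openstack resource provider show --allocations <provider uuid>
    # inspect or remove a specific consumer's (instance/migration) allocations
    openstack resource provider allocation show <consumer uuid>
    openstack resource provider allocation delete <consumer uuid>
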
Because the delete_resource_provider method isn't checking for those, we can get ResourceProviderInUse errors which are then ignored [4]. Since that error is ignored, we continue on to delete the compute service record [5], effectively orphaning the providers (which is what [2] was meant to fix). I have recreated the evacuate scenario in a functional test here [6]. The question is what should we do about the fix? I'm getting lost thinking about this in a vacuum so trying to get some others to help think about it. Clearly with [1] we said you shouldn't be able to delete a compute service that has instances on it because that corrupts our resource tracking system. If we extend that to any allocations held against providers for that compute service, then the fix might be as simple as not ignoring the ResourceProviderInUse error and fail if we can't delete the provider(s). The question I'm struggling with is what does an operator do for the two cases mentioned above, not-yet-complete migrations and evacuated instances? For migrations, that seems pretty simple - wait for the migration to complete and confirm it (reverting a cold migration or resize would put the instance back on the compute service host you're trying to delete). The nastier thing is the allocations tied to an evacuated instance since those don't get cleaned up until the compute service is restarted [7]. If the operator never intends on restarting that compute service and just wants to clear the data, then they have to manually delete the allocations for the resource providers associated with that host before they can delete the compute service, which kind of sucks. What are our options? 1. Don't delete the compute service if we can't cleanup all resource providers - make sure to not orphan any providers. Manual cleanup may be necessary by the operator. 2. Change delete_resource_provider cascade=True logic to remove all allocations for the provider before deleting it, i.e. for not-yet-complete migrations and evacuated instances. For the evacuated instance allocations this is likely OK since restarting the source compute service is going to do that cleanup anyway. Also, if you delete the source compute service during a migration, confirming or reverting the resize later will likely fail since we'd be casting to something that is gone (and we'd orphan those allocations). Maybe we need a functional recreate test for the unconfirmed migration scenario before deciding on this? 3. Other things I'm not thinking of? Should we add a force parameter to the API to allow the operator to forcefully delete (#2 above) if #1 fails? Force parameters are hacky and usually seem to cause more problems than they solve, but it does put the control in the operators hands. If we did remove allocations for an instance when deleting it's compute service host, the operator should be able to get them back by running the "nova-manage placement heal_allocations" CLI - assuming they restart the compute service on that host. This would have to be tested of course. Help me Obi-Wan Kenobi. You're my only hope. 
[1] https://review.opendev.org/#/q/I0bd63b655ad3d3d39af8d15c781ce0a45efc8e3a [2] https://review.opendev.org/#/q/I7b8622b178d5043ed1556d7bdceaf60f47e5ac80 [3] https://review.opendev.org/#/c/657016/ [4] https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/scheduler/client/report.py#L2180 [5] https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/api/openstack/compute/services.py#L279 [6] https://review.opendev.org/#/c/663737/ [7] https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/compute/manager.py#L706 -- Thanks, Matt From robson.rbarreto at gmail.com Wed Jun 12 20:40:04 2019 From: robson.rbarreto at gmail.com (Robson Ramos Barreto) Date: Wed, 12 Jun 2019 17:40:04 -0300 Subject: [openstack-helm] custom container images for helm Message-ID: Hi all I saw in the docker hub that there is just until rocky ubuntu xenial version. I'd like to know how can I create my own images centos-based from new versions like Stein to be used with the helm charts, if is there any specific customization to works with helm or, for example, if can I use the kolla images. Thank you Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilkers.steve at gmail.com Wed Jun 12 21:13:49 2019 From: wilkers.steve at gmail.com (Steve Wilkerson) Date: Wed, 12 Jun 2019 16:13:49 -0500 Subject: [openstack-helm] custom container images for helm In-Reply-To: References: Message-ID: Hey Robson, We’ve recently started building images out of the openstack-helm-images repository. Currently, we use LOCI to build ubuntu based images for releases Ocata through Rocky and leap15 images for the Rocky release. We’ve recently started work on the multi-distro support spec which also added overrides and jobs required for the leap15 based images for Rocky. We’d love to see support added for centos images added to both openstack-helm-images and the openstack-helm charts themselves (and for releases beyond Rocky), but we just haven’t gotten there yet. If you’re interested in contributing and getting your hands dirty, we’d love to help provide guidance and help here. In regards to the Kolla images, it’s been awhile since I’ve used them myself so I can’t speak much there. Cheers, Steve On Wed, Jun 12, 2019 at 3:45 PM Robson Ramos Barreto < robson.rbarreto at gmail.com> wrote: > Hi all > > I saw in the docker hub that there is just until rocky ubuntu xenial > version. > > I'd like to know how can I create my own images centos-based from new > versions like Stein to be used with the helm charts, if is there any > specific customization to works with helm or, for example, if can I use > the kolla images. > > Thank you > > Regards > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Wed Jun 12 21:19:58 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 12 Jun 2019 21:19:58 +0000 Subject: [ironic][neutron] Security groups on bare metal instances References: Message-ID: Hi Sean, thanks for the reply. On 6/11/19 11:00 AM, Sean Mooney wrote: as an alternitive you migth be able to use the firewall as a service api to implemtn traffic filtering in the neutorn routers rather than at the port level. This was a good idea! I found that it actually worked to solve our use-case. I set up FWaaS and configured a firewall group with the rules I wanted. Then I added my subnets's router_interface port to the firewall. Thank you! 
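For anyone trying to reproduce this, the rough sequence of FWaaS v2 commands was something like the following (from memory, so double-check the option names against the neutron FWaaS docs; the names and IDs are placeholders):

    # a rule and a policy describing the traffic to allow
    openstack firewall group rule create --protocol tcp --destination-port 22 --action allow
    openstack firewall group policy create bm-policy --firewall-rule <rule id>
    # attach the policy to the subnet's router_interface port
    openstack firewall group create --name bm-fw --ingress-firewall-policy bm-policy --port <router_interface port id>
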
Re: the general issue of doing security groups in Ironic, I was wondering if this is something that others envision eventually being the job of networking-baremetal[1]. I looked and the storyboard[2] for the project doesn't show any planned work for this, but I saw it mentioned in this presentation[3] from 2017. Cheers, /Jason [1]: https://docs.openstack.org/networking-baremetal/latest/ [2]: https://storyboard.openstack.org/#!/project/955 [3]: https://www.slideshare.net/nyechiel/openstack-networking-the-road-ahead -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsneddon at redhat.com Wed Jun 12 22:03:16 2019 From: dsneddon at redhat.com (Dan Sneddon) Date: Wed, 12 Jun 2019 15:03:16 -0700 Subject: [ironic][neutron] Security groups on bare metal instances In-Reply-To: References: Message-ID: I helped to design the python-networking-ansible driver for ML2 + bare metal networking automation [1]. The idea behind it is a more production-grade alternative to networking-generic-switch that works with multiple makes/models of switches in the same environment. Behind the scenes, Ansible Networking is used to provide a vendor-neutral interface. I have tried to architect security groups for bare metal, but it’s a difficult challenge. I’d appreciate if anyone has suggestions. The main question is where to apply the security groups? Ideally, security groups would be applied at the port-level where the baremetal node is attached (we already configure VLAN assignment at the port level). Unfortunately, port security implementations vary wildly between vendors, and implementations may support only L2 filters, or very basic L3 filters only. The next logical place to apply the security group is at the VLAN router interface. That wouldn’t prevent hosts on the same network from talking to one another (access would be wide open between hosts on the same VLAN), but it would allow firewalling of hosts between networks. The challenge with this is that the plugin would have to know not only the switch and port where the baremetal node is attached, but also the switch/router where the VLAN router interface is located (or switches/routers in an HA environment). The baremetal port info is collected via Ironic Inspector, or it may be specified by the operator. How would we obtain the switch info and interface name for the VLAN L3 interface? What if there are multiple switch routers running with HA? Would the switch/interface have to be passed to Neutron when the network is created? I would love to discuss some ideas about how this could be implemented. [1] - https://pypi.org/project/networking-ansible/ On Wed, Jun 12, 2019 at 2:21 PM Jason Anderson wrote: > Hi Sean, thanks for the reply. > > On 6/11/19 11:00 AM, Sean Mooney wrote: > > as an alternitive you migth be able to use the firewall as a service api to implemtn traffic filtering in the neutorn > routers rather than at the port level. > > This was a good idea! I found that it actually worked to solve our > use-case. I set up FWaaS and configured a firewall group with the rules I > wanted. Then I added my subnets's router_interface port to the firewall. > Thank you! > > Re: the general issue of doing security groups in Ironic, I was wondering > if this is something that others envision eventually being the job of > networking-baremetal[1]. I looked and the storyboard[2] for the project > doesn't show any planned work for this, but I saw it mentioned in this > presentation[3] from 2017. 
> > Cheers, > /Jason > > [1]: https://docs.openstack.org/networking-baremetal/latest/ > [2]: https://storyboard.openstack.org/#!/project/955 > [3]: > https://www.slideshare.net/nyechiel/openstack-networking-the-road-ahead > -- Dan Sneddon | Senior Principal Software Engineer dsneddon at redhat.com | redhat.com/cloud dsneddon:irc | @dxs:twitter -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Wed Jun 12 22:16:13 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Wed, 12 Jun 2019 18:16:13 -0400 Subject: [doc][release][requirements] new warning and impending incompatible change in sphinxcontrib-datatemplates Message-ID: The new 0.4.0 release of sphinxcontrib-datatemplates will emit a deprecation warning message when the "datatemplate" directive is used. This may break jobs that run sphinx-build with the -W option enabled. That package includes support for the new form of the directive, which includes a different variation depending on the type of the data source, allowing different options to be used for each directive. See https://doughellmann.com/blog/2019/06/12/sphinxcontrib-datatemplates-0-4-0/ for details about the release and https://sphinxcontribdatatemplates.readthedocs.io/en/latest/index.html for details about the new syntax. The 1.0.0 release (not yet scheduled) will drop the legacy form of the directive entirely. -- Doug From openstack at fried.cc Wed Jun 12 22:36:20 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 12 Jun 2019 17:36:20 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> > 2. Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for > not-yet-complete migrations and evacuated instances. For the evacuated > instance allocations this is likely OK since restarting the source > compute service is going to do that cleanup anyway. Also, if you delete > the source compute service during a migration, confirming or reverting > the resize later will likely fail since we'd be casting to something > that is gone (and we'd orphan those allocations). Maybe we need a > functional recreate test for the unconfirmed migration scenario before > deciding on this? This seems like a win to me. If we can distinguish between the migratey ones and the evacuatey ones, maybe we fail on the former (forcing them to wait for completion) and automatically delete the latter (which is almost always okay for the reasons you state; and recoverable via heal if it's not okay for some reason). efried . From zigo at debian.org Wed Jun 12 22:50:16 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 13 Jun 2019 00:50:16 +0200 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> Hi Matt, Hoping I can bring an operator's perspective. On 6/12/19 10:38 PM, Matt Riedemann wrote: > 1. Don't delete the compute service if we can't cleanup all resource > providers - make sure to not orphan any providers. Manual cleanup may be > necessary by the operator. 
I'd say that this option is ok-ish *IF* the operators are given good
enough directives saying what to do. It would really suck if we just get
an error, and don't know what resource cleanup is needed. But if the
error is:

  Cannot delete nova-compute on host mycloud-compute-5.
  Instances still running:
    623051e7-4e0d-4b06-b977-1d9a73e6e6e1
    f8483448-39b5-4981-a731-5f4eeb28592c
  Currently live-migrating:
    49a12659-9dc6-4b07-b38b-e0bf2a69820a
  Not confirmed migration/resize:
    cc3d4311-e252-4922-bf04-dedc31b3a425

then that's fine, we know what to do. And better: the operator will know
better than nova what to do. Maybe live-migrate the instances? Or maybe
just destroy them? Nova shouldn't attempt to second-guess what the
operator has in mind.

> 2. Change delete_resource_provider cascade=True logic to remove all
> allocations for the provider before deleting it, i.e. for
> not-yet-complete migrations and evacuated instances. For the evacuated
> instance allocations this is likely OK since restarting the source
> compute service is going to do that cleanup anyway. Also, if you delete
> the source compute service during a migration, confirming or reverting
> the resize later will likely fail since we'd be casting to something
> that is gone (and we'd orphan those allocations). Maybe we need a
> functional recreate test for the unconfirmed migration scenario before
> deciding on this?

I don't see how this is going to help more than an evacuate command. Or
is the intent to do the evacuate, then right after it, the deletion of
the resource provider?

> 3. Other things I'm not thinking of? Should we add a force parameter to
> the API to allow the operator to forcefully delete (#2 above) if #1
> fails? Force parameters are hacky and usually seem to cause more
> problems than they solve, but it does put the control in the operators
> hands.

Let's say the --force is just doing the resize --confirm for the
operator, or doing an evacuate, then that's fine (and in fact, a good
idea, automations are great...). If it's going to create a mess in the
DB, then it's IMO a terrible idea.

However, I see a case that may happen: imagine a compute node is
completely broken (think: broken motherboard...), then probably we do
want to remove everything that's in there, and want to handle the case
where nova-compute doesn't even respond. This very much is a real life
scenario. If your --force is to address this case, then why not! Though
again and of course, we don't want a mess in the db... :P

I hope this helps,
Cheers,

Thomas Goirand (zigo)

From mnaser at vexxhost.com Wed Jun 12 23:26:06 2019
From: mnaser at vexxhost.com (Mohammed Naser)
Date: Wed, 12 Jun 2019 19:26:06 -0400
Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations?
In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com>
References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com>
Message-ID: 

On Wed, Jun 12, 2019 at 4:44 PM Matt Riedemann wrote:
>
> Before [1] when deleting a compute service in the API we did not check
> to see if the compute service was hosting any instances and just blindly
> deleted the service and related compute_node(s) records which orphaned
> the resource provider(s) for those nodes.
>
> With [2] we built on that and would cleanup the (first [3]) compute node
> resource provider by first deleting any allocations for instances still
> on that host - which because of the check in [1] should be none - and
> then deleted the resource provider itself.
> > [2] forgot about ironic where a single compute service can be managing > multiple (hundreds or even thousands) of baremetal compute nodes so I > wrote [3] to delete *all* resource providers for compute nodes tied to > the service - again barring there being any instances running on the > service because of the check added in [1]. > > What we've failed to realize until recently is that there are cases > where deleting the resource provider can still fail because there are > allocations we haven't cleaned up, namely: > > 1. Residual allocations for evacuated instances from a source host. > > 2. Allocations held by a migration record for an unconfirmed (or not yet > complete) migration. > > Because the delete_resource_provider method isn't checking for those, we > can get ResourceProviderInUse errors which are then ignored [4]. Since > that error is ignored, we continue on to delete the compute service > record [5], effectively orphaning the providers (which is what [2] was > meant to fix). I have recreated the evacuate scenario in a functional > test here [6]. > > The question is what should we do about the fix? I'm getting lost > thinking about this in a vacuum so trying to get some others to help > think about it. > > Clearly with [1] we said you shouldn't be able to delete a compute > service that has instances on it because that corrupts our resource > tracking system. If we extend that to any allocations held against > providers for that compute service, then the fix might be as simple as > not ignoring the ResourceProviderInUse error and fail if we can't delete > the provider(s). > > The question I'm struggling with is what does an operator do for the two > cases mentioned above, not-yet-complete migrations and evacuated > instances? For migrations, that seems pretty simple - wait for the > migration to complete and confirm it (reverting a cold migration or > resize would put the instance back on the compute service host you're > trying to delete). > > The nastier thing is the allocations tied to an evacuated instance since > those don't get cleaned up until the compute service is restarted [7]. > If the operator never intends on restarting that compute service and > just wants to clear the data, then they have to manually delete the > allocations for the resource providers associated with that host before > they can delete the compute service, which kind of sucks. > > What are our options? > > 1. Don't delete the compute service if we can't cleanup all resource > providers - make sure to not orphan any providers. Manual cleanup may be > necessary by the operator. I'm personally in favor of this. I think that currently a lot of operators don't really think of the placement service much (or perhaps don't really know what it's doing). There's a lack of transparency in the data that exists in that service, a lot of users will actually rely on the information fed by *nova* and not *placement*. Because of this, I've seen a lot of deployments with stale placement records or issues with clouds where the hypervisors are not efficiently used because of a bunch of stale resource allocations that haven't been cleaned up (and counting on deployers watching logs for warnings.. eh) I would be more in favor of failing a delete if it will cause the cloud to reach an inconsistent state than brute-force a delete leaving you in a messy state where you need to login to the database to unkludge things. > 2. 
Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for > not-yet-complete migrations and evacuated instances. For the evacuated > instance allocations this is likely OK since restarting the source > compute service is going to do that cleanup anyway. Also, if you delete > the source compute service during a migration, confirming or reverting > the resize later will likely fail since we'd be casting to something > that is gone (and we'd orphan those allocations). Maybe we need a > functional recreate test for the unconfirmed migration scenario before > deciding on this? > > 3. Other things I'm not thinking of? Should we add a force parameter to > the API to allow the operator to forcefully delete (#2 above) if #1 > fails? Force parameters are hacky and usually seem to cause more > problems than they solve, but it does put the control in the operators > hands. > > If we did remove allocations for an instance when deleting it's compute > service host, the operator should be able to get them back by running > the "nova-manage placement heal_allocations" CLI - assuming they restart > the compute service on that host. This would have to be tested of course. > > Help me Obi-Wan Kenobi. You're my only hope. > > [1] https://review.opendev.org/#/q/I0bd63b655ad3d3d39af8d15c781ce0a45efc8e3a > [2] https://review.opendev.org/#/q/I7b8622b178d5043ed1556d7bdceaf60f47e5ac80 > [3] https://review.opendev.org/#/c/657016/ > [4] > https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/scheduler/client/report.py#L2180 > [5] > https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/api/openstack/compute/services.py#L279 > [6] https://review.opendev.org/#/c/663737/ > [7] > https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/compute/manager.py#L706 > > -- > > Thanks, > > Matt > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From smooney at redhat.com Thu Jun 13 00:05:31 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Jun 2019 01:05:31 +0100 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> Message-ID: On Wed, 2019-06-12 at 17:36 -0500, Eric Fried wrote: > > 2. Change delete_resource_provider cascade=True logic to remove all > > allocations for the provider before deleting it, i.e. for > > not-yet-complete migrations and evacuated instances. For the evacuated > > instance allocations this is likely OK since restarting the source > > compute service is going to do that cleanup anyway. Also, if you delete > > the source compute service during a migration, confirming or reverting > > the resize later will likely fail since we'd be casting to something > > that is gone (and we'd orphan those allocations). Maybe we need a > > functional recreate test for the unconfirmed migration scenario before > > deciding on this? > > This seems like a win to me. 
> > If we can distinguish between the migratey ones and the evacuatey ones, > maybe we fail on the former (forcing them to wait for completion) and > automatically delete the latter (which is almost always okay for the > reasons you state; and recoverable via heal if it's not okay for some > reason). for a cold migration the allcoation will be associated with a migration object for evacuate which is basically a rebuild to a different host we do not have a migration object so the consumer uuid for the allcotion are still associated with the instace uuid not a migration uuid. so technically we can tell yes but only if we pull back the allcoation form placmenet and then iterate over them and check if we have a migration object or an instance that has the same uuid. in the evac case we shoudl also be able to tell that its an evac as the uuid will match an instance but the instnace host will not match the RP name the allcoation is associated with. so we can figure this out on the nova side by looking at either the instances table or migrations table or in the futrue when we have consumer types in placement that will also make this simplete to do as the info will be in the allocation itself. personally i like option 2 but yes we could selectivly force for evac only if we wanted. > > efried > . > > From corey.bryant at canonical.com Thu Jun 13 03:04:37 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Wed, 12 Jun 2019 23:04:37 -0400 Subject: [goal][python3] Train unit tests weekly update (goal-13) Message-ID: This is the goal-13 weekly update for the "Update Python 3 test runtimes for Train" goal [1]. There are 13 weeks remaining for completion of Train community goals [2]. == What's the Goal? == To ensure (in the Train cycle) that all official OpenStack repositories with Python 3 unit tests are exclusively using the 'openstack-python3-train-jobs' Zuul template or one of its variants (e.g. 'openstack-python3-train-jobs-neutron') to run unit tests, and that tests are passing. This will ensure that all official projects are running py36 and py37 unit tests in Train. For complete details please see [1]. == Ongoing Work == I have initial scripts working to automate patch generation for all supported projects. I plan to get them cleaned up and submitted for review next week, and I plan to start submitting patches next week. For reference my goal-tools scripts are located at: https://github.com/coreycb/goal-tools/commit/6eaf2535af02d5c48ebd9762e280c73859427268. I'll be off Thurs/Fri this week. Open patches needing reviews: https://review.openstack.org/#/q/topic:python3-train+is:open Failing patches: https://review.openstack.org/#/q/topic:python3-train+status:open+(+label:Verified-1+OR+label:Verified-2+) == Completed Work == Merged patches: https://review.openstack.org/#/q/topic:python3-train+is:merged == How can you help? == Please take a look at the failing patches and help fix any failing unit tests for your project(s). Python 3.7 unit tests will be self-testing in zuul. If you're interested in helping submit patches, please let me know. 
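For anyone curious what these patches look like: switching a repository
over is normally a small change to its .zuul.yaml (or zuul.d/ layout),
replacing the per-version unit test templates with the new Train
template. A rough, illustrative example - the other template shown is
just a placeholder for whatever the project already uses:

  - project:
      templates:
        - check-requirements
        - openstack-python3-train-jobs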
== Reference Material == [1] Goal description: https://governance.openstack.org/tc/goals/train/python3-updates.html [2] Train release schedule: https://releases.openstack.org/train/schedule.html (see R-5 for "Train Community Goals Completed") Storyboard: https://storyboard.openstack.org/#!/board/ Porting to Python 3.7: https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7 Python Update Process: https://opendev.org/openstack/governance/src/branch/master/resolutions/20181024-python-update-process.rst Train runtimes: https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/train.rst Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Thu Jun 13 03:32:21 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Thu, 13 Jun 2019 15:32:21 +1200 Subject: [nova] Admin user cannot create vm with user's port? Message-ID: Hi Nova team, In Nova, even the admin user cannot specify user's port to create a vm, is that designed intentionally or sounds like a bug? Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Thu Jun 13 04:42:28 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Thu, 13 Jun 2019 04:42:28 +0000 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: Message-ID: Hi All, After revisiting the spec [1] again and again, I got to know few points please check and let me know about my understanding: Understanding: If the ``vcpu_pin_set`` is set on compute node A in the Stein release then we can say that this node is used to host the dedicated instance on it and if user upgrades from Stein to Train and if operator doesn’t define ``[compute] cpu_dedicated_set`` set then simply fallback to ``vcpu_pin_set`` and report it as PCPU inventory. Considering multiple combinations of various configuration options, I think we will need to implement below business rules so that the issue highlighted in the previous email about the scheduler pre-filter can be solved. Rule 1: If operator sets ``[compute] cpu_shared_set`` in Train. 1.If pinned instances are found then we can simply say that this compute node is used as dedicated in the previous release so raise an error that says to set ``[compute] cpu_dedicated_set`` config option otherwise report it as VCPU inventory. Rule 2: If operator sets ``[compute] cpu_dedicated_set`` in Train. 1. Report inventory as PCPU 2. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, that means this compute node is used as dedicated in the previous release and if empty, then raise an error that this compute node is used as shared compute node in previous release. Rule 3: If operator sets None of the options (``[compute] cpu_dedicated_set``, ``[compute] cpu_shared_set``, ``vcpu_pin_set``) in Train. 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, then raise an error that this compute node is used as dedicated compute node in previous release so set ``[compute] cpu_dedicated_set``, otherwise report inventory as VCPU. 2. If no instances, report inventory as VCPU. Rule 4: If operator sets ``vcpu_pin_set`` config option in Train. 1. 
If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is empty, that means this compute node is used for non-pinned instances in the previous release, so raise an error otherwise report it as PCPU inventory. 2. If no instances, report inventory as PCPU. Rule 5: If operator sets ``vcpu_pin_set`` and ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set`` config options in Train 1. Simply raise an error Above business rules 3 and 4 are very important in order to solve the scheduler pre-filter issue highlighted in my previous email. As of today, in either case, `vcpu_pin_set`` is set or not set on the compute node, it can used for both pinned or non-pinned instances depending on whether this host belongs to an aggregate with “pinned” metadata. But as per business rule #3 , if ``vcpu_pin_set`` is not set, we are considering it to be used for non-pinned instances only. Do you think this could cause an issue in providing backward compatibility? Please provide your suggestions on the above business rules. [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 409 Thanks and Regards, -Bhagyashri Shewale- ________________________________ From: Shewale, Bhagyashri Sent: Wednesday, June 12, 2019 6:10:04 PM To: openstack-discuss at lists.openstack.org; openstack at fried.cc; smooney at redhat.com; sfinucan at redhat.com; jaypipes at gmail.com Subject: [nova] Spec: Standardize CPU resource tracking Hi All, Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- For example: I have two compute nodes say A and B: On Stein: Compute node A configurations: vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) Compute node B Configuration: vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) On Train, two possible scenarios: Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) vcpu_pin_set=0-3 (Keep same settings as in Stein) Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) cpu_dedicated_set=0-3 (change to the new config option) 1. Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above configuration. 2. Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` which ultimately will return only compute node B from placement service. Here, we expect it should have retuned both Compute A and Compute B. 3. 
If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return only compute node B from placement service where as it should have returned both compute Node A and B. Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. Propose changes: Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can change instance and request_spec flavor extra specs. 1. Remove cpu_policy from extra specs 2. Add “resources:PCPU=” in extra specs We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the reshape functionality. Please give us your feedback on the proposed solution so that we can update specs accordingly. [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst Thanks and Regards, -Bhagyashri Shewale- Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jun 13 04:55:42 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 13 Jun 2019 13:55:42 +0900 Subject: [nova] Admin user cannot create vm with user's port? In-Reply-To: References: Message-ID: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com> ---- On Thu, 13 Jun 2019 12:32:21 +0900 Lingxian Kong wrote ---- > Hi Nova team, > In Nova, even the admin user cannot specify user's port to create a vm, is that designed intentionally or sounds like a bug? You can specify that in networks object( networks.port field) [1]. This takes port_id of the existing port. 
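For example (a sketch only - the flavor, image and port UUID below are
placeholders):

  # pass a pre-existing neutron port instead of a network
  openstack server create --flavor m1.small --image cirros \
      --nic port-id=<port_uuid> my-server

  # equivalent fragment of the POST /servers request body:
  # {"server": {..., "networks": [{"port": "<port_uuid>"}]}}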
[1] https://developer.openstack.org/api-ref/compute/?expanded=create-server-detail - https://opendev.org/openstack/nova/src/commit/52d8d3d7f65bed99c25f39e7e38f566346586009/nova/api/openstack/compute/schemas/servers.py -gmann > > Best regards, > Lingxian KongCatalyst Cloud From soulxu at gmail.com Thu Jun 13 05:54:52 2019 From: soulxu at gmail.com (Alex Xu) Date: Thu, 13 Jun 2019 13:54:52 +0800 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: Mark Goddard 于2019年6月12日周三 下午2:45写道: > > > On Wed, 12 Jun 2019, 06:23 Alex Xu, wrote: > >> >> >> Mark Goddard 于2019年6月12日周三 上午1:39写道: >> >>> On Mon, 10 Jun 2019 at 06:18, Alex Xu wrote: >>> > >>> > >>> > >>> > Eric Fried 于2019年6月7日周五 上午1:59写道: >>> >> >>> >> > Looking at the specs, it seems it's mostly talking about changing >>> VMs resources without rebooting. However that's not the actual intent of >>> the Ironic use case I explained in the email. >>> >> > Yes, it requires a reboot to reflect the BIOS changes. This reboot >>> can be either be done by Nova IronicDriver or Ironic deploy step can also >>> do it. >>> >> > So I am not sure if the spec actually satisfies the use case. >>> >> > I hope to get more response from the team to get more clarity. >>> >> >>> >> Waitwait. The VM needs to be rebooted for the BIOS change to take >>> >> effect? So (non-live) resize would actually satisfy your use case just >>> >> fine. But the problem is that the ironic driver doesn't support resize >>> >> at all? >>> >> >>> >> Without digging too hard, that seems like it would be a fairly >>> >> straightforward thing to add. It would be limited to only "same host" >>> >> and initially you could only change this one attribute (anything else >>> >> would have to fail). >>> >> >>> >> Nova people, thoughts? >>> >> >>> > >>> > Contribute another idea. >>> > >>> > So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and >>> CUSTOM_HYPERTHREADING_OFF are configuration. Those >>> > configuration isn't used for scheduling. Actually, Traits is designed >>> for scheduling. >>> > >>> > So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this >>> trait is used for indicating the host support HT. About whether enable it >>> in the instance is configuration info. >>> > >>> > That is also pain for change the configuration in the flavor. The >>> flavor is the spec of instance's virtual resource, not the configuration. >>> > >>> > So another way is we should store the configuration into another >>> place. Like the server's metadata. >>> > >>> > So for the HT case. We only fill the CUSTOM_HYPERTHREADING trait in >>> the flavor, and fill a server metadata 'hyperthreading_config=on' in server >>> metadata. The nova will find out a BM node support HT. And ironic based on >>> the server metadata 'hyperthreading_config=on' to enable the HT. >>> > >>> > When change the configuration of HT to off, the user can update the >>> server's metadata. Currently, the nova will send a rpc call to the compute >>> node and calling a virt driver interface when the server metadata is >>> updated. In the ironic virt driver, it can trigger a hyper-threading >>> configuration deploy step to turn the HT off, and do a reboot of the >>> instance. 
(The reboot is a step inside deploy-step, not part of ironic virt >>> driver flow) >>> > >>> > But yes, this changes some design to the original deploy-steps and >>> deploy-templates. And we fill something into the server's metadata which >>> I'm not sure nova people like it. >>> > >>> > Anyway, just put my idea at here. >>> >>> We did consider using metadata. The problem is that it is >>> user-defined, so there is no way for an operator to restrict what can >>> be done by a user. Flavors are operator-defined and so allow for >>> selection from a 'menu' of types and configurations. >>> >> >> The end user can change the BIOS config by the ipmi inside the guest OS, >> and do a reboot. It is already out of control for the operator. >> (Correct me if ironic doesn't allow the end user change the config inside >> the guest OS) >> > > It depends. Normally you can't configure BIOS via IPMI, but need to use a > vendor interface such as racadm or on hardware that supports it, Redfish. > Access to the management controller can and should be locked down though. > It's also usually possible to reconfigure via serial console, if this is > exposed to users. > It sounds that breaking the operator control partially. (Sorry for drop the mallist thread again...I will paste a note to the wall "click the "Reply All"...") > >> So Flavor should be thing to strict the resource( or resource's capable) >> which can be requested by the end user. For example, flavor will say I need >> a BM node has hyper-thread capable. But enable or disable can be controlled >> by the end user. >> >> >>> >>> What might be nice is if we could use a flavor extra spec like this: >>> >>> deploy-config:hyperthreading=enabled >>> >>> The nova ironic virt driver could pass this to ironic, like it does with >>> traits. >>> >>> Then in the ironic deploy template, have fields like this: >>> >>> name: Hyperthreading enabled >>> config-type: hyperthreading >>> config-value: enabled >>> steps: >>> >>> Ironic would then match on the config-type and config-value to find a >>> suitable deploy template. >>> >>> As an extension, the deploy template could define a trait (or list of >>> traits) that must be supported by a node in order for the template to >>> be applied. Perhaps this would even be a standard relationship between >>> config-type and traits? >>> >>> Haven't thought this through completely, I'm sure it has holes. >>> >>> > >>> >> efried >>> >> . >>> >> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.rydberg at citynetwork.eu Thu Jun 13 06:51:25 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 13 Jun 2019 08:51:25 +0200 Subject: [sigs][publiccloud][publiccloud-wg][publiccloud-sig][billing] Meeting today at 1400 UTC regarding billing initiative Message-ID: <506b17fb-12c4-8f90-1ac5-a2a332d0b0c3@citynetwork.eu> Hi all, This is a reminder for todays meeting for the Public Cloud SIG - 1400 UTC in #openstack-publiccloud. The topic of the day will be continued discussions regarding the billing initiative. More information about that at https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal See you all later today! 
Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From madhuri.kumari at intel.com Thu Jun 13 07:13:24 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Thu, 13 Jun 2019 07:13:24 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> Hi All, Thank you everyone for your responses. We have created an etherpad[1] with suggested solution and concerns. I request Nova and Ironic developers to provide their input on the etherpad. [1] https://etherpad.openstack.org/p/ironic-nova-reset-configuration Regards, Madhuri From: Alex Xu [mailto:soulxu at gmail.com] Sent: Thursday, June 13, 2019 11:25 AM To: Mark Goddard ; openstack-discuss Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning Mark Goddard > 于2019年6月12日周三 下午2:45写道: On Wed, 12 Jun 2019, 06:23 Alex Xu, > wrote: Mark Goddard > 于2019年6月12日周三 上午1:39写道: On Mon, 10 Jun 2019 at 06:18, Alex Xu > wrote: > > > > Eric Fried > 于2019年6月7日周五 上午1:59写道: >> >> > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. However that's not the actual intent of the Ironic use case I explained in the email. >> > Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. >> > So I am not sure if the spec actually satisfies the use case. >> > I hope to get more response from the team to get more clarity. >> >> Waitwait. The VM needs to be rebooted for the BIOS change to take >> effect? So (non-live) resize would actually satisfy your use case just >> fine. But the problem is that the ironic driver doesn't support resize >> at all? >> >> Without digging too hard, that seems like it would be a fairly >> straightforward thing to add. It would be limited to only "same host" >> and initially you could only change this one attribute (anything else >> would have to fail). >> >> Nova people, thoughts? >> > > Contribute another idea. > > So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and CUSTOM_HYPERTHREADING_OFF are configuration. Those > configuration isn't used for scheduling. Actually, Traits is designed for scheduling. > > So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this trait is used for indicating the host support HT. About whether enable it in the instance is configuration info. > > That is also pain for change the configuration in the flavor. The flavor is the spec of instance's virtual resource, not the configuration. > > So another way is we should store the configuration into another place. Like the server's metadata. > > So for the HT case. We only fill the CUSTOM_HYPERTHREADING trait in the flavor, and fill a server metadata 'hyperthreading_config=on' in server metadata. The nova will find out a BM node support HT. And ironic based on the server metadata 'hyperthreading_config=on' to enable the HT. > > When change the configuration of HT to off, the user can update the server's metadata. 
Currently, the nova will send a rpc call to the compute node and calling a virt driver interface when the server metadata is updated. In the ironic virt driver, it can trigger a hyper-threading configuration deploy step to turn the HT off, and do a reboot of the instance. (The reboot is a step inside deploy-step, not part of ironic virt driver flow) > > But yes, this changes some design to the original deploy-steps and deploy-templates. And we fill something into the server's metadata which I'm not sure nova people like it. > > Anyway, just put my idea at here. We did consider using metadata. The problem is that it is user-defined, so there is no way for an operator to restrict what can be done by a user. Flavors are operator-defined and so allow for selection from a 'menu' of types and configurations. The end user can change the BIOS config by the ipmi inside the guest OS, and do a reboot. It is already out of control for the operator. (Correct me if ironic doesn't allow the end user change the config inside the guest OS) It depends. Normally you can't configure BIOS via IPMI, but need to use a vendor interface such as racadm or on hardware that supports it, Redfish. Access to the management controller can and should be locked down though. It's also usually possible to reconfigure via serial console, if this is exposed to users. It sounds that breaking the operator control partially. (Sorry for drop the mallist thread again...I will paste a note to the wall "click the "Reply All"...") So Flavor should be thing to strict the resource( or resource's capable) which can be requested by the end user. For example, flavor will say I need a BM node has hyper-thread capable. But enable or disable can be controlled by the end user. What might be nice is if we could use a flavor extra spec like this: deploy-config:hyperthreading=enabled The nova ironic virt driver could pass this to ironic, like it does with traits. Then in the ironic deploy template, have fields like this: name: Hyperthreading enabled config-type: hyperthreading config-value: enabled steps: Ironic would then match on the config-type and config-value to find a suitable deploy template. As an extension, the deploy template could define a trait (or list of traits) that must be supported by a node in order for the template to be applied. Perhaps this would even be a standard relationship between config-type and traits? Haven't thought this through completely, I'm sure it has holes. > >> efried >> . >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Thu Jun 13 09:04:22 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 13 Jun 2019 10:04:22 +0100 (BST) Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On Wed, 12 Jun 2019, Matt Riedemann wrote: > 2. Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for not-yet-complete > migrations and evacuated instances. For the evacuated instance allocations > this is likely OK since restarting the source compute service is going to do > that cleanup anyway. 
Also, if you delete the source compute service during a > migration, confirming or reverting the resize later will likely fail since > we'd be casting to something that is gone (and we'd orphan those > allocations). Maybe we need a functional recreate test for the unconfirmed > migration scenario before deciding on this? I think this is likely the right choice. If the service is being deleted (not disabled) it shouldn't have a resource provider and to not have a resource provider it needs to not have allocations, and of those left over allocations that it does have are either bogus now, or will be soon enough, may as well get them gone in a consistent and predictable way. That said, we shouldn't make a habit of a removing allocations just so we can remove a resource provider whenever we want, only in special cases like this. If/when we're modelling shared disk as a shared resource provider does this get any more complicated? Does the part of an allocation that is DISK_GB need special handling. > 3. Other things I'm not thinking of? Should we add a force parameter to the > API to allow the operator to forcefully delete (#2 above) if #1 fails? Force > parameters are hacky and usually seem to cause more problems than they solve, > but it does put the control in the operators hands. I'm sort of maybe on this. A #1, with an option to inspect and then #2 seems friendly and potentially useful but how often is someone going to want to inspect versus just "whatevs, #2"? I don't know. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From cdent+os at anticdent.org Thu Jun 13 09:12:32 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 13 Jun 2019 10:12:32 +0100 (BST) Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On Wed, 12 Jun 2019, Mohammed Naser wrote: > On Wed, Jun 12, 2019 at 4:44 PM Matt Riedemann wrote: >> 1. Don't delete the compute service if we can't cleanup all resource >> providers - make sure to not orphan any providers. Manual cleanup may be >> necessary by the operator. > > I'm personally in favor of this. I think that currently a lot of > operators don't > really think of the placement service much (or perhaps don't really know what > it's doing). > > There's a lack of transparency in the data that exists in that service, a lot of > users will actually rely on the information fed by *nova* and not *placement*. I agree, and this is part of why I prefer #2 over #1. For someone dealing with a deleted nova compute service, placement shouldn't be something they need to be all that concerned with. Nova should be mediating the interactions with placement to correct the model of reality that it is storing there. That's what option 2 is doing: fixing the model, from nova. (Obviously this is an idealisation that we've not achieved, which is I why I used that horrible word "should", but I do think it is something we should be striving towards.) Please: https://en.wikipedia.org/wiki/Posting_style#Trimming_and_reformatting /me scurries back to Usenet -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From anlin.kong at gmail.com Thu Jun 13 09:22:16 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Thu, 13 Jun 2019 21:22:16 +1200 Subject: [nova] Admin user cannot create vm with user's port? 
In-Reply-To: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com>
References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com>
Message-ID: 

Yeah, the API allows specifying a port. What I mean is, the VM creation
will fail for an admin user if the port belongs to a non-admin user. An
exception is raised from nova-compute.

On Thursday, 13 June 2019, Ghanshyam Mann wrote:

> ---- On Thu, 13 Jun 2019 12:32:21 +0900 Lingxian Kong <
> anlin.kong at gmail.com> wrote ----
> > Hi Nova team,
> > In Nova, even the admin user cannot specify user's port to create a vm,
> is that designed intentionally or sounds like a bug?
>
> You can specify that in networks object( networks.port field) [1]. This
> takes port_id of the existing port.
>
> [1] https://developer.openstack.org/api-ref/compute/?expanded=
> create-server-detail
> - https://opendev.org/openstack/nova/src/commit/
> 52d8d3d7f65bed99c25f39e7e38f566346586009/nova/api/openstack/
> compute/schemas/servers.py
>
> -gmann
>
> >
> > Best regards,
> > Lingxian KongCatalyst Cloud
>
>

--
Best regards,
Lingxian Kong
Catalyst Cloud
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From smooney at redhat.com Thu Jun 13 10:48:32 2019
From: smooney at redhat.com (Sean Mooney)
Date: Thu, 13 Jun 2019 11:48:32 +0100
Subject: [nova] Admin user cannot create vm with user's port?
In-Reply-To: 
References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com>
Message-ID: <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com>

On Thu, 2019-06-13 at 21:22 +1200, Lingxian Kong wrote:
> Yeah, the API allows specifying a port. What I mean is, the VM creation will
> fail for an admin user if the port belongs to a non-admin user. An
> exception is raised from nova-compute.

I believe this is intentional. We do not currently allow you to transfer
ownership of a VM from one user or project to another, and I also believe
we currently do not allow a VM to be created from resources with
different owners; it would cause issues with quota if we did.

In this case the port would belong to the non-admin and is currently
being consumed from their quota. It would then be used by a VM created by
the admin user, which could result in the admin user being over their
quota without us knowing - e.g. it would allow them to "steal" quota from
the other project/user by using their resources.

Where it gets tricky is if that first user hits their quota for ports and
wants to delete it. Should we allow them to? They own the port after all,
but if they delete the port it would break the admin's VM. Mixing
ownership in a single VM is pretty messy, so we don't allow that.

It's possible it is a bug, but I would be highly surprised if we ever
intentionally supported this. The only multi-tenant shared resources I'm
aware of are neutron shared networks, which have ports owned by the
individual users rather than the owner of the shared network, and Manila
shares, which can be shared between multiple projects. In both cases we
are not adding the shared resource directly to the VM, and I don't know
of a case that works today that would suggest a port should work.

>
> On Thursday, 13 June 2019, Ghanshyam Mann wrote:
>
> > ---- On Thu, 13 Jun 2019 12:32:21 +0900 Lingxian Kong <
> > anlin.kong at gmail.com> wrote ----
> > > Hi Nova team,
> > > In Nova, even the admin user cannot specify user's port to create a vm,
> > is that designed intentionally or sounds like a bug?
> >
> > You can specify that in networks object( networks.port field) [1]. This
> > takes port_id of the existing port.
> > > > [1] https://developer.openstack.org/api-ref/compute/?expanded= > > create-server-detail > > - https://opendev.org/openstack/nova/src/commit/ > > 52d8d3d7f65bed99c25f39e7e38f566346586009/nova/api/openstack/ > > compute/schemas/servers.py > > > > -gmann > > > > > > > > Best regards, > > > Lingxian KongCatalyst Cloud > > > > > > From smooney at redhat.com Thu Jun 13 11:21:09 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Jun 2019 12:21:09 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: Message-ID: <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> On Thu, 2019-06-13 at 04:42 +0000, Shewale, Bhagyashri wrote: > Hi All, > > > After revisiting the spec [1] again and again, I got to know few points please check and let me know about my > understanding: > > > Understanding: If the ``vcpu_pin_set`` is set on compute node A in the Stein release then we can say that this node > is used to host the dedicated instance on it and if user upgrades from Stein to Train and if operator doesn’t define > ``[compute] cpu_dedicated_set`` set then simply fallback to ``vcpu_pin_set`` and report it as PCPU inventory. that is incorrect if the vcpu_pin_set is defiend it may be used for instance with hw:cpu_policy=dedicated or not. in train if vcpu_pin_set is defiend and cpu_dedicated_set is not defiend then we use vcpu_pin_set to define the inventory of both PCPUs and VCPUs > > > Considering multiple combinations of various configuration options, I think we will need to implement below business > rules so that the issue highlighted in the previous email about the scheduler pre-filter can be solved. > > > Rule 1: > > If operator sets ``[compute] cpu_shared_set`` in Train. > > 1.If pinned instances are found then we can simply say that this compute node is used as dedicated in the previous > release so raise an error that says to set ``[compute] cpu_dedicated_set`` config option otherwise report it as VCPU > inventory. cpu_share_set in stien was used for vm emulator thread and required the instnace to be pinned for it to take effect. i.e. the hw:emulator_thread_policy extra spcec currently only works if you had hw_cpu_policy=dedicated. so we should not error if vcpu_pin_set and cpu_shared_set are defined, it was valid. what we can do is ignore teh cpu_shared_set for schduling and not report 0 VCPUs for this host and use vcpu_pinned_set as PCPUs > > > Rule 2: > > If operator sets ``[compute] cpu_dedicated_set`` in Train. > > 1. Report inventory as PCPU yes if cpu_dedicated_set is set we will report its value as PCPUs > > 2. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, that means this > compute node is used as dedicated in the previous release and if empty, then raise an error that this compute node is > used as shared compute node in previous release. this was not part of the spec. we could do this but i think its not needed and operators should check this themselves. if we decide to do this check on startup it should only happen if vcpu_pin_set is defined. addtionally we can log an error but we should not prevent the compute node form working and contuing to spawn vms. > > > Rule 3: > > If operator sets None of the options (``[compute] cpu_dedicated_set``, ``[compute] cpu_shared_set``, > ``vcpu_pin_set``) in Train. > > 1. 
If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, then raise an error > that this compute node is used as dedicated compute node in previous release so set ``[compute] cpu_dedicated_set``, > otherwise report inventory as VCPU. again this is not in the spec and i dont think we should do this. if none of the values are set we should report all cpus as both VCPUs and PCPUs the vcpu_pin_set option was never intended to signal a host was used for cpu pinning it was intoduced for cpu pinning and numa affinity but it was orignally ment to apply to floaing instance and currently contople the number of VCPU reported to the resouce tracker which is used to set the capastiy of the VCPU inventory. you should read https://that.guru/blog/cpu-resources/ for a walk through of this. > > 2. If no instances, report inventory as VCPU. we could do this but i think it will be confusing as to what will happen after we spawn an instnace on the host in train. i dont think this logic should be condtional on the presence of vms. > > > Rule 4: > > If operator sets ``vcpu_pin_set`` config option in Train. > > 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is empty, that means this compute > node is used for non-pinned instances in the previous release, so raise an error otherwise report it as PCPU > inventory. agin this is not in the spec. what the spec says for if vcpu_pin_set is defiend is we will report inventorys of both VCPU and PCPUs for all cpus in the vcpu_pin_set > > 2. If no instances, report inventory as PCPU. again this should not be condtional on the presence of vms. > > > Rule 5: > > If operator sets ``vcpu_pin_set`` and ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set`` config options > in Train > > 1. Simply raise an error this is the only case were we "rasise" and error and refuse to start the compute node. > > > Above business rules 3 and 4 are very important in order to solve the scheduler pre-filter issue highlighted in my > previous email. we explctly do not want to have the behavior in 3 and 4 specificly the logic of checking the instances. > > > As of today, in either case, `vcpu_pin_set`` is set or not set on the compute node, it can used for both pinned or > non-pinned instances depending on whether this host belongs to an aggregate with “pinned” metadata. But as per > business rule #3 , if ``vcpu_pin_set`` is not set, we are considering it to be used for non-pinned instances > only. Do you think this could cause an issue in providing backward compatibility? yes the rule you have listed above will cause issue for upgrades and we rejected similar rules in the spec. i have not read your previous email which ill look at next but we spent a long time debating how this should work in the spec design and i would prefer to stick to what the spec currently states. > > > Please provide your suggestions on the above business rules. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 409 > > > > Thanks and Regards, > > -Bhagyashri Shewale- > > ________________________________ > From: Shewale, Bhagyashri > Sent: Wednesday, June 12, 2019 6:10:04 PM > To: openstack-discuss at lists.openstack.org; openstack at fried.cc; smooney at redhat.com; sfinucan at redhat.com; > jaypipes at gmail.com > Subject: [nova] Spec: Standardize CPU resource tracking > > > Hi All, > > > Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. 
> > > While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: > > > Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and > request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and > migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in > request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- > > > For example: > > > I have two compute nodes say A and B: > > > On Stein: > > > Compute node A configurations: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > Compute node B Configuration: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > On Train, two possible scenarios: > > Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) > > cpu_dedicated_set=0-3 (change to the new config option) > > 1. Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above > configuration. > 2. Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the > request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` which ultimately will > return only compute node B from placement service. Here, we expect it should have retuned both Compute A and Compute > B. > 3. If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return > only compute node B from placement service where as it should have returned both compute Node A and B. > > Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as > Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. > > > Propose changes: > > > Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata > properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host > which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can > change instance and request_spec flavor extra specs. > > 1. Remove cpu_policy from extra specs > 2. Add “resources:PCPU=” in extra specs > > > We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the > reshape functionality. > > > Please give us your feedback on the proposed solution so that we can update specs accordingly. 
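(As a rough illustration of the flavor-side translation being proposed here -- the flavor name and vCPU count are made up:

    # what operators set today (Stein and earlier):
    openstack flavor set pinned.small --property hw:cpu_policy=dedicated \
        --property aggregate_instance_extra_specs:pinned=true

    # what the request effectively becomes after the translation, for a
    # 4-vCPU flavor:
    #   resources:PCPU=4 instead of resources:VCPU=4

Whether hw:cpu_policy is actually removed from the extra specs or just aliased for scheduling purposes is exactly the point under discussion.)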
> > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > Thanks and Regards, > > -Bhagyashri Shewale- > > Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may > contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise > the sender by replying promptly to this email and then delete and destroy this email and any attachments without any > further use, copying or forwarding. From smooney at redhat.com Thu Jun 13 11:32:02 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Jun 2019 12:32:02 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: Message-ID: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote: > Hi All, > > > Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. > > > While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: > > > Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and > request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and > migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in > request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- > > > For example: > > > I have two compute nodes say A and B: > > > On Stein: > > > Compute node A configurations: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) vcpu_pin_set does not mean that the host was used for pinned instances https://that.guru/blog/cpu-resources/ > > > Compute node B Configuration: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > On Train, two possible scenarios: > > Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) > > cpu_dedicated_set=0-3 (change to the new config option) > > 1. Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above > configuration. > 2. Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the > request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` it wont remove hw:cpu_policy it will just change the resouces=VCPU:$ -> resources=PCPU:$ > which ultimately will return only compute node B from placement service. that is incorrect both a and by will be returned. the spec states that for host A we report an inventory of 4 VCPUs and an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so both host will be returned assuming $ <=4 > Here, we expect it should have retuned both Compute A and Compute B. it will > 3. 
If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return > only compute node B from placement service where as it should have returned both compute Node A and B. that is what would have happend in the stien version of the spec and we changed the spec specifically to ensure that that wont happen. in the train version of the spec you will get both host as candates to prevent this upgrade impact. > > Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as > Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. > > > Propose changes: > > > Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata > properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host > which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can > change instance and request_spec flavor extra specs. > > 1. Remove cpu_policy from extra specs > 2. Add “resources:PCPU=” in extra specs > > > We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the > reshape functionality. > > > Please give us your feedback on the proposed solution so that we can update specs accordingly. i am fairly stongly opposed to useing an online data migration to modify the request spec to reflect the host they landed on. this speficic problem is why the spec was changed in the train cycle to report dual inventoryis of VCPU and PCPU if vcpu_pin_set is the only option set or of no options are set. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > Thanks and Regards, > > -Bhagyashri Shewale- > > Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may > contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise > the sender by replying promptly to this email and then delete and destroy this email and any attachments without any > further use, copying or forwarding. From ignaziocassano at gmail.com Thu Jun 13 12:22:53 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 13 Jun 2019 14:22:53 +0200 Subject: [cinder] nfs in kolla Message-ID: Hello, I' just deployed ocata with kolla and my cinder backend is nfs. Volumes are created successfully but live migration does not work. While cinder_volume container mounts the cinder nfs backend, the cinder api not and during live migration the cinder api logs reports errors accessing volumes : Stderr: u"qemu-img: Could not open '/var/lib/cinder/mnt/451bacc11bd88b51ce7bdf31aa97cf39/volume-4889a547-0a0d-440e-8b50-413285b5979c' Any help, please ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 13 13:45:11 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 08:45:11 -0500 Subject: Are most grenade plugin projects doing upgrades wrong? 
Message-ID: While fixing how the watcher grenade plugin is cloning the watcher repo for the base (old) side [1] to hard-code from stable/rocky to stable/stien, I noticed that several other projects with grenade plugins aren't actually setting a stable branch when cloning the plugin for the old side [2]. So I tried removing the stable/stein branch from the base plugin clone in watcher [3] and grenade cloned from master for the base (old) side [4]. Taking designate as an example, I see the same thing happening for it's grenade run [5]. Is there something I'm missing here or are most of the openstack projects running grenade via plugin actually just upgrading from master to master rather than n-1 to master? [1] https://review.opendev.org/#/c/664610/1/devstack/upgrade/settings [2] http://codesearch.openstack.org/?q=base%20enable_plugin&i=nope&files=&repos= [3] https://review.opendev.org/#/c/664610/2/devstack/upgrade/settings [4] http://logs.openstack.org/10/664610/2/check/watcher-grenade/ad2e068/logs/grenade.sh.txt.gz#_2019-06-12_19_19_36_874 [5] http://logs.openstack.org/47/662647/6/check/designate-grenade-pdns4/0b8968f/logs/grenade.sh.txt.gz#_2019-06-09_23_10_03_034 -- Thanks, Matt From ildiko.vancsa at gmail.com Thu Jun 13 14:13:49 2019 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Thu, 13 Jun 2019 16:13:49 +0200 Subject: [edge] China Mobile Edge platform evaluation presentation next Tuesday on the Edge WG call Message-ID: Hi, I attended a presentation today from Qihui Zhao about China Mobile’s experience on evaluation different edge deployment models with various software components. As many of the evaluated components are part of OpenStack and/or StarlingX I invited her for next week’s Edge Computing Group call (Tuesday, June 18) to share their findings with the working group and everyone who is interested. For agenda and call details please visit this wiki: https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings Please let me know if you have any questions. Thanks and Best Regards, Ildikó From robson.rbarreto at gmail.com Thu Jun 13 14:28:49 2019 From: robson.rbarreto at gmail.com (Robson Ramos Barreto) Date: Thu, 13 Jun 2019 11:28:49 -0300 Subject: [openstack-helm] custom container images for helm In-Reply-To: References: Message-ID: Hi Steve Ok Thank you. I'll have a look into the openstack-images repository. Yes sure. For now I'm evaluating if helm attend our needs so if it is I can contributing. Thank you Regards On Wed, Jun 12, 2019 at 6:14 PM Steve Wilkerson wrote: > Hey Robson, > > We’ve recently started building images out of the openstack-helm-images > repository. Currently, we use LOCI to build ubuntu based images for > releases Ocata through Rocky and leap15 images for the Rocky release. > > We’ve recently started work on the multi-distro support spec which also > added overrides and jobs required for the leap15 based images for Rocky. > We’d love to see support added for centos images added to both > openstack-helm-images and the openstack-helm charts themselves (and for > releases beyond Rocky), but we just haven’t gotten there yet. If you’re > interested in contributing and getting your hands dirty, we’d love to help > provide guidance and help here. > > In regards to the Kolla images, it’s been awhile since I’ve used them > myself so I can’t speak much there. 
> > Cheers, > Steve > > On Wed, Jun 12, 2019 at 3:45 PM Robson Ramos Barreto < > robson.rbarreto at gmail.com> wrote: > >> Hi all >> >> I saw in the docker hub that there is just until rocky ubuntu xenial >> version. >> >> I'd like to know how can I create my own images centos-based from new >> versions like Stein to be used with the helm charts, if is there any >> specific customization to works with helm or, for example, if can I use >> the kolla images. >> >> Thank you >> >> Regards >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilkers.steve at gmail.com Thu Jun 13 14:31:43 2019 From: wilkers.steve at gmail.com (Steve Wilkerson) Date: Thu, 13 Jun 2019 09:31:43 -0500 Subject: [openstack-helm] custom container images for helm In-Reply-To: References: Message-ID: Hey Robson, That sounds great. Please don’t hesitate to reach out to us on #openstack-helm if you’ve got any questions or concerns we can address as you look to see if openstack-helm can fit your use case(s) Steve On Thu, Jun 13, 2019 at 9:29 AM Robson Ramos Barreto < robson.rbarreto at gmail.com> wrote: > Hi Steve > > Ok Thank you. I'll have a look into the openstack-images repository. > > Yes sure. For now I'm evaluating if helm attend our needs so if it is I > can contributing. > > Thank you > > Regards > > > > On Wed, Jun 12, 2019 at 6:14 PM Steve Wilkerson > wrote: > >> Hey Robson, >> >> We’ve recently started building images out of the openstack-helm-images >> repository. Currently, we use LOCI to build ubuntu based images for >> releases Ocata through Rocky and leap15 images for the Rocky release. >> >> We’ve recently started work on the multi-distro support spec which also >> added overrides and jobs required for the leap15 based images for Rocky. >> We’d love to see support added for centos images added to both >> openstack-helm-images and the openstack-helm charts themselves (and for >> releases beyond Rocky), but we just haven’t gotten there yet. If you’re >> interested in contributing and getting your hands dirty, we’d love to help >> provide guidance and help here. >> >> In regards to the Kolla images, it’s been awhile since I’ve used them >> myself so I can’t speak much there. >> >> Cheers, >> Steve >> >> On Wed, Jun 12, 2019 at 3:45 PM Robson Ramos Barreto < >> robson.rbarreto at gmail.com> wrote: >> >>> Hi all >>> >>> I saw in the docker hub that there is just until rocky ubuntu xenial >>> version. >>> >>> I'd like to know how can I create my own images centos-based from new >>> versions like Stein to be used with the helm charts, if is there any >>> specific customization to works with helm or, for example, if can I use >>> the kolla images. >>> >>> Thank you >>> >>> Regards >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.settle at outlook.com Thu Jun 13 16:01:32 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Thu, 13 Jun 2019 16:01:32 +0000 Subject: [tc] Recap for Technical Committee Meeting 6 June 2019 @ 1400 UTC In-Reply-To: <02f401d52174$9c8b1060$d5a13120$@openstack.org> References: <02f401d52174$9c8b1060$d5a13120$@openstack.org> Message-ID: Hi Alan, Thanks for your response :) I hope you don't mind, I'm replying back to you and the openstack-discuss list so the other TC members also working on the Help Most Wanted text can review your thoughts. In the etherpad, you can just start writing and it will give you an individual colour assigned to your user. 
If you feel like we won't see/know who you are, just put your initials next to your comments :) Thanks so much for offering to help. I've transferred over your thoughts below into the etherpad for now. Thanks, Alex On 13/06/2019 00:14, Alan Clark wrote: > Hey Alexandra, > > As I mentioned during the TC meeting I would love to help with the "Help Most Wanted" text. > > I took a look at the etherpad for the documentation role [1] > > I would like to offer a couple suggested changes. I wasn't sure where to post them, so am pinging you directly. > > My first comment is around the audience. Who is most likely to fit and fill this role. I think the audience we are after for this posting are those that take the OpenStack documentation to develop for re-use and distribution. Those are the most likely persons to convince to take a higher contribution and leadership role. Which is what this posting targets. > > The opening description section conveys the documentation teams pain and struggles. I think a more effective opening would be to convey the business and personal benefits the audience gains from contributing. They have to sell this to their boss. Boss wants to solve their pain not the documentation teams. If we agree that the most likely audience to contribute in this posted role, then their benefits are more complete documentation with less effort. Being able to leverage and re-use the community contributed text means much less text that the audience person has to write. Helping the community effort helps steer the contributed text to fill the needs and gaps that you find of most need. The first paragraph starts in the right direction but I suggest replacing the second paragraph with these ideas. I’m sure you can elaborate these ideas better than me. > > My second thought is that this posting should convey that it’s easy to do and get started. In fact you could turn it into an FAQ style. The First timers page is full of great material and a good page to point to: https://docs.openstack.org/doc-contrib-guide/quickstart/first-timers.html > Address the question of how easy it is to get started, that they use the tools they commonly use and here’s where to get their questions and concerns answered. > > Thanks, > AlanClark > > [1] https://etherpad.openstack.org/p/2019-upstream-investment-opportunities-refactor > > > > > >> -----Original Message----- >> From: Alexandra Settle [mailto:a.settle at outlook.com] >> Sent: Thursday, June 06, 2019 9:51 AM >> To: openstack-discuss at lists.openstack.org >> Subject: [tc] Recap for Technical Committee Meeting 6 June 2019 @ 1400 UTC >> >> Hello all, >> >> Thanks to those who joined the TC meeting today and running through it with me >> at the speed of light. Gif game was impeccably strong and that's primarily what I >> like about this community. >> >> For a recap of the meeting, please see the eavesdrop [0] for full detailed logs and >> action items. All items in the agenda [1] were covered and no major concerns >> raised. >> >> Next meeting will be on the 8th of July 2019. >> >> Cheers, >> >> Alex >> >> [0] http://eavesdrop.openstack.org/meetings/tc/2019/tc.2019-06-06-14.00.txt >> >> [1] >> http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006877.html >> >> > From elod.illes at ericsson.com Thu Jun 13 16:13:33 2019 From: elod.illes at ericsson.com (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Thu, 13 Jun 2019 16:13:33 +0000 Subject: Are most grenade plugin projects doing upgrades wrong? 
In-Reply-To: References: Message-ID: Actually, if you check [4] and [5], they seemingly check out master, but just below the linked line like 20 lines, there's a 'git show --oneline' and 'head -1' which shows that the branch is clearly stable/stein. So it seems there's no need for explicit branch settings in [1]... but I haven't found the code lines which cause this behavior, yet... BR, Előd On 2019. 06. 13. 15:45, Matt Riedemann wrote: > While fixing how the watcher grenade plugin is cloning the watcher > repo for the base (old) side [1] to hard-code from stable/rocky to > stable/stien, I noticed that several other projects with grenade > plugins aren't actually setting a stable branch when cloning the > plugin for the old side [2]. So I tried removing the stable/stein > branch from the base plugin clone in watcher [3] and grenade cloned > from master for the base (old) side [4]. Taking designate as an > example, I see the same thing happening for it's grenade run [5]. > > Is there something I'm missing here or are most of the openstack > projects running grenade via plugin actually just upgrading from > master to master rather than n-1 to master? > > [1] https://review.opendev.org/#/c/664610/1/devstack/upgrade/settings > [2] > http://codesearch.openstack.org/?q=base%20enable_plugin&i=nope&files=&repos= > [3] https://review.opendev.org/#/c/664610/2/devstack/upgrade/settings > [4] > http://logs.openstack.org/10/664610/2/check/watcher-grenade/ad2e068/logs/grenade.sh.txt.gz#_2019-06-12_19_19_36_874 > [5] > http://logs.openstack.org/47/662647/6/check/designate-grenade-pdns4/0b8968f/logs/grenade.sh.txt.gz#_2019-06-09_23_10_03_034 > -- > > Thanks, > > Matt From mriedemos at gmail.com Thu Jun 13 17:37:40 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:37:40 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: <8cd10d55-9656-7b50-cf92-1116eed755bc@gmail.com> On 6/13/2019 4:04 AM, Chris Dent wrote: > If/when we're modelling shared disk as a shared resource provider > does this get any more complicated? Does the part of an allocation > that is DISK_GB need special handling. Nova doesn't create nor manage shared resource providers today, so deleting the compute service and its related compute node(s) and their related resource provider(s) shouldn't have anything to do with a shared resource provider. -- Thanks, Matt From mriedemos at gmail.com Thu Jun 13 17:40:18 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:40:18 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> Message-ID: <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> On 6/12/2019 5:50 PM, Thomas Goirand wrote: >> 1. Don't delete the compute service if we can't cleanup all resource >> providers - make sure to not orphan any providers. Manual cleanup may be >> necessary by the operator. > I'd say that this option is ok-ish*IF* the operators are given good > enough directives saying what to do. It would really suck if we just get > an error, and don't know what resource cleanup is needed. But if the > error is: > > Cannot delete nova-compute on host mycloud-compute-5. 
> Instances still running: > 623051e7-4e0d-4b06-b977-1d9a73e6e6e1 > f8483448-39b5-4981-a731-5f4eeb28592c > Currently live-migrating: > 49a12659-9dc6-4b07-b38b-e0bf2a69820a > Not confirmed migration/resize: > cc3d4311-e252-4922-bf04-dedc31b3a425 I don't think we'll realistically generate a report like this for an error response in the API. While we could figure this out, for the baremetal case we could have hundreds of instances still managed by that compute service host which is a lot of data to generate for an error response. I guess it could be a warning dumped into the API logs but it could still be a lot of data to crunch and log. -- Thanks, Matt From mriedemos at gmail.com Thu Jun 13 17:44:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:44:52 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> Message-ID: On 6/12/2019 5:50 PM, Thomas Goirand wrote: >> 3. Other things I'm not thinking of? Should we add a force parameter to >> the API to allow the operator to forcefully delete (#2 above) if #1 >> fails? Force parameters are hacky and usually seem to cause more >> problems than they solve, but it does put the control in the operators >> hands. > Let's say the --force is just doing the resize --confirm for the > operator, or do an evacuate, then that's fine (and in fact, a good idea, > automations are great...). If it's going to create a mess in the DB, > then it's IMO a terrible idea. I really don't think we're going to change the delete compute service API into an orchestrator that auto-confirms/evacuates the node(s) for you. This is something an external agent / script / service could determine, perform whatever actions, and retry, based on existing APIs (like the migrations API). The one catch is the evacuated instance allocations - there is not much you can do about those from the compute API, you would have to cleanup the allocations for those via the placement API directly. > > However, I see a case that may happen: image a compute node is > completely broken (think: broken motherboard...), then probably we do > want to remove everything that's in there, and want to handle the case > where nova-compute doesn't even respond. This very much is a real life > scenario. If your --force is to address this case, then why not! Though > again and of course, we don't want a mess in the db... :P Well, that's where a force parameter would be available to the admin to decide what they want to happen depending on the situation rather than just have nova guess and hope it's what you wanted. We could check if the service is "up" using the service group API and make some determinations that way, i.e. if there are still allocations on the thing and it's down, assume you're deleting it because it's dead and you want it gone so we just cleanup the allocations for you. 
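For the manual cleanup route via the placement API mentioned above, it would be something along these lines with the osc-placement plugin (exact syntax varies a bit by plugin version):

    # see what is still allocated against the compute node's provider
    openstack resource provider show <rp_uuid> --allocations

    # inspect and then remove a leftover consumer's allocations
    openstack resource provider allocation show <consumer_uuid>
    openstack resource provider allocation delete <consumer_uuid>

    # at that point the provider delete (or the compute service delete)
    # should go through
    openstack resource provider delete <rp_uuid>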
-- Thanks, Matt From openstack at fried.cc Thu Jun 13 17:45:24 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 13 Jun 2019 12:45:24 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> Message-ID: <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> We discussed this today in the nova meeting [1] with a little bit of followup in the main channel after the meeting closed [2]. There seems to be general support (or at least not objection) for implementing "resize" for ironic, limited to: - same host [3] - just this feature (i.e. "hyperthreading") or possibly "anything deploy template" And the consensus was that it's time to put this into a spec. There was a rocky spec [4] that has some overlap and could be repurposed; or a new one could be introduced. efried [1] http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13-14.00.log.html#l-309 [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-13.log.html#t2019-06-13T15:02:10 (interleaved) [3] an acknowledged wrinkle here was that we need to be able to detect at the API level that we're dealing with an Ironic instance, and ignore the allow_resize_to_same_host option (because always forcing same host) [4] https://review.opendev.org/#/c/449155/ From mriedemos at gmail.com Thu Jun 13 17:47:58 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:47:58 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On 6/12/2019 6:26 PM, Mohammed Naser wrote: > I would be more in favor of failing a delete if it will cause the cloud to reach > an inconsistent state than brute-force a delete leaving you in a messy state > where you need to login to the database to unkludge things. I'm not sure that the cascading delete (option #2) case would leave things in a messy state since we'd delete the stuff that we're actually orphaning today. If we don't cascade delete for you and just let the request fail if there are still allocations (option #1), then like I said in a reply to zigo, there are APIs available to figure out what's still being used on the host and then clean those up - but that's the manual part I'm talking about since nova wouldn't be doing it for you. -- Thanks, Matt From cdent+os at anticdent.org Thu Jun 13 17:49:14 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 13 Jun 2019 18:49:14 +0100 (BST) Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <8cd10d55-9656-7b50-cf92-1116eed755bc@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <8cd10d55-9656-7b50-cf92-1116eed755bc@gmail.com> Message-ID: On Thu, 13 Jun 2019, Matt Riedemann wrote: > On 6/13/2019 4:04 AM, Chris Dent wrote: >> If/when we're modelling shared disk as a shared resource provider >> does this get any more complicated? Does the part of an allocation >> that is DISK_GB need special handling. 
> > Nova doesn't create nor manage shared resource providers today, so deleting > the compute service and its related compute node(s) and their related > resource provider(s) shouldn't have anything to do with a shared resource > provider. Yeah, "today". That's why I said "If/when". If we do start doing that, does that make things more complicated in a way we may wish to think about _now_ while we're designing today's solution? I'd like to think that we can just ignore it for now and adapt as things change in the future, but we're all familiar with the way that everything is way more connected and twisted up in a scary hairy ball in nova than we'd all like. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From mriedemos at gmail.com Thu Jun 13 18:00:39 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 13:00:39 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> Message-ID: <00d1892c-809e-2faf-e58b-5f95f510504f@gmail.com> On 6/12/2019 7:05 PM, Sean Mooney wrote: >> If we can distinguish between the migratey ones and the evacuatey ones, >> maybe we fail on the former (forcing them to wait for completion) and >> automatically delete the latter (which is almost always okay for the >> reasons you state; and recoverable via heal if it's not okay for some >> reason). > for a cold migration the allcoation will be associated with a migration object > for evacuate which is basically a rebuild to a different host we do not have a > migration object so the consumer uuid for the allcotion are still associated with > the instace uuid not a migration uuid. so technically we can tell yes > but only if we pull back the allcoation form placmenet and then iterate over > them and check if we have a migration object or an instance that has the same > uuid. Evacuate operations do have a migration record but you're right that we don't move the source node allocations from the instance to the migration prior to scheduling (like we do for cold and live migration). So after the evacuation, the instance consumer has allocations on both the source and dest node. If we did what Eric is suggesting, which is kind of a mix between option 1 and option 2, then I'd do the same query as we have on restart of the compute service [1] to find migration records for evacuations concerning the host we're being asked to delete within a certain status and clean those up, then (re?)try the resource provider delete - and if that fails, then we punt and fail the request to delete the compute service because we couldn't safely delete the resource provider (and we don't want to orphan it for the reasons mnaser pointed out). 
[1] https://github.com/openstack/nova/blob/61558f274842b149044a14bbe7537b9f278035fd/nova/compute/manager.py#L651 -- Thanks, Matt From openstack at fried.cc Thu Jun 13 18:36:04 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 13 Jun 2019 13:36:04 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <91efe32e80b7c24b0bfe5875ecd053513b7fd443.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> <91efe32e80b7c24b0bfe5875ecd053513b7fd443.camel@redhat.com> Message-ID: <6c41c425-71b5-5095-7acc-2198f7ad1d92@fried.cc> We discussed this in the nova meeting today [1] with a little spillover in the -nova channel afterward [2]. The consensus was: Don't muck with resource provider traits at all during aggregate operations. The operator must do that bit manually. As a stretch goal, we can write a simple utility to help with this. This was discussed as option (e) earlier in this thread. The spec needs to get updated accordingly. Thanks, efried [1] http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13-14.00.log.html#l-267 [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-13.log.html#t2019-06-13T15:02:06-2 (interleaved) From mriedemos at gmail.com Thu Jun 13 18:45:31 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 13:45:31 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On 6/12/2019 3:38 PM, Matt Riedemann wrote: > What are our options? > > 1. Don't delete the compute service if we can't cleanup all resource > providers - make sure to not orphan any providers. Manual cleanup may be > necessary by the operator. > > 2. Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for > not-yet-complete migrations and evacuated instances. For the evacuated > instance allocations this is likely OK since restarting the source > compute service is going to do that cleanup anyway. Also, if you delete > the source compute service during a migration, confirming or reverting > the resize later will likely fail since we'd be casting to something > that is gone (and we'd orphan those allocations). Maybe we need a > functional recreate test for the unconfirmed migration scenario before > deciding on this? > > 3. Other things I'm not thinking of? Should we add a force parameter to > the API to allow the operator to forcefully delete (#2 above) if #1 > fails? Force parameters are hacky and usually seem to cause more > problems than they solve, but it does put the control in the operators > hands. > > If we did remove allocations for an instance when deleting it's compute > service host, the operator should be able to get them back by running > the "nova-manage placement heal_allocations" CLI - assuming they restart > the compute service on that host. This would have to be tested of course. After talking a bit about this in IRC today, I'm thinking about a phased approach to this problem with these changes in order: 1. 
Land [1] so we're at least trying to cleanup all providers for a given compute service (the ironic case). 2. Implement option #1 above where we fail to delete the compute service if any of the resource providers cannot be deleted. We'd have stuff in the logs about completing migrations and trying again, and failing that cleanup allocations for old evacuations. Rather than dump all of that info into the logs, it would probably be better to just write up a troubleshooting doc [2] for it and link to that from the logs, then the doc can reference APIs and CLIs to use for the cleanup scenarios. 3. Implement option #2 above where we cleanup allocations but only for evacuations - like the compute service would do when it's restarted anyway. This would leave the case that we don't delete the compute service for allocations related to other types of migrations - in-progress or unconfirmed (or failed and leaked) migrations that would require operator investigation. We could build on that in the future if we wanted to toy with the idea of checking the service group API for whether or not the service is up or if we wanted to add a force option to just tell nova to fully cascade delete everything, but I don't really want to get hung up on those edge cases right now. How do people feel about this plan? [1] https://review.opendev.org/#/c/657016/ [2] https://docs.openstack.org/nova/latest/admin/support-compute.html -- Thanks, Matt From openstack at fried.cc Thu Jun 13 19:21:58 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 13 Jun 2019 14:21:58 -0500 Subject: [nova] Help wanted with bug triage! Message-ID: <5c38071e-a19e-123a-1c35-01ad9baeeed9@fried.cc> Folks- Nova's queue of untriaged bugs [1] has been creeping slowly upward lately. We could really use some focused effort to get this back under control. We're not even (necessarily) talking about *fixing* bugs - though that's great too. We're talking about triaging [2]. If every nova contributor (you don't need to be a core) triaged just one bug a day, it wouldn't take long for things to be back in manageable territory. Thanks in advance. efried [1] https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New [2] https://wiki.openstack.org/wiki/Nova/BugTriage From zigo at debian.org Thu Jun 13 21:03:58 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 13 Jun 2019 23:03:58 +0200 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> Message-ID: On 6/13/19 7:40 PM, Matt Riedemann wrote: > On 6/12/2019 5:50 PM, Thomas Goirand wrote: >>> 1. Don't delete the compute service if we can't cleanup all resource >>> providers - make sure to not orphan any providers. Manual cleanup may be >>> necessary by the operator. >> I'd say that this option is ok-ish*IF*  the operators are given good >> enough directives saying what to do. It would really suck if we just get >> an error, and don't know what resource cleanup is needed. But if the >> error is: >> >> Cannot delete nova-compute on host mycloud-compute-5. 
>> Instances still running: >> 623051e7-4e0d-4b06-b977-1d9a73e6e6e1 >> f8483448-39b5-4981-a731-5f4eeb28592c >> Currently live-migrating: >> 49a12659-9dc6-4b07-b38b-e0bf2a69820a >> Not confirmed migration/resize: >> cc3d4311-e252-4922-bf04-dedc31b3a425 > > I don't think we'll realistically generate a report like this for an > error response in the API. While we could figure this out, for the > baremetal case we could have hundreds of instances still managed by that > compute service host which is a lot of data to generate for an error > response. > > I guess it could be a warning dumped into the API logs but it could > still be a lot of data to crunch and log. In such case, in the error message, just suggest what to do to fix the issue. I once worked in a company that made me change every error message so that each of them contained hints on what to do to fix the problem. Since, I often suggest it. Cheers, Thomas Goirand (zigo) From mriedemos at gmail.com Thu Jun 13 21:18:01 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 16:18:01 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> Message-ID: <2e846728-8e46-4382-2d1a-55f7a6324a33@gmail.com> On 6/13/2019 4:03 PM, Thomas Goirand wrote: > I once worked in a company that made me change every error message so > that each of them contained hints on what to do to fix the problem. > Since, I often suggest it. Heh, same and while it was grueling for the developers it left an impression on me and I tend to try and nack people's changes for crappy error messages as a result. -- Thanks, Matt From anlin.kong at gmail.com Thu Jun 13 22:55:45 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Fri, 14 Jun 2019 10:55:45 +1200 Subject: [nova] Admin user cannot create vm with user's port? In-Reply-To: <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com> References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com> <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com> Message-ID: On Thu, Jun 13, 2019 at 10:48 PM Sean Mooney wrote: > On Thu, 2019-06-13 at 21:22 +1200, Lingxian Kong wrote: > > Yeah, the api allows to specify port. What i mean is, the vm creation > will > > fail for admin user if port belongs to a non-admin user. An exception is > > raised from nova-compute. > > i believe this is intentional. > > we do not currently allow you to trasfer ownerwhip of a vm form one user > or proejct to another. > but i also believe we currently do not allow a vm to be create from > resouces with different owners > That's not true. As the admin user, you are allowed to create a vm using non-admin's network, security group, image, volume, etc but just not port. There is use case for admin user to create vms but using non-admin's resources for debugging or other purposes. What's more, the exception is raised in nova-compute not nova-api, which i assume it should be supported if it's allowed in the api layer. Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Thu Jun 13 22:57:10 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Fri, 14 Jun 2019 10:57:10 +1200 Subject: [nova] Admin user cannot create vm with user's port? 
In-Reply-To: References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com> <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com> Message-ID: Another use case is coming from the services (e.g. Trove) which will create vms in the service tenant but using the resources (e.g. network or port) given by the non-admin user. Best regards, Lingxian Kong Catalyst Cloud On Fri, Jun 14, 2019 at 10:55 AM Lingxian Kong wrote: > On Thu, Jun 13, 2019 at 10:48 PM Sean Mooney wrote: > >> On Thu, 2019-06-13 at 21:22 +1200, Lingxian Kong wrote: >> > Yeah, the api allows to specify port. What i mean is, the vm creation >> will >> > fail for admin user if port belongs to a non-admin user. An exception is >> > raised from nova-compute. >> >> i believe this is intentional. >> >> we do not currently allow you to trasfer ownerwhip of a vm form one user >> or proejct to another. >> but i also believe we currently do not allow a vm to be create from >> resouces with different owners >> > > That's not true. As the admin user, you are allowed to create a vm using > non-admin's network, security group, image, volume, etc but just not port. > > There is use case for admin user to create vms but using non-admin's > resources for debugging or other purposes. > > What's more, the exception is raised in nova-compute not nova-api, which i > assume it should be supported if it's allowed in the api layer. > > Best regards, > Lingxian Kong > Catalyst Cloud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrist at redhat.com Thu Jun 13 23:11:08 2019 From: jrist at redhat.com (Jason Rist) Date: Thu, 13 Jun 2019 17:11:08 -0600 Subject: Retiring TripleO-UI - no longer supported In-Reply-To: <0583152c-5a85-a34d-577e-e7789cac344b@suse.com> References: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> <0583152c-5a85-a34d-577e-e7789cac344b@suse.com> Message-ID: <9C3778CC-9936-4735-9E61-88F5720CC61A@redhat.com> Thanks for pointing this out. These are now up. Jason Rist Red Hat  jrist / knowncitizen > On Jun 6, 2019, at 11:24 PM, Andreas Jaeger wrote: > > On 07/06/2019 06.34, Jason Rist wrote: >> Follow-up - this work is now done. >> >> https://review.opendev.org/#/q/topic:retire_tripleo_ui+(status:open+OR+status:merged) >> > > Not yet for ansible-role-tripleo-ui - please remove the repo from > project-config and governance repo, step 4 and 5 of > https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project > are missing. > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah > HRB 21284 (AG Nürnberg) > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 -------------- next part -------------- An HTML attachment was scrubbed... URL: From frode.nordahl at canonical.com Fri Jun 14 08:05:48 2019 From: frode.nordahl at canonical.com (Frode Nordahl) Date: Fri, 14 Jun 2019 10:05:48 +0200 Subject: [charms] Proposing Sahid Orentino Ferdjaoui to the Charms core team In-Reply-To: References: <17abd9ed-e76d-52b3-29b1-6d6ae75161bf@canonical.com> Message-ID: +1 On Tue, May 28, 2019 at 10:37 PM Corey Bryant wrote: > On Fri, May 24, 2019 at 6:35 AM Chris MacNaughton < > chris.macnaughton at canonical.com> wrote: > >> Hello all, >> >> I would like to propose Sahid Orentino Ferdjaoui as a member of the >> Charms core team. 
>> > > +1 Sahid is a solid contributor and I'm confident he'll use caution, ask > questions, and pull the right people in if needed. > > Corey > >> Chris MacNaughton >> > -- Frode Nordahl Senior Engineer Canonical -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Fri Jun 14 08:35:21 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Fri, 14 Jun 2019 08:35:21 +0000 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> References: , <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> Message-ID: >> cpu_share_set in stien was used for vm emulator thread and required the instnace to be pinned for it to take effect. >> i.e. the hw:emulator_thread_policy extra spcec currently only works if you had hw_cpu_policy=dedicated. >> so we should not error if vcpu_pin_set and cpu_shared_set are defined, it was valid. what we can do is >> ignore teh cpu_shared_set for schduling and not report 0 VCPUs for this host and use vcpu_pinned_set as PCPUs Thinking of backward compatibility, I agree both of these configuration options ``cpu_shared_set``, ``vcpu_pinned_set`` should be allowed in Train release as well. Few possible combinations in train: A) What if only ``cpu_shared_set`` is set on a new compute node? Report VCPU inventory. B) what if ``cpu_shared_set`` and ``cpu_dedicated_set`` are set on a new compute node? Report VCPU and PCPU inventory. In fact, we want to support both these options so that instance can request both VCPU and PCPU at the same time. If flavor requests VCPU or hw:emulator_thread_policy=share, in both the cases, it will float on CPUs set in ``cpu_shared_set`` config option. C) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a new compute node? Ignore cpu_shared_set and report vcpu_pinned_set as VCPU or PCPU? D) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a upgraded compute node? As you have mentioned, ignore cpu_shared_set and report vcpu_pinned_set as PCPUs provided ``NumaTopology`` ,``pinned_cpus`` attribute is not empty otherwise VCPU. >> we explctly do not want to have the behavior in 3 and 4 specificly the logic of checking the instances. Here we are checking Host ``NumaTopology`` ,``pinned_cpus`` attribute and not directly instances ( if that attribute is not empty that means some instance are running) and this logic will be needed to address above #D case. Regards, -Bhagyashri Shewale- ________________________________ From: Sean Mooney Sent: Thursday, June 13, 2019 8:21:09 PM To: Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com Subject: Re: [nova] Spec: Standardize CPU resource tracking On Thu, 2019-06-13 at 04:42 +0000, Shewale, Bhagyashri wrote: > Hi All, > > > After revisiting the spec [1] again and again, I got to know few points please check and let me know about my > understanding: > > > Understanding: If the ``vcpu_pin_set`` is set on compute node A in the Stein release then we can say that this node > is used to host the dedicated instance on it and if user upgrades from Stein to Train and if operator doesn’t define > ``[compute] cpu_dedicated_set`` set then simply fallback to ``vcpu_pin_set`` and report it as PCPU inventory. that is incorrect if the vcpu_pin_set is defiend it may be used for instance with hw:cpu_policy=dedicated or not. 
in train if vcpu_pin_set is defiend and cpu_dedicated_set is not defiend then we use vcpu_pin_set to define the inventory of both PCPUs and VCPUs > > > Considering multiple combinations of various configuration options, I think we will need to implement below business > rules so that the issue highlighted in the previous email about the scheduler pre-filter can be solved. > > > Rule 1: > > If operator sets ``[compute] cpu_shared_set`` in Train. > > 1.If pinned instances are found then we can simply say that this compute node is used as dedicated in the previous > release so raise an error that says to set ``[compute] cpu_dedicated_set`` config option otherwise report it as VCPU > inventory. cpu_share_set in stien was used for vm emulator thread and required the instnace to be pinned for it to take effect. i.e. the hw:emulator_thread_policy extra spcec currently only works if you had hw_cpu_policy=dedicated. so we should not error if vcpu_pin_set and cpu_shared_set are defined, it was valid. what we can do is ignore teh cpu_shared_set for schduling and not report 0 VCPUs for this host and use vcpu_pinned_set as PCPUs > > > Rule 2: > > If operator sets ``[compute] cpu_dedicated_set`` in Train. > > 1. Report inventory as PCPU yes if cpu_dedicated_set is set we will report its value as PCPUs > > 2. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, that means this > compute node is used as dedicated in the previous release and if empty, then raise an error that this compute node is > used as shared compute node in previous release. this was not part of the spec. we could do this but i think its not needed and operators should check this themselves. if we decide to do this check on startup it should only happen if vcpu_pin_set is defined. addtionally we can log an error but we should not prevent the compute node form working and contuing to spawn vms. > > > Rule 3: > > If operator sets None of the options (``[compute] cpu_dedicated_set``, ``[compute] cpu_shared_set``, > ``vcpu_pin_set``) in Train. > > 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, then raise an error > that this compute node is used as dedicated compute node in previous release so set ``[compute] cpu_dedicated_set``, > otherwise report inventory as VCPU. again this is not in the spec and i dont think we should do this. if none of the values are set we should report all cpus as both VCPUs and PCPUs the vcpu_pin_set option was never intended to signal a host was used for cpu pinning it was intoduced for cpu pinning and numa affinity but it was orignally ment to apply to floaing instance and currently contople the number of VCPU reported to the resouce tracker which is used to set the capastiy of the VCPU inventory. you should read https://that.guru/blog/cpu-resources/ for a walk through of this. > > 2. If no instances, report inventory as VCPU. we could do this but i think it will be confusing as to what will happen after we spawn an instnace on the host in train. i dont think this logic should be condtional on the presence of vms. > > > Rule 4: > > If operator sets ``vcpu_pin_set`` config option in Train. > > 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is empty, that means this compute > node is used for non-pinned instances in the previous release, so raise an error otherwise report it as PCPU > inventory. agin this is not in the spec. 
what the spec says for if vcpu_pin_set is defiend is we will report inventorys of both VCPU and PCPUs for all cpus in the vcpu_pin_set > > 2. If no instances, report inventory as PCPU. again this should not be condtional on the presence of vms. > > > Rule 5: > > If operator sets ``vcpu_pin_set`` and ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set`` config options > in Train > > 1. Simply raise an error this is the only case were we "rasise" and error and refuse to start the compute node. > > > Above business rules 3 and 4 are very important in order to solve the scheduler pre-filter issue highlighted in my > previous email. we explctly do not want to have the behavior in 3 and 4 specificly the logic of checking the instances. > > > As of today, in either case, `vcpu_pin_set`` is set or not set on the compute node, it can used for both pinned or > non-pinned instances depending on whether this host belongs to an aggregate with “pinned” metadata. But as per > business rule #3 , if ``vcpu_pin_set`` is not set, we are considering it to be used for non-pinned instances > only. Do you think this could cause an issue in providing backward compatibility? yes the rule you have listed above will cause issue for upgrades and we rejected similar rules in the spec. i have not read your previous email which ill look at next but we spent a long time debating how this should work in the spec design and i would prefer to stick to what the spec currently states. > > > Please provide your suggestions on the above business rules. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 409 > > > > Thanks and Regards, > > -Bhagyashri Shewale- > > ________________________________ > From: Shewale, Bhagyashri > Sent: Wednesday, June 12, 2019 6:10:04 PM > To: openstack-discuss at lists.openstack.org; openstack at fried.cc; smooney at redhat.com; sfinucan at redhat.com; > jaypipes at gmail.com > Subject: [nova] Spec: Standardize CPU resource tracking > > > Hi All, > > > Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. > > > While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: > > > Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and > request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and > migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in > request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- > > > For example: > > > I have two compute nodes say A and B: > > > On Stein: > > > Compute node A configurations: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > Compute node B Configuration: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > On Train, two possible scenarios: > > Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) > > cpu_dedicated_set=0-3 (change to the new config option) > > 1. 
Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above > configuration. > 2. Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the > request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` which ultimately will > return only compute node B from placement service. Here, we expect it should have retuned both Compute A and Compute > B. > 3. If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return > only compute node B from placement service where as it should have returned both compute Node A and B. > > Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as > Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. > > > Propose changes: > > > Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata > properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host > which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can > change instance and request_spec flavor extra specs. > > 1. Remove cpu_policy from extra specs > 2. Add “resources:PCPU=” in extra specs > > > We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the > reshape functionality. > > > Please give us your feedback on the proposed solution so that we can update specs accordingly. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > Thanks and Regards, > > -Bhagyashri Shewale- > > Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may > contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise > the sender by replying promptly to this email and then delete and destroy this email and any attachments without any > further use, copying or forwarding. Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Fri Jun 14 08:37:58 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Fri, 14 Jun 2019 08:37:58 +0000 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> References: , <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> Message-ID: >> that is incorrect both a and by will be returned. 
the spec states that for host A we report an inventory of 4 VCPUs and >> an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so both host will be returned assuming >> $ <=4 Means if ``vcpu_pin_set`` is set in previous release then report both VCPU and PCPU as inventory (in Train) but this seems contradictory for example: On Stein, Configuration on compute node A: vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in placement database) On Train: vcpu_pin_set=0-3 The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement db Now say user wants to create instances as below: 1. Flavor having extra specs (resources:PCPU=1), instance A 2. Flavor having extra specs (resources:VCPU=1), instance B For both instance requests, placement will return compute Node A. Instance A: will be pinned to say 0 CPU Instance B: will float on 0-3 To resolve above issue, I think it’s possible to detect whether the compute node was configured to be used for pinned instances if ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that case, vcpu_pin_set will be reported as PCPU otherwise VCPU. Regards, -Bhagyashri Shewale- ________________________________ From: Sean Mooney Sent: Thursday, June 13, 2019 8:32:02 PM To: Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com Subject: Re: [nova] Spec: Standardize CPU resource tracking On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote: > Hi All, > > > Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. > > > While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: > > > Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and > request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and > migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in > request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- > > > For example: > > > I have two compute nodes say A and B: > > > On Stein: > > > Compute node A configurations: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) vcpu_pin_set does not mean that the host was used for pinned instances https://that.guru/blog/cpu-resources/ > > > Compute node B Configuration: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > On Train, two possible scenarios: > > Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) > > cpu_dedicated_set=0-3 (change to the new config option) > > 1. Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above > configuration. > 2. 
Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the > request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` it wont remove hw:cpu_policy it will just change the resouces=VCPU:$ -> resources=PCPU:$ > which ultimately will return only compute node B from placement service. that is incorrect both a and by will be returned. the spec states that for host A we report an inventory of 4 VCPUs and an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so both host will be returned assuming $ <=4 > Here, we expect it should have retuned both Compute A and Compute B. it will > 3. If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return > only compute node B from placement service where as it should have returned both compute Node A and B. that is what would have happend in the stien version of the spec and we changed the spec specifically to ensure that that wont happen. in the train version of the spec you will get both host as candates to prevent this upgrade impact. > > Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as > Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. > > > Propose changes: > > > Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata > properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host > which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can > change instance and request_spec flavor extra specs. > > 1. Remove cpu_policy from extra specs > 2. Add “resources:PCPU=” in extra specs > > > We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the > reshape functionality. > > > Please give us your feedback on the proposed solution so that we can update specs accordingly. i am fairly stongly opposed to useing an online data migration to modify the request spec to reflect the host they landed on. this speficic problem is why the spec was changed in the train cycle to report dual inventoryis of VCPU and PCPU if vcpu_pin_set is the only option set or of no options are set. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > Thanks and Regards, > > -Bhagyashri Shewale- > > Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may > contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise > the sender by replying promptly to this email and then delete and destroy this email and any attachments without any > further use, copying or forwarding. Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. 
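For readers following this thread, the inventory-reporting rules under discussion can be condensed into a small sketch. This is simplified pseudologic based only on the statements above from the Train spec discussion; the helper is hypothetical, not the actual nova resource-tracker code, and the debated legacy combination of vcpu_pin_set together with cpu_shared_set is deliberately left out.

    # Hypothetical sketch of the Train-spec inventory mapping discussed above.
    #   cpu_dedicated_set / cpu_shared_set  -> PCPU / VCPU inventory
    #   only vcpu_pin_set                   -> those CPUs as both VCPU and PCPU
    #   nothing set                         -> all host CPUs as both VCPU and PCPU
    #   vcpu_pin_set plus cpu_dedicated_set -> refuse to start (rule 5)
    def derive_inventory(host_cpus, vcpu_pin_set=None,
                         cpu_dedicated_set=None, cpu_shared_set=None):
        if vcpu_pin_set and cpu_dedicated_set:
            raise ValueError("vcpu_pin_set and cpu_dedicated_set are "
                             "mutually exclusive")
        if cpu_dedicated_set or cpu_shared_set:
            return {"PCPU": len(cpu_dedicated_set or ()),
                    "VCPU": len(cpu_shared_set or ())}
        cpus = vcpu_pin_set or host_cpus
        # Reporting both classes is what keeps hosts that carried either
        # pinned or floating instances in Stein schedulable during the upgrade.
        return {"PCPU": len(cpus), "VCPU": len(cpus)}

    # e.g. derive_inventory(set(range(4)), vcpu_pin_set={0, 1, 2, 3})
    # -> {"PCPU": 4, "VCPU": 4}, which is why both compute nodes A and B in the
    # example above come back as candidates for a resources=PCPU request.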
-------------- next part -------------- An HTML attachment was scrubbed... URL: From madhuri.kumari at intel.com Fri Jun 14 10:16:59 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Fri, 14 Jun 2019 10:16:59 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> Hi Eric, Thank you for following up and the notes. The spec[4] is related but a complex one too with all the migration implementation. So I will try to put a new spec with a limited implementation of resize. Regards, Madhuri >>-----Original Message----- >>From: Eric Fried [mailto:openstack at fried.cc] >>Sent: Thursday, June 13, 2019 11:15 PM >>To: openstack-discuss at lists.openstack.org >>Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post >>Provisioning >> >>We discussed this today in the nova meeting [1] with a little bit of followup >>in the main channel after the meeting closed [2]. >> >>There seems to be general support (or at least not objection) for >>implementing "resize" for ironic, limited to: >> >>- same host [3] >>- just this feature (i.e. "hyperthreading") or possibly "anything deploy >>template" >> >>And the consensus was that it's time to put this into a spec. >> >>There was a rocky spec [4] that has some overlap and could be repurposed; >>or a new one could be introduced. >> >>efried >> >>[1] >>http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13- >>14.00.log.html#l-309 >>[2] >>http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack- >>nova.2019-06-13.log.html#t2019-06-13T15:02:10 >>(interleaved) >>[3] an acknowledged wrinkle here was that we need to be able to detect at >>the API level that we're dealing with an Ironic instance, and ignore the >>allow_resize_to_same_host option (because always forcing same host) [4] >>https://review.opendev.org/#/c/449155/ From mdulko at redhat.com Fri Jun 14 10:44:20 2019 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Fri, 14 Jun 2019 12:44:20 +0200 Subject: [requirements][kuryr][flame] openshift dificulties In-Reply-To: <20190606141747.gxoyrcels266rcgv@mthode.org> References: <20190529205352.f2dxzckgvfavbvtv@mthode.org> <20190530151739.nfzrqfstlb2sbrq5@mthode.org> <20190605165807.jmhogmfyrxltx5b3@mthode.org> <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com> <20190606141747.gxoyrcels266rcgv@mthode.org> Message-ID: On Thu, 2019-06-06 at 09:17 -0500, Matthew Thode wrote: > On 19-06-06 09:13:46, Michał Dulko wrote: > > On Wed, 2019-06-05 at 11:58 -0500, Matthew Thode wrote: > > > On 19-05-30 10:17:39, Matthew Thode wrote: > > > > On 19-05-30 17:07:54, Michał Dulko wrote: > > > > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote: > > > > > > Openshift upstream is giving us difficulty as they are capping the > > > > > > version of urllib3 and kubernetes we are using. 
> > > > > > > > > > > > -urllib3===1.25.3 > > > > > > +urllib3===1.24.3 > > > > > > -kubernetes===9.0.0 > > > > > > +kubernetes===8.0.1 > > > > > > > > > > > > I've opened an issue with them but not had much luck there (and their > > > > > > prefered solution just pushes the can down the road). > > > > > > > > > > > > https://github.com/openshift/openshift-restclient-python/issues/289 > > > > > > > > > > > > What I'd us to do is move off of openshift as our usage doesn't seem too > > > > > > much. > > > > > > > > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one > > > > > > function with that import). I'm not sure exactly what you are doing > > > > > > with it but would it be too much to ask to move to something else? > > > > > > > > > > From Kuryr side it's not really much effort, we can switch to bare REST > > > > > calls, but obviously we prefer the client. If there's much support for > > > > > getting rid of it, we can do the switch. > > > > > > > > > > > > > Right now Kyryr is only using it in that one place and it's blocking the > > > > update of urllib3 and kubernetes for the rest of openstack. So if it's > > > > not too much trouble it'd be nice to have happen. > > > > > > > > > > x/flame has it in it's constraints but I don't see any actual usage, so > > > > > > perhaps it's a false flag. > > > > > > > > > > > > Please let me know what you think > > > > > > > > > > > > Any updates on this? I'd like to move forward on removing the > > > dependency if possible. > > > > > > > Sure, I'm waiting for some spare time to do this. Fastest it may happen > > will probably be next week. > > > > Sounds good, thanks for working on it. > The patch [1] is up, let's see if my alternative approach haven't broke our gates. I'll ping you once it's merged so you can propose a patch removing openshift library from global-requirements. [1] https://review.opendev.org/#/c/665352 From cdent+os at anticdent.org Fri Jun 14 13:05:55 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 14 Jun 2019 14:05:55 +0100 (BST) Subject: [placement] update 19-23 Message-ID: HTML: https://anticdent.org/placement-update-19-23.html 19-23. I'll be travelling the end of next week so there will be no 19-24. # Most Important We keep having interesting and helpful, but not yet fully conclusive, discussions related to the [spec for nested magic](https://review.opendev.org/662191). The discussion on the spec links back to a few different IRC discussions. Part of the reason this drags on so much is that we're trying to find a model that is internally consistent in placement and generally applicable while satisfying the NUMA requirements from Nova, while not requiring either Placement or Nova to bend over backwards to get things right, and while not imposing additional complexity on simple requests. (And probably a few other whiles...) If you have thoughts on these sorts of things, join that review. In the meantime there are plenty of other things to review (below). # What's Changed * A blocker migration for incomplete consumers has been added, and the inline migrations that would guard against the incompleteness have been removed. * CORS configuration and use in Placement has been modernized and corrected. You can now, if you want, talk to Placement from JavaScript in your browser. (This was a bug I found while working on a [visualisation toy](https://github.com/cdent/placeview) with a friend.) 
* Result sets for certain nested provider requests could return different results in Python versions 2 and 3. This has [been fixed](https://review.opendev.org/663137). * That ^ work was the result of working on implementing [mappings in allocations](https://docs.openstack.org/placement/latest/specs/train/approved/placement-resource-provider-request-group-mapping-in-allocation-candidates.html) which has [merged today as microversion 1.34](https://docs.openstack.org/placement/latest/placement-api-microversion-history.html#request-group-mappings-in-allocation-candidates) # Specs/Features * Support Consumer Types. This has some open questions that need to be addressed, but we're still go on the general idea. * Spec for nested magic 1. The easier parts of nested magic: same_subtree, resourceless request groups, verbose suffixes (already merged as 1.33). See "Most Important" above. Some non-placement specs are listed in the Other section below. # Stories/Bugs (Numbers in () are the change since the last pupdate.) There are 20 (0) stories in [the placement group](https://storyboard.openstack.org/#!/project_group/placement). 0 (0) are [untagged](https://storyboard.openstack.org/#!/worklist/580). 3 (0) are [bugs](https://storyboard.openstack.org/#!/worklist/574). 6 (2) are [cleanups](https://storyboard.openstack.org/#!/worklist/575). 11 (0) are [rfes](https://storyboard.openstack.org/#!/worklist/594). 2 (0) are [docs](https://storyboard.openstack.org/#!/worklist/637). If you're interested in helping out with placement, those stories are good places to look. * Placement related nova [bugs not yet in progress](https://goo.gl/TgiPXb) on launchpad: 16 (1). * Placement related nova [in progress bugs](https://goo.gl/vzGGDQ) on launchpad: 6 (-1). [1832814: Placement API appears to have issues when compute host replaced](https://bugs.launchpad.net/nova/+bug/1832814) is an interesting bug. In a switch from RDO to OSA, resource providers are being duplicated because of a change in node name. # osc-placement osc-placement is currently behind by 11 microversions. There are no changes that have had attention in less than 4 weeks. There are 4 other changes. # Main Themes ## Nested Magic The overview of the features encapsulated by the term "nested magic" are in a [story](https://storyboard.openstack.org/#!/story/2005575) and [spec](https://review.opendev.org/662191). There is some in progress code, mostly WIPs to expose issues and think about how things ought to work: * PoC: resourceless request, including some code from WIP: Allow RequestGroups without resources ## Consumer Types Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting. A [spec](https://review.opendev.org/654799) has started. There are some questions about request and response details that need to be resolved, but the overall concept is sound. ## Cleanup We continue to do cleanup work to lay in reasonable foundations for the nested work above. As a nice bonus, we keep eking out additional performance gains too. There are two new stories about some minor performance degradations: * GET /allocations/{non existent consumer} slower than expected * Allocation mappings have introduced a slight performance reduction Gibi discovered that osprofiler wasn't working with placement and then fixed it: * Add support for osprofiler in wsgi Thanks to Ed Leafe for his report on the [state of graph database work](http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007037.html). 
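As a side note on the microversion 1.34 "mappings in allocation candidates" item under What's Changed above, a rough sketch of a granular request and the new mappings key follows (illustrative only: the endpoint, token and resource classes are placeholders, and real responses will differ):

    # Rough illustration of a 1.34 allocation-candidates call; not taken from
    # any real deployment.
    import requests

    PLACEMENT = "http://placement.example/placement"  # placeholder endpoint

    resp = requests.get(
        PLACEMENT + "/allocation_candidates",
        headers={"OpenStack-API-Version": "placement 1.34",
                 "X-Auth-Token": "<token>",  # placeholder
                 "Accept": "application/json"},
        params={"resources": "VCPU:1,MEMORY_MB:512",
                "resources_NET": "NET_BW_EGR_KILOBIT_PER_SEC:1000"},
    )
    for candidate in resp.json().get("allocation_requests", []):
        # With 1.34 each allocation request also carries a "mappings" dict
        # tying every request group suffix back to the provider(s) that
        # satisfied it, e.g. {"": ["<compute rp>"], "_NET": ["<net rp>"]}.
        print(candidate.get("mappings"))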
# Other Placement Miscellaneous changes can be found in [the usual place](https://review.opendev.org/#/q/project:openstack/placement+status:open). There are three [os-traits changes](https://review.opendev.org/#/q/project:openstack/os-traits+status:open) being discussed. And one [os-resource-classes change](https://review.opendev.org/#/q/project:openstack/os-resource-classes+status:open). # Other Service Users New discoveries are added to the end. Merged stuff is removed. Anything that has had no activity in 4 weeks has been removed. * Nova: spec: support virtual persistent memory * Nova: nova-manage: heal port allocations * nova-spec: Allow compute nodes to use DISK_GB from shared storage RP * Cyborg: Placement report * rpm-packaging: placement service * Delete resource providers for all nodes when deleting compute service * nova test and fix for: Drop source node allocations if finish_resize fails * nova: WIP: Hey let's support routed networks y'all! * starlingx: Add placement chart patch to openstack-helm * helm: WIP: add placement chart * kolla-ansible: Add a explanatory note for "placement_api_port" * Nova: Use OpenStack SDK for placement * Nova: Spec: Provider config YAML file * Nova: single pass instance info fetch in host manager * docs: Add Placement service to Minimal deployment for Stein * devstack: Add setting of placement microversion on tempest conf * libvirt: report pmem namespaces resources by provider tree * Nova: Remove PlacementAPIConnectFailure handling from AggregateAPI * Nova: Validate requested host/node during servers create * Nova: support move ops with qos ports * Kolla-ansible: Fix placement log perms in config.json * TripleO: Enable Request Filter for Image Types * Neutron: Force segments to use placement 1.1 * Nova: get_ksa_adapter: nix by-service-type confgrp hack * OSA: Add nova placement to placement migration # End If you, like me, use this collection of information to drive what to do early in the next week, you might like [placement-reviewme](https://github.com/cdent/placement-reviewme) which brute force loads all the links from the HTML version of this in tabs in your browser. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From ekultails at gmail.com Fri Jun 14 16:14:59 2019 From: ekultails at gmail.com (Luke Short) Date: Fri, 14 Jun 2019 12:14:59 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: Hey folks, Since we have not gotten a lot of feedback on the times to meet-up, I have created a Google Forms survey to help us figure it out. When you get a chance, please answer this simple 1-question survey. https://docs.google.com/forms/d/e/1FAIpQLSfHkN_T7T-W4Dhc17Pf6VHm1oUzKLJYz0u9ORAJYafrIGooZQ/viewform?usp=sf_link Try to complete this by end-of-day Tuesday so we have an idea of when we should meet on this upcoming Thursday. Thanks for your help! Sincerely, Luke Short On Wed, Jun 12, 2019 at 9:43 AM Kevin Carter wrote: > I've submitted reviews under the topic "retire-role" to truncate all of > the ansible-role-tripleo-* repos, that set can be seen here [0]. When folks > get a chance, I'd greatly appreciate folks have a look at these reviews. > > [0] - https://review.opendev.org/#/q/topic:retire-role+status:open > > -- > > Kevin Carter > IRC: kecarter > > > On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: > >> Hey everyone, >> >> For the upcoming work on focusing on more Ansible automation and testing, >> I have created a dedicated #tripleo-transformation channel for our new >> squad. 
Feel free to join if you are interested in joining and helping out! >> >> +1 to removing repositories we don't use, especially if they have no >> working code. I'd like to see the consolidation of TripleO specific things >> into the tripleo-ansible repository and then using upstream Ansible roles >> for all of the different services (nova, glance, cinder, etc.). >> >> Sincerely, >> >> Luke Short, RHCE >> Software Engineer, OpenStack Deployment Framework >> Red Hat, Inc. >> >> >> On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: >> >>> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >>> >>>> So the questions at hand are: what, if anything, should we do with >>>> these repositories? Should we retire them or just ignore them? Is there >>>> anyone using any of the roles? >>>> >>> >>> My initial reaction was to suggest we just ignore them, but on second >>> thought I'm wondering if there is anything negative if we leave them lying >>> around. Unless we're going to benefit from them in the future if we start >>> actively working in these repos, they represent obfuscation and debt, so it >>> might be best to retire / dispose of them. >>> >>> David >>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jun 14 17:17:32 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 15 Jun 2019 02:17:32 +0900 Subject: Are most grenade plugin projects doing upgrades wrong? In-Reply-To: References: Message-ID: <16b56fe93b6.126440baf26354.4377986212126709292@ghanshyammann.com> ---- On Fri, 14 Jun 2019 01:13:33 +0900 Elõd Illés wrote ---- > Actually, if you check [4] and [5], they seemingly check out master, but > just below the linked line like 20 lines, there's a 'git show --oneline' > and 'head -1' which shows that the branch is clearly stable/stein. So it > seems there's no need for explicit branch settings in [1]... but I > haven't found the code lines which cause this behavior, yet... yeah, it seems it did checkout stein version. But from log it is not clear where exactly /opt.stack/old/watcher dir is being cloned from stein? this is the first clone I am seeing in log[6]. I checked for designate grenade job on stein and rocky branch and they also seem to clone the correct base designate [7]. Are we missing the log somewhere where the correct base is cloned? [6] http://logs.openstack.org/10/664610/2/check/watcher-grenade/ad2e068/logs/old/devstacklog.txt.gz#_2019-06-12_19_19_36_915 [7] stein- http://logs.openstack.org/71/660171/1/check/designate-grenade-pdns4/a460998/logs/grenade.sh.txt.gz#_2019-05-20_17_16_11_958 rocky- http://logs.openstack.org/60/662760/1/check/designate-grenade-pdns4/940dc47/logs/grenade.sh.txt.gz#_2019-06-03_14_26_43_243 -gmann > > BR, > > Előd > > > On 2019. 06. 13. 15:45, Matt Riedemann wrote: > > While fixing how the watcher grenade plugin is cloning the watcher > > repo for the base (old) side [1] to hard-code from stable/rocky to > > stable/stien, I noticed that several other projects with grenade > > plugins aren't actually setting a stable branch when cloning the > > plugin for the old side [2]. So I tried removing the stable/stein > > branch from the base plugin clone in watcher [3] and grenade cloned > > from master for the base (old) side [4]. Taking designate as an > > example, I see the same thing happening for it's grenade run [5]. 
> > > > Is there something I'm missing here or are most of the openstack > > projects running grenade via plugin actually just upgrading from > > master to master rather than n-1 to master? > > > > [1] https://review.opendev.org/#/c/664610/1/devstack/upgrade/settings > > [2] > > http://codesearch.openstack.org/?q=base%20enable_plugin&i=nope&files=&repos= > > [3] https://review.opendev.org/#/c/664610/2/devstack/upgrade/settings > > [4] > > http://logs.openstack.org/10/664610/2/check/watcher-grenade/ad2e068/logs/grenade.sh.txt.gz#_2019-06-12_19_19_36_874 > > [5] > > http://logs.openstack.org/47/662647/6/check/designate-grenade-pdns4/0b8968f/logs/grenade.sh.txt.gz#_2019-06-09_23_10_03_034 > > -- > > > > Thanks, > > > > Matt > > > > From ken at jots.org Fri Jun 14 20:04:22 2019 From: ken at jots.org (Ken D'Ambrosio) Date: Fri, 14 Jun 2019 16:04:22 -0400 Subject: Nova scheduler -- which compute node? Message-ID: Hey, all. First things first: we're running Juno. Yeah, it's ancient, and we've got Queens on-deck, but it's Juno. Anyway, to my question: how can you tell which compute node is *actually getting* the VM request, shy of logging into them and looking? For example, on the Nova node, you can see the nova-scheduler.log file talk about which hosts are weighted how, but it doesn't seem as if it actually comes out and says which host it's trying to give the VM to. Am I missing something? (I'd happily supply log snippets, but I figure this is a fairly general question, and didn't want to paste 1.7 TB of random, possibly-relevant logs.) Thanks! -Ken From sfinucan at redhat.com Sat Jun 15 16:24:13 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Sat, 15 Jun 2019 17:24:13 +0100 Subject: [nova] 'pre-commit' integration Message-ID: <51e792855f5b2c98bff1a94f196cef1759d4f47f.camel@redhat.com> This is a heads up that I've proposed a patch to integrate [1] to add support for the 'pre-commit' tool [2] in nova. As noted in that change, this provides a faster, most automated alternative to the 'fast8' tox target we currently use for validating recently changed code. While this is opt-in, I'm also proposing deprecating the 'fast8' behavior since I don't think it's necessary with pre-commit support. As a result, it would be good if anyone that has an opinion on the matter took a look at the change. Have a good weekend, Stephen [1] https://review.opendev.org/665518 [2] https://pre-commit.com/ From ghadge.dhairya23 at gmail.com Sat Jun 15 16:25:24 2019 From: ghadge.dhairya23 at gmail.com (Dhairyasheel Ghadge) Date: Sat, 15 Jun 2019 21:55:24 +0530 Subject: Comparative analysis of openstack and Cloudstack Message-ID: Hello all!!! I have been exploring the cloudstack cloud platform while comparing it with the openstack cloud platform. I am a little skeptical about which solution to use as our cloud platform between cloudstack and OpenStack. Following are the difficulties I am facing which comparing these two solutions. 1.) OpenStack has HEAT as the component for orchestration of application VM's deployed and the cloud setup services as well. Is there any orchestration feature in cloudstack which provides an autscaling feature for the application deployed in it? 2.) OpenStack has TROVE for DBaaS which takes care of the entire lifecycle of the database service which includes deployment, configuration, replication, failover, backup, patching, restores. Is there any similar service in cloudstack for providing DBaaS? 3.) 
Cloudstack documentation says that it has orchestration feature for its applications, but couldn't find any details regarding the same. Also, for big data Hadoop clusters provisioning, SAHARA component is used in OpenStack. What does this sahara feature provides and is it useful for big data applications? This is feature is not there in cloudstack. What is the importance of serverless functionality feature "Qinling" in openstack which is not there in cloudstack Are there any points of Openstack which completely outshines itself from cloudstack? Thankyou Dhairyasheel Ghadge From saikrishna.ura at cloudseals.com Thu Jun 13 06:32:09 2019 From: saikrishna.ura at cloudseals.com (Saikrishna Ura) Date: Thu, 13 Jun 2019 06:32:09 +0000 Subject: getting issues while configuring the Trove In-Reply-To: References: Message-ID: Hi, I installed Openstack in Ubuntu 14 by cloning the devstack repository with this url "git clone https://git.openstack.org/openstack-dev/devstack" and trying to install trove manually with the reference of this document https://docs.openstack.org/trove/pike/install/install-ubuntu.html but can't able to establish the healthy API connection. I'm getting this error "root at openstack ~(keystone)# mysql -u root -p -h 192.168.2.70 Enter password: ERROR 1045 (28000): Access denied for user 'root'@'ip-192-168-2-70.us-east-2.compute.internal' (using password: YES)" Can one anyone please help me how to configure trove to the existing openstack. Thanks, Saikrishna U. OpenStack Docs: Install and configure for Ubuntu Install and configure for Ubuntu¶. This section describes how to install and configure the Database service for Ubuntu 14.04 (LTS). docs.openstack.org ________________________________ From: Saikrishna Ura Sent: Tuesday, June 11, 2019 8:47 PM To: openstack-dev at lists.openstack.org Subject: getting issues while configuring the Trove Hi, I installed Openstack in Ubuntu 18.04 by cloning the devstack repository with this url "git clone https://git.openstack.org/openstack-dev/devstack", but i can't able create or access with the trove, I'm getting issues with the installation. Can anyone help on this issue please. If any reference document or any guidance much appreciated. Thanks, Saikrishna U. -------------- next part -------------- An HTML attachment was scrubbed... URL: From laszlo.budai at gmail.com Sun Jun 16 01:44:57 2019 From: laszlo.budai at gmail.com (Budai Laszlo) Date: Sun, 16 Jun 2019 04:44:57 +0300 Subject: Nova scheduler -- which compute node? In-Reply-To: References: Message-ID: <0801b551-a977-36b6-1091-799286e467e2@gmail.com> Hi Ken If you have admin rights in your openstack then "nova show " will tell you the host on which the VM was scheduled and the instance name on the host (what you would see locally using "virsh list"). HTH, Laszlo On 6/14/19 11:04 PM, Ken D'Ambrosio wrote: > Hey, all.  First things first: we're running Juno.  Yeah, it's ancient, and we've got Queens on-deck, but it's Juno.  Anyway, to my question: how can you tell which compute node is *actually getting* the VM request, shy of logging into them and looking?  For example, on the Nova node, you can see the nova-scheduler.log file talk about which hosts are weighted how, but it doesn't seem as if it actually comes out and says which host it's trying to give the VM to.  Am I missing something?  (I'd happily supply log snippets, but I figure this is a fairly general question, and didn't want to paste 1.7 TB of random, possibly-relevant logs.) > > Thanks! 
> > -Ken > > From gkotton at vmware.com Sun Jun 16 08:42:50 2019 From: gkotton at vmware.com (Gary Kotton) Date: Sun, 16 Jun 2019 08:42:50 +0000 Subject: [Keystone] Performance degradation In-Reply-To: References: Message-ID: <763237F1-4C94-4064-BEFF-24B904A34034@vmware.com> Hi, Over the last few weeks we have being doing performance tests on the Stein version and have seen a notable performance degradation. Further investigation showed that Keystone seems to be the bottleneck. An example of this is running a Rally test that creates a keystone tenant with users. We see that in Queens this takes 20 seconds to complete the whole iteration whilst Stein takes 30 seconds. We have done the following tests: 1. Vanilla devstack Queens vs Steins 2. In our Stein version e have swapped out the Keystone Stein container with the Keystone Queens container and the numbers are considerably better too (please note the same keystone configuration is used with regards to processes/threads etc.) Are any folks familiar with what may cause this? Are there any performance improvement suggestions or hints? Thanks Gary -------------- next part -------------- An HTML attachment was scrubbed... URL: From gkotton at vmware.com Sun Jun 16 08:08:39 2019 From: gkotton at vmware.com (Gary Kotton) Date: Sun, 16 Jun 2019 08:08:39 +0000 Subject: [Keystone] Performance degradation Message-ID: Hi, Over the last few weeks we have being doing performance tests on the Stein version and have seen a notable performance degradation. Further investigation showed that Keystone seems to be the bottleneck. An example of this is running a Rally test that creates a keystone tenant with users. We see that in Queens this takes 20 seconds to complete the whole iteration whilst Stein takes 30 seconds. We have done the following tests: 1. Vanilla devstack Queens vs Steins 2. In our Stein version e have swapped out the Keystone Stein container with the Keystone Queens container and the numbers are considerably better too (please note the same keystone configuration is used with regards to processes/threads etc.) Are any folks familiar with what may cause this? Are there any performance improvement suggestions or hints? Thanks Gary -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From gkotton at vmware.com Sun Jun 16 08:13:51 2019 From: gkotton at vmware.com (Gary Kotton) Date: Sun, 16 Jun 2019 08:13:51 +0000 Subject: [Keystone] Performance degradation In-Reply-To: References: Message-ID: Hi, Over the last few weeks we have being doing performance tests on the Stein version and have seen a notable performance degradation. Further investigation showed that Keystone seems to be the bottleneck. An example of this is running a Rally test that creates a keystone tenant with users. We see that in Queens this takes 20 seconds to complete the whole iteration whilst Stein takes 30 seconds. (please see the attached file) We have done the following tests: 1. Vanilla devstack Queens vs Steins 2. In our Stein version e have swapped out the Keystone Stein container with the Keystone Queens container and the numbers are considerably better too (please note the same keystone configuration is used with regards to processes/threads etc.) Are any folks familiar with what may cause this? Are there any performance improvement suggestions or hints? 
Thanks Gary -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: keystone.tar.gz Type: application/x-gzip Size: 30078 bytes Desc: keystone.tar.gz URL: From morgan.fainberg at gmail.com Sun Jun 16 15:43:55 2019 From: morgan.fainberg at gmail.com (Morgan Fainberg) Date: Sun, 16 Jun 2019 08:43:55 -0700 Subject: [Keystone] Performance degradation In-Reply-To: <763237F1-4C94-4064-BEFF-24B904A34034@vmware.com> References: <763237F1-4C94-4064-BEFF-24B904A34034@vmware.com> Message-ID: Hi Gary, A couple things come to mind right off the bat that I want to ensure are addressed: 1) Is caching enabled and is the memcache server in-fact accessible from the keystone process? Keystone is developed assuming you have caching enabled (we will be examining, during Train, making this a requirement instead of "strongly encouraged"). 2) Did you swap from uuid to Fernet between the deployments? Unfortunately, fernet is slower than UUID. We opted for fernet and taking the performance hit in light of the significantly better token management/maintenance for long-term running clouds. UUID token provider was removed as of rocky. 3) I am curious about the scenario you have built for rally: Is this scenario a real-world-ish scenario that you are doing on a regular basis? I want to be clear we are troubleshooting real-world(ish) scenarios and not synthetic problems that only occur in test scenarios; there are options to streamline test scenarios separately from real-world use-cases. Tell me more about the rally scenario. I also want to point out that I've been unable to get any real information from the attached files (they seem to be broken). 4) Is this running under mod_wsgi? uwsgi? is there a lot of other process space contention if under mod_wsgi in apache? (There isn't a lot of information about your configuration in the posed question), Configuration information, deployment in the container, etc helps us understand what is going on. Thanks! On Sun, Jun 16, 2019 at 1:48 AM Gary Kotton wrote: > Hi, > > Over the last few weeks we have being doing performance tests on the Stein > version and have seen a notable performance degradation. Further > investigation showed that Keystone seems to be the bottleneck. > > An example of this is running a Rally test that creates a keystone tenant > with users. We see that in Queens this takes 20 seconds to complete the > whole iteration whilst Stein takes 30 seconds. > > We have done the following tests: > > 1. Vanilla devstack Queens vs Steins > 2. In our Stein version e have swapped out the Keystone Stein > container with the Keystone Queens container and the numbers are > considerably better too (please note the same keystone configuration is > used with regards to processes/threads etc.) > > Are any folks familiar with what may cause this? > > Are there any performance improvement suggestions or hints? > > Thanks > > Gary > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bcafarel at redhat.com Sun Jun 16 18:53:25 2019 From: bcafarel at redhat.com (Bernard Cafarelli) Date: Sun, 16 Jun 2019 20:53:25 +0200 Subject: [neutron] Zuul checks fail Message-ID: Hi neutrinos, a quick heads-up that currently Zuul will 100% give -1 on your reviews, failing 404 Not Found when trying to download Ubuntu Xenial image. Thanks to Yulong for filling launchpad bug [0] and submitting fix [1]. 
Once it is merged, we should be back in business [0] https://bugs.launchpad.net/neutron/+bug/1832968 [1] https://review.opendev.org/#/c/665530/ -- Bernard Cafarelli -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Mon Jun 17 03:02:12 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Mon, 17 Jun 2019 15:02:12 +1200 Subject: [nova] Admin user cannot create vm with user's port? In-Reply-To: References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com> <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com> Message-ID: Please could anyone else from nova team know the reason? Best regards, Lingxian Kong Catalyst Cloud On Fri, Jun 14, 2019 at 10:57 AM Lingxian Kong wrote: > Another use case is coming from the services (e.g. Trove) which will > create vms in the service tenant but using the resources (e.g. network or > port) given by the non-admin user. > > Best regards, > Lingxian Kong > Catalyst Cloud > > > On Fri, Jun 14, 2019 at 10:55 AM Lingxian Kong > wrote: > >> On Thu, Jun 13, 2019 at 10:48 PM Sean Mooney wrote: >> >>> On Thu, 2019-06-13 at 21:22 +1200, Lingxian Kong wrote: >>> > Yeah, the api allows to specify port. What i mean is, the vm creation >>> will >>> > fail for admin user if port belongs to a non-admin user. An exception >>> is >>> > raised from nova-compute. >>> >>> i believe this is intentional. >>> >>> we do not currently allow you to trasfer ownerwhip of a vm form one user >>> or proejct to another. >>> but i also believe we currently do not allow a vm to be create from >>> resouces with different owners >>> >> >> That's not true. As the admin user, you are allowed to create a vm using >> non-admin's network, security group, image, volume, etc but just not port. >> >> There is use case for admin user to create vms but using non-admin's >> resources for debugging or other purposes. >> >> What's more, the exception is raised in nova-compute not nova-api, which >> i assume it should be supported if it's allowed in the api layer. >> >> Best regards, >> Lingxian Kong >> Catalyst Cloud >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gkotton at vmware.com Mon Jun 17 06:30:19 2019 From: gkotton at vmware.com (Gary Kotton) Date: Mon, 17 Jun 2019 06:30:19 +0000 Subject: [Keystone] Performance degradation In-Reply-To: References: <763237F1-4C94-4064-BEFF-24B904A34034@vmware.com> Message-ID: Please see the comments inline below. Please note that we too the queens code (with the same configuration for stein) and that produced far better results. From: Morgan Fainberg Date: Sunday, 16 June 2019 at 18:44 To: Gary Kotton Cc: "openstack-discuss at lists.openstack.org" Subject: Re: [Keystone] Performance degradation Hi Gary, A couple things come to mind right off the bat that I want to ensure are addressed: 1) Is caching enabled and is the memcache server in-fact accessible from the keystone process? Keystone is developed assuming you have caching enabled (we will be examining, during Train, making this a requirement instead of "strongly encouraged"). [Gary] Yes, caching is enabled and memcache is accessible. 2) Did you swap from uuid to Fernet between the deployments? Unfortunately, fernet is slower than UUID. We opted for fernet and taking the performance hit in light of the significantly better token management/maintenance for long-term running clouds. UUID token provider was removed as of rocky. [Gary] We are using Fernet. 
Please note that in Queens we are also using Fernet. 3) I am curious about the scenario you have built for rally: Is this scenario a real-world-ish scenario that you are doing on a regular basis? I want to be clear we are troubleshooting real-world(ish) scenarios and not synthetic problems that only occur in test scenarios; there are options to streamline test scenarios separately from real-world use-cases. Tell me more about the rally scenario. I also want to point out that I've been unable to get any real information from the attached files (they seem to be broken). [Gary] I am getting information from the team on this and will get back to you 4) Is this running under mod_wsgi? uwsgi? is there a lot of other process space contention if under mod_wsgi in apache? (There isn't a lot of information about your configuration in the posed question), Configuration information, deployment in the container, etc helps us understand what is going on. [Gary] It is mod_wsgi. This is containerized. There is nothing else running in the same container. Thanks! On Sun, Jun 16, 2019 at 1:48 AM Gary Kotton > wrote: Hi, Over the last few weeks we have being doing performance tests on the Stein version and have seen a notable performance degradation. Further investigation showed that Keystone seems to be the bottleneck. An example of this is running a Rally test that creates a keystone tenant with users. We see that in Queens this takes 20 seconds to complete the whole iteration whilst Stein takes 30 seconds. We have done the following tests: 1. Vanilla devstack Queens vs Steins 2. In our Stein version e have swapped out the Keystone Stein container with the Keystone Queens container and the numbers are considerably better too (please note the same keystone configuration is used with regards to processes/threads etc.) Are any folks familiar with what may cause this? Are there any performance improvement suggestions or hints? Thanks Gary -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Mon Jun 17 06:40:12 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Mon, 17 Jun 2019 18:40:12 +1200 Subject: getting issues while configuring the Trove In-Reply-To: References: Message-ID: Hi Saikrishna, Which host is the IP 192.168.2.70? If it's your devstack host, try log into the host and execute the same command. Best regards, Lingxian Kong Catalyst Cloud On Sun, Jun 16, 2019 at 8:57 AM Saikrishna Ura < saikrishna.ura at cloudseals.com> wrote: > Hi, > > I installed Openstack in Ubuntu 14 by cloning the devstack repository with > this url "git clone https://git.openstack.org/openstack-dev/devstack" and > trying to install trove manually with the reference of this document > https://docs.openstack.org/trove/pike/install/install-ubuntu.html but > can't able to establish the healthy API connection. > > I'm getting this error "root at openstack ~(keystone)# mysql -u root -p -h > 192.168.2.70 > Enter password: > ERROR 1045 (28000): Access denied for user 'root'@'ip-192-168-2-70.us-east-2.compute.internal' > (using password: YES)" > > Can one anyone please help me how to configure trove to the existing > openstack. > > > Thanks, > Saikrishna U. > OpenStack Docs: Install and configure for Ubuntu > > Install and configure for Ubuntu¶. This section describes how to install > and configure the Database service for Ubuntu 14.04 (LTS). 
> docs.openstack.org > > > > ------------------------------ > *From:* Saikrishna Ura > *Sent:* Tuesday, June 11, 2019 8:47 PM > *To:* openstack-dev at lists.openstack.org > *Subject:* getting issues while configuring the Trove > > Hi, > > I installed Openstack in Ubuntu 18.04 by cloning the devstack repository > with this url "git clone https://git.openstack.org/openstack-dev/devstack", > but i can't able create or access with the trove, I'm getting issues with > the installation. > > Can anyone help on this issue please. If any reference document or any > guidance much appreciated. > > Thanks, > > Saikrishna U. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Mon Jun 17 08:13:47 2019 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 17 Jun 2019 10:13:47 +0200 Subject: Comparative analysis of openstack and Cloudstack In-Reply-To: References: Message-ID: Dhairyasheel Ghadge wrote: > [...] > Are there any points of Openstack which completely outshines itself > from cloudstack? It's not really our role to compare solutions or tell one is better than the other. I think one key difference as you noticed is that CloudStack focuses on key IaaS functionality (VMs as a service, with accompanying networking and storage), while OpenStack aims at being a more complete cloud framework, and therefore includes advanced services like object storage (Swift), Kubernetes clusters provisioning (Magnum), running containers / functions... As a result OpenStack is a larger project with more features, but also is arguably more complex. Hope this helps, -- Thierry Carrez (ttx) From Tushar.Patil at nttdata.com Mon Jun 17 08:20:57 2019 From: Tushar.Patil at nttdata.com (Patil, Tushar) Date: Mon, 17 Jun 2019 08:20:57 +0000 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <6c41c425-71b5-5095-7acc-2198f7ad1d92@fried.cc> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> <91efe32e80b7c24b0bfe5875ecd053513b7fd443.camel@redhat.com>, <6c41c425-71b5-5095-7acc-2198f7ad1d92@fried.cc> Message-ID: > The consensus was: Don't muck with resource provider traits at all > during aggregate operations. The operator must do that bit manually. As > a stretch goal, we can write a simple utility to help with this. > This was discussed as option (e) earlier in this thread. I have updated specs and uploaded a new patch for review. https://review.opendev.org/#/c/665605/ Regards, Tushar Patil ________________________________________ From: Eric Fried Sent: Friday, June 14, 2019 3:36:04 AM To: openstack-discuss at lists.openstack.org Subject: Re: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' We discussed this in the nova meeting today [1] with a little spillover in the -nova channel afterward [2]. The consensus was: Don't muck with resource provider traits at all during aggregate operations. The operator must do that bit manually. As a stretch goal, we can write a simple utility to help with this. This was discussed as option (e) earlier in this thread. The spec needs to get updated accordingly. 
Thanks, efried [1] http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13-14.00.log.html#l-267 [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-13.log.html#t2019-06-13T15:02:06-2 (interleaved) Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. From soulxu at gmail.com Mon Jun 17 08:45:37 2019 From: soulxu at gmail.com (Alex Xu) Date: Mon, 17 Jun 2019 16:45:37 +0800 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> Message-ID: I'm thinking we should have recommended upgrade follow. If we give a lot of flexibility for the operator to have a lot combination of the value of vcpu_pin_set, dedicated_cpu_set and shared_cpu_set, then we have trouble in this email and have to do a lot of checks this email introduced also. I'm thinking that the pre-request filter (which translates the cpu_policy=dedicated to PCPU request) should be enabled after all the node upgrades to the Train release. Before that, all the cpu_policy=dedicated instance still using the VCPU. Trying to image the upgrade as below: 1. Rolling upgrade the compute node. 2. The upgraded compute node begins to report both VCPU and PCPU, but reshape for the existed inventories. The upgraded node is still using the vcpu_pin_set config, or didn't set the vcpu_pin_config. Both in this two cases are reporting VCPU and PCPU same time. And the request with cpu_policy=dedicated still uses the VCPU. Then it is worked same as Stein release. And existed instance can be shelved/unshelved, migration and evacuate. 3. Disable the new request and operation for the instance to the hosts for dedicated instance. (it is kind of breaking our live-upgrade? I thought this will be a short interrupt for the control plane if that is available) 4. reshape the inventories for existed instance for all the hosts. 5. Enable the instance's new request and operation, also enable the pre-request filter. 6. Operator copies the value of vcpu_pin_set to dedicated_cpu_set. For the case of vcpu_pin_set isn't set, the value of dedicated_cpu_set should be all the cpu ids exclude shared_cpu_set if set. Two rules at here: 1. The operator doesn't allow to change a different value for dedicated_cpu_set with vcpu_pin_set when any instance is running on the host. 2. The operator doesn't allow to change the value of dedicated_cpu_set and shared_cpu_set when any instance is running on the host. Shewale, Bhagyashri 于2019年6月14日周五 下午4:42写道: > >> that is incorrect both a and by will be returned. 
the spec states that > for host A we report an inventory of 4 VCPUs and > > >> an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so > both host will be returned assuming > > >> $ <=4 > > > Means if ``vcpu_pin_set`` is set in previous release then report both VCPU > and PCPU as inventory (in Train) but this seems contradictory for example: > > > On Stein, > > > Configuration on compute node A: > > vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in placement database) > > > On Train: > > vcpu_pin_set=0-3 > > > The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement db > > > Now say user wants to create instances as below: > > 1. Flavor having extra specs (resources:PCPU=1), instance A > 2. Flavor having extra specs (resources:VCPU=1), instance B > > > For both instance requests, placement will return compute Node A. > > Instance A: will be pinned to say 0 CPU > > Instance B: will float on 0-3 > > > To resolve above issue, I think it’s possible to detect whether the > compute node was configured to be used for pinned instances if > ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that case, > vcpu_pin_set will be reported as PCPU otherwise VCPU. > > > Regards, > > -Bhagyashri Shewale- > > ------------------------------ > *From:* Sean Mooney > *Sent:* Thursday, June 13, 2019 8:32:02 PM > *To:* Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; > openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com > *Subject:* Re: [nova] Spec: Standardize CPU resource tracking > > On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote: > > Hi All, > > > > > > Currently I am working on implementation of cpu pinning upgrade part as > mentioned in the spec [1]. > > > > > > While implementing the scheduler pre-filter as mentioned in [1], I have > encountered one big issue: > > > > > > Proposed change in spec: In scheduler pre-filter we are going to alias > request_spec.flavor.extra_spec and > > request_spec.image.properties form ``hw:cpu_policy`` to > ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > > > > So when user will create a new instance or execute instance actions > like shelve, unshelve, resize, evacuate and > > migration post upgrade it will go through scheduler pre-filter which > will set alias for `hw:cpu_policy` in > > request_spec flavor ``extra specs`` and image metadata properties. In > below particular case, it won’t work:- > > > > > > For example: > > > > > > I have two compute nodes say A and B: > > > > > > On Stein: > > > > > > Compute node A configurations: > > > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate > which has “pinned” metadata) > vcpu_pin_set does not mean that the host was used for pinned instances > https://that.guru/blog/cpu-resources/ > > > > > > Compute node B Configuration: > > > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate > which has “pinned” metadata) > > > > > > On Train, two possible scenarios: > > > > Compute node A configurations: (Consider the new cpu pinning > implementation is merged into Train) > > > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > > > > Compute node B Configuration: (Consider the new cpu pinning > implementation is merged into Train) > > > > cpu_dedicated_set=0-3 (change to the new config option) > > > > 1. 
Consider that one instance say `test ` is created using flavor > having old extra specs (hw:cpu_policy=dedicated, > > "aggregate_instance_extra_specs:pinned": "true") in Stein release and > now upgraded Nova to Train with the above > > configuration. > > 2. Now when user will perform instance action say shelve/unshelve > scheduler pre-filter will change the > > request_spec flavor extra spec from ``hw:cpu_policy`` to > ``resources=PCPU:$`` > it wont remove hw:cpu_policy it will just change the resouces=VCPU:$ of cpus> -> resources=PCPU:$ > > > which ultimately will return only compute node B from placement service. > that is incorrect both a and by will be returned. the spec states that for > host A we report an inventory of 4 VCPUs and > an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so > both host will be returned assuming > $ <=4 > > > Here, we expect it should have retuned both Compute A and Compute B. > it will > > 3. If user creates a new instance using old extra specs > (hw:cpu_policy=dedicated, > > "aggregate_instance_extra_specs:pinned": "true") on Train release with > the above configuration then it will return > > only compute node B from placement service where as it should have > returned both compute Node A and B. > that is what would have happend in the stien version of the spec and we > changed the spec specifically to ensure that > that wont happen. in the train version of the spec you will get both host > as candates to prevent this upgrade impact. > > > > Problem: As Compute node A is still configured to be used to boot > instances with dedicated CPUs same behavior as > > Stein, it will not be returned by placement service due to the changes > in the scheduler pre-filter logic. > > > > > > Propose changes: > > > > > > Earlier in the spec [2]: The online data migration was proposed to > change flavor extra specs and image metadata > > properties of request_spec and instance object. Based on the instance > host, we can get the NumaTopology of the host > > which will contain the new configuration options set on the compute > host. Based on the NumaTopology of host, we can > > change instance and request_spec flavor extra specs. > > > > 1. Remove cpu_policy from extra specs > > 2. Add “resources:PCPU=” in extra specs > > > > > > We can also change the flavor extra specs and image metadata properties > of instance and request_spec object using the > > reshape functionality. > > > > > > Please give us your feedback on the proposed solution so that we can > update specs accordingly. > i am fairly stongly opposed to useing an online data migration to modify > the request spec to reflect the host they > landed on. this speficic problem is why the spec was changed in the train > cycle to report dual inventoryis of VCPU and > PCPU if vcpu_pin_set is the only option set or of no options are set. > > > > > > [1]: > https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > > > [2]: > https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > > > > Thanks and Regards, > > > > -Bhagyashri Shewale- > > > > Disclaimer: This email and any attachments are sent in strictest > confidence for the sole use of the addressee and may > > contain legally privileged, confidential, and proprietary data. If you > are not the intended recipient, please advise > > the sender by replying promptly to this email and then delete and > destroy this email and any attachments without any > > further use, copying or forwarding. 
> > Disclaimer: This email and any attachments are sent in strictest > confidence for the sole use of the addressee and may contain legally > privileged, confidential, and proprietary data. If you are not the intended > recipient, please advise the sender by replying promptly to this email and > then delete and destroy this email and any attachments without any further > use, copying or forwarding. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Mon Jun 17 08:50:35 2019 From: soulxu at gmail.com (Alex Xu) Date: Mon, 17 Jun 2019 16:50:35 +0800 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> Message-ID: Alex Xu 于2019年6月17日周一 下午4:45写道: > I'm thinking we should have recommended upgrade follow. If we give a lot > of flexibility for the operator to have a lot combination of the value of > vcpu_pin_set, dedicated_cpu_set and shared_cpu_set, then we have trouble in > this email and have to do a lot of checks this email introduced also. > > I'm thinking that the pre-request filter (which translates the > cpu_policy=dedicated to PCPU request) should be enabled after all the node > upgrades to the Train release. Before that, all the cpu_policy=dedicated > instance still using the VCPU. > > Trying to image the upgrade as below: > > 1. Rolling upgrade the compute node. > 2. The upgraded compute node begins to report both VCPU and PCPU, but > reshape for the existed inventories. > The upgraded node is still using the vcpu_pin_set config, or didn't > set the vcpu_pin_config. Both in this two cases are reporting VCPU and PCPU > same time. And the request with cpu_policy=dedicated still uses the VCPU. > Then it is worked same as Stein release. And existed instance can be > shelved/unshelved, migration and evacuate. > 3. Disable the new request and operation for the instance to the hosts for > dedicated instance. (it is kind of breaking our live-upgrade? I thought > this will be a short interrupt for the control plane if that is available) > 4. reshape the inventories for existed instance for all the hosts. > 5. Enable the instance's new request and operation, also enable the > pre-request filter. > 6. Operator copies the value of vcpu_pin_set to dedicated_cpu_set. For the > case of vcpu_pin_set isn't set, the value of dedicated_cpu_set should be > all the cpu ids exclude shared_cpu_set if set. > I should adjust the order of 4, 5, 6 as below: 4. Operator copies the value of vcpu_pin_set to dedicated_cpu_set. For the case of vcpu_pin_set isn't set, the value of dedicated_cpu_set should be all the cpu ids exclude shared_cpu_set if set. 5. the changing of dedicated_cpu_set triggers the reshape of the existed inventories, and remove the duplicated VCPU resources reporting. 6. Enable the instance's new request and operation, also enable the pre-request filter. > > Two rules at here: > 1. The operator doesn't allow to change a different value for > dedicated_cpu_set with vcpu_pin_set when any instance is running on the > host. > 2. The operator doesn't allow to change the value of dedicated_cpu_set and > shared_cpu_set when any instance is running on the host. > > > > Shewale, Bhagyashri 于2019年6月14日周五 > 下午4:42写道: > >> >> that is incorrect both a and by will be returned. 
the spec states that >> for host A we report an inventory of 4 VCPUs and >> >> >> an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so >> both host will be returned assuming >> >> >> $ <=4 >> >> >> Means if ``vcpu_pin_set`` is set in previous release then report both >> VCPU and PCPU as inventory (in Train) but this seems contradictory for >> example: >> >> >> On Stein, >> >> >> Configuration on compute node A: >> >> vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in placement >> database) >> >> >> On Train: >> >> vcpu_pin_set=0-3 >> >> >> The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement db >> >> >> Now say user wants to create instances as below: >> >> 1. Flavor having extra specs (resources:PCPU=1), instance A >> 2. Flavor having extra specs (resources:VCPU=1), instance B >> >> >> For both instance requests, placement will return compute Node A. >> >> Instance A: will be pinned to say 0 CPU >> >> Instance B: will float on 0-3 >> >> >> To resolve above issue, I think it’s possible to detect whether the >> compute node was configured to be used for pinned instances if >> ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that case, >> vcpu_pin_set will be reported as PCPU otherwise VCPU. >> >> >> Regards, >> >> -Bhagyashri Shewale- >> >> ------------------------------ >> *From:* Sean Mooney >> *Sent:* Thursday, June 13, 2019 8:32:02 PM >> *To:* Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; >> openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com >> *Subject:* Re: [nova] Spec: Standardize CPU resource tracking >> >> On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote: >> > Hi All, >> > >> > >> > Currently I am working on implementation of cpu pinning upgrade part as >> mentioned in the spec [1]. >> > >> > >> > While implementing the scheduler pre-filter as mentioned in [1], I have >> encountered one big issue: >> > >> > >> > Proposed change in spec: In scheduler pre-filter we are going to alias >> request_spec.flavor.extra_spec and >> > request_spec.image.properties form ``hw:cpu_policy`` to >> ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. >> > >> > >> > So when user will create a new instance or execute instance actions >> like shelve, unshelve, resize, evacuate and >> > migration post upgrade it will go through scheduler pre-filter which >> will set alias for `hw:cpu_policy` in >> > request_spec flavor ``extra specs`` and image metadata properties. In >> below particular case, it won’t work:- >> > >> > >> > For example: >> > >> > >> > I have two compute nodes say A and B: >> > >> > >> > On Stein: >> > >> > >> > Compute node A configurations: >> > >> > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in >> aggregate which has “pinned” metadata) >> vcpu_pin_set does not mean that the host was used for pinned instances >> https://that.guru/blog/cpu-resources/ >> > >> > >> > Compute node B Configuration: >> > >> > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in >> aggregate which has “pinned” metadata) >> > >> > >> > On Train, two possible scenarios: >> > >> > Compute node A configurations: (Consider the new cpu pinning >> implementation is merged into Train) >> > >> > vcpu_pin_set=0-3 (Keep same settings as in Stein) >> > >> > >> > Compute node B Configuration: (Consider the new cpu pinning >> implementation is merged into Train) >> > >> > cpu_dedicated_set=0-3 (change to the new config option) >> > >> > 1. 
Consider that one instance say `test ` is created using flavor >> having old extra specs (hw:cpu_policy=dedicated, >> > "aggregate_instance_extra_specs:pinned": "true") in Stein release and >> now upgraded Nova to Train with the above >> > configuration. >> > 2. Now when user will perform instance action say shelve/unshelve >> scheduler pre-filter will change the >> > request_spec flavor extra spec from ``hw:cpu_policy`` to >> ``resources=PCPU:$`` >> it wont remove hw:cpu_policy it will just change the resouces=VCPU:$> of cpus> -> resources=PCPU:$ >> >> > which ultimately will return only compute node B from placement >> service. >> that is incorrect both a and by will be returned. the spec states that >> for host A we report an inventory of 4 VCPUs and >> an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so >> both host will be returned assuming >> $ <=4 >> >> > Here, we expect it should have retuned both Compute A and Compute B. >> it will >> > 3. If user creates a new instance using old extra specs >> (hw:cpu_policy=dedicated, >> > "aggregate_instance_extra_specs:pinned": "true") on Train release with >> the above configuration then it will return >> > only compute node B from placement service where as it should have >> returned both compute Node A and B. >> that is what would have happend in the stien version of the spec and we >> changed the spec specifically to ensure that >> that wont happen. in the train version of the spec you will get both host >> as candates to prevent this upgrade impact. >> > >> > Problem: As Compute node A is still configured to be used to boot >> instances with dedicated CPUs same behavior as >> > Stein, it will not be returned by placement service due to the changes >> in the scheduler pre-filter logic. >> > >> > >> > Propose changes: >> > >> > >> > Earlier in the spec [2]: The online data migration was proposed to >> change flavor extra specs and image metadata >> > properties of request_spec and instance object. Based on the instance >> host, we can get the NumaTopology of the host >> > which will contain the new configuration options set on the compute >> host. Based on the NumaTopology of host, we can >> > change instance and request_spec flavor extra specs. >> > >> > 1. Remove cpu_policy from extra specs >> > 2. Add “resources:PCPU=” in extra specs >> > >> > >> > We can also change the flavor extra specs and image metadata properties >> of instance and request_spec object using the >> > reshape functionality. >> > >> > >> > Please give us your feedback on the proposed solution so that we can >> update specs accordingly. >> i am fairly stongly opposed to useing an online data migration to modify >> the request spec to reflect the host they >> landed on. this speficic problem is why the spec was changed in the train >> cycle to report dual inventoryis of VCPU and >> PCPU if vcpu_pin_set is the only option set or of no options are set. >> > >> > >> > [1]: >> https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 >> > >> > [2]: >> https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst >> > >> > >> > Thanks and Regards, >> > >> > -Bhagyashri Shewale- >> > >> > Disclaimer: This email and any attachments are sent in strictest >> confidence for the sole use of the addressee and may >> > contain legally privileged, confidential, and proprietary data. 
If you >> are not the intended recipient, please advise >> > the sender by replying promptly to this email and then delete and >> destroy this email and any attachments without any >> > further use, copying or forwarding. >> >> Disclaimer: This email and any attachments are sent in strictest >> confidence for the sole use of the addressee and may contain legally >> privileged, confidential, and proprietary data. If you are not the intended >> recipient, please advise the sender by replying promptly to this email and >> then delete and destroy this email and any attachments without any further >> use, copying or forwarding. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sshnaidm at redhat.com Mon Jun 17 08:53:43 2019 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Mon, 17 Jun 2019 11:53:43 +0300 Subject: [tripleo][molecule] feedback on testing ansible roles with molecule In-Reply-To: References: Message-ID: On Fri, Jun 7, 2019 at 11:39 PM Emilien Macchi wrote: > is there a driver for podman? if yes, prefer it over docker on fedora. > > Since Saturday we do have driver for podman in Molecule [1]. But it requires for you to have latest ansible from "devel" branch, as podman connector in ansible is pretty new thing. > Otherwise, cool! Thanks for this work. It'll be useful with the > forthcoming work in tripleo-ansible. > -- > Emilien Macchi > [1] https://github.com/ansible/molecule/pull/2098 -- Best regards Sagi Shnaidman -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Jun 17 09:19:18 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 17 Jun 2019 10:19:18 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> Message-ID: <4650ab3c8a6e04d4e391325ddb7c2985146ce747.camel@redhat.com> On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote: > I'm thinking we should have recommended upgrade follow. If we give a lot of > flexibility for the operator to have a lot combination of the value of > vcpu_pin_set, dedicated_cpu_set and shared_cpu_set, then we have trouble in > this email and have to do a lot of checks this email introduced also. we modified the spec intentionally to make upgradeing simple. i don't be believe the concerns raised in the intial 2 emails are valid if we follow what was detailed in the spec. we did take some steps to restrict what values you can set. for example dedicated_cpu_set cannot be set if vcpu pin set is set. technicall i belive we relaxed that to say we would ignore vcpu pin set in that case be original i was pushing for it to be a hard error. > > I'm thinking that the pre-request filter (which translates the > cpu_policy=dedicated to PCPU request) should be enabled after all the node > upgrades to the Train release. Before that, all the cpu_policy=dedicated > instance still using the VCPU. it should be enabled after all node are upgraded but not nessisarily before all compute nodes are updated to use dedicated_cpu_set. > > Trying to image the upgrade as below: > > 1. Rolling upgrade the compute node. > 2. The upgraded compute node begins to report both VCPU and PCPU, but > reshape for the existed inventories. > The upgraded node is still using the vcpu_pin_set config, or didn't > set the vcpu_pin_config. Both in this two cases are reporting VCPU and PCPU > same time. And the request with cpu_policy=dedicated still uses the VCPU. > Then it is worked same as Stein release. 
And existed instance can be > shelved/unshelved, migration and evacuate. +1 > 3. Disable the new request and operation for the instance to the hosts for > dedicated instance. (it is kind of breaking our live-upgrade? I thought > this will be a short interrupt for the control plane if that is available) im not sure why we need to do this unless you are thinging this will be done by a cli? e.g. like nova-manage. > 4. reshape the inventories for existed instance for all the hosts. should this not happen when the agent starts up? > 5. Enable the instance's new request and operation, also enable the > pre-request filter. > 6. Operator copies the value of vcpu_pin_set to dedicated_cpu_set. vcpu_pin_set is not the set of cpu used for pinning. the operators should set dedicated_cpu_set and shared_cpu_set approprealy at this point but in general they proably wont just copy it as host that used vcpu_pin_set but were not used for pinned instances will be copied to shared_cpu_set. > For the > case of vcpu_pin_set isn't set, the value of dedicated_cpu_set should be > all the cpu ids exclude shared_cpu_set if set. > > Two rules at here: > 1. The operator doesn't allow to change a different value for > dedicated_cpu_set with vcpu_pin_set when any instance is running on the > host. > 2. The operator doesn't allow to change the value of dedicated_cpu_set and > shared_cpu_set when any instance is running on the host. neither of these rule can be enforced. one of the requirements that dan smith had for edge computeing is that we need to supprot upgraes with instance inplace. > > > > Shewale, Bhagyashri 于2019年6月14日周五 下午4:42写道: > > > > > that is incorrect both a and by will be returned. the spec states that > > > > for host A we report an inventory of 4 VCPUs and > > > > > > an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so > > > > both host will be returned assuming > > > > > > $ <=4 > > > > > > Means if ``vcpu_pin_set`` is set in previous release then report both VCPU > > and PCPU as inventory (in Train) but this seems contradictory for example: > > > > > > On Stein, > > > > > > Configuration on compute node A: > > > > vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in placement database) > > > > > > On Train: > > > > vcpu_pin_set=0-3 > > > > > > The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement db > > > > > > Now say user wants to create instances as below: > > > > 1. Flavor having extra specs (resources:PCPU=1), instance A > > 2. Flavor having extra specs (resources:VCPU=1), instance B > > > > > > For both instance requests, placement will return compute Node A. > > > > Instance A: will be pinned to say 0 CPU > > > > Instance B: will float on 0-3 > > > > > > To resolve above issue, I think it’s possible to detect whether the > > compute node was configured to be used for pinned instances if > > ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that case, > > vcpu_pin_set will be reported as PCPU otherwise VCPU. 
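As an illustration only (this helper is hypothetical and not part of the spec or of nova itself), the heuristic proposed above would amount to roughly the following check against the host NUMA topology object:

    # Hypothetical sketch of the proposed heuristic: treat a host as a
    # "pinned" host, and report its vcpu_pin_set as PCPU inventory, only
    # if some guest already has CPUs pinned on it; otherwise report VCPU.
    def host_has_pinned_instances(host_numa_topology):
        # NUMACell.pinned_cpus is the set of host CPUs currently claimed
        # by pinned guests; empty on every cell means no pinned guests.
        return any(cell.pinned_cpus for cell in host_numa_topology.cells)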
> > > > > > Regards, > > > > -Bhagyashri Shewale- > > > > ------------------------------ > > *From:* Sean Mooney > > *Sent:* Thursday, June 13, 2019 8:32:02 PM > > *To:* Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; > > openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com > > *Subject:* Re: [nova] Spec: Standardize CPU resource tracking > > > > On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote: > > > Hi All, > > > > > > > > > Currently I am working on implementation of cpu pinning upgrade part as > > > > mentioned in the spec [1]. > > > > > > > > > While implementing the scheduler pre-filter as mentioned in [1], I have > > > > encountered one big issue: > > > > > > > > > Proposed change in spec: In scheduler pre-filter we are going to alias > > > > request_spec.flavor.extra_spec and > > > request_spec.image.properties form ``hw:cpu_policy`` to > > > > ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > > > > > > > So when user will create a new instance or execute instance actions > > > > like shelve, unshelve, resize, evacuate and > > > migration post upgrade it will go through scheduler pre-filter which > > > > will set alias for `hw:cpu_policy` in > > > request_spec flavor ``extra specs`` and image metadata properties. In > > > > below particular case, it won’t work:- > > > > > > > > > For example: > > > > > > > > > I have two compute nodes say A and B: > > > > > > > > > On Stein: > > > > > > > > > Compute node A configurations: > > > > > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate > > > > which has “pinned” metadata) > > vcpu_pin_set does not mean that the host was used for pinned instances > > https://that.guru/blog/cpu-resources/ > > > > > > > > > Compute node B Configuration: > > > > > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate > > > > which has “pinned” metadata) > > > > > > > > > On Train, two possible scenarios: > > > > > > Compute node A configurations: (Consider the new cpu pinning > > > > implementation is merged into Train) > > > > > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > > > > > > > Compute node B Configuration: (Consider the new cpu pinning > > > > implementation is merged into Train) > > > > > > cpu_dedicated_set=0-3 (change to the new config option) > > > > > > 1. Consider that one instance say `test ` is created using flavor > > > > having old extra specs (hw:cpu_policy=dedicated, > > > "aggregate_instance_extra_specs:pinned": "true") in Stein release and > > > > now upgraded Nova to Train with the above > > > configuration. > > > 2. Now when user will perform instance action say shelve/unshelve > > > > scheduler pre-filter will change the > > > request_spec flavor extra spec from ``hw:cpu_policy`` to > > > > ``resources=PCPU:$`` > > it wont remove hw:cpu_policy it will just change the resouces=VCPU:$ > of cpus> -> resources=PCPU:$ > > > > > which ultimately will return only compute node B from placement service. > > > > that is incorrect both a and by will be returned. the spec states that for > > host A we report an inventory of 4 VCPUs and > > an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so > > both host will be returned assuming > > $ <=4 > > > > > Here, we expect it should have retuned both Compute A and Compute B. > > > > it will > > > 3. 
If user creates a new instance using old extra specs > > > > (hw:cpu_policy=dedicated, > > > "aggregate_instance_extra_specs:pinned": "true") on Train release with > > > > the above configuration then it will return > > > only compute node B from placement service where as it should have > > > > returned both compute Node A and B. > > that is what would have happend in the stien version of the spec and we > > changed the spec specifically to ensure that > > that wont happen. in the train version of the spec you will get both host > > as candates to prevent this upgrade impact. > > > > > > Problem: As Compute node A is still configured to be used to boot > > > > instances with dedicated CPUs same behavior as > > > Stein, it will not be returned by placement service due to the changes > > > > in the scheduler pre-filter logic. > > > > > > > > > Propose changes: > > > > > > > > > Earlier in the spec [2]: The online data migration was proposed to > > > > change flavor extra specs and image metadata > > > properties of request_spec and instance object. Based on the instance > > > > host, we can get the NumaTopology of the host > > > which will contain the new configuration options set on the compute > > > > host. Based on the NumaTopology of host, we can > > > change instance and request_spec flavor extra specs. > > > > > > 1. Remove cpu_policy from extra specs > > > 2. Add “resources:PCPU=” in extra specs > > > > > > > > > We can also change the flavor extra specs and image metadata properties > > > > of instance and request_spec object using the > > > reshape functionality. > > > > > > > > > Please give us your feedback on the proposed solution so that we can > > > > update specs accordingly. > > i am fairly stongly opposed to useing an online data migration to modify > > the request spec to reflect the host they > > landed on. this speficic problem is why the spec was changed in the train > > cycle to report dual inventoryis of VCPU and > > PCPU if vcpu_pin_set is the only option set or of no options are set. > > > > > > > > > [1]: > > > > https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > > > > > [2]: > > > > https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > > > > > > > Thanks and Regards, > > > > > > -Bhagyashri Shewale- > > > > > > Disclaimer: This email and any attachments are sent in strictest > > > > confidence for the sole use of the addressee and may > > > contain legally privileged, confidential, and proprietary data. If you > > > > are not the intended recipient, please advise > > > the sender by replying promptly to this email and then delete and > > > > destroy this email and any attachments without any > > > further use, copying or forwarding. > > > > Disclaimer: This email and any attachments are sent in strictest > > confidence for the sole use of the addressee and may contain legally > > privileged, confidential, and proprietary data. If you are not the intended > > recipient, please advise the sender by replying promptly to this email and > > then delete and destroy this email and any attachments without any further > > use, copying or forwarding. 
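For readers following the thread, a rough nova.conf sketch of the host configurations being debated (option names as in the Train cpu-resources spec; the comments reflect this thread's reading of the spec, not final behaviour):

    # Stein-style host left untouched during the upgrade window:
    [DEFAULT]
    vcpu_pin_set = 0-3
    # -> in Train this host reports both VCPU:4 and PCPU:4 inventory

    # Train-style host dedicated to pinned guests:
    [compute]
    cpu_dedicated_set = 0-3
    # -> reports PCPU:4 inventory only

    # Train-style host for floating guests (and emulator threads):
    [compute]
    cpu_shared_set = 0-3
    # -> reports VCPU:4 inventory only

The pre-request filter discussed above would translate a legacy pinned flavor (hw:cpu_policy=dedicated with flavor.vcpus=4) into a placement request for resources:PCPU=4 instead of resources:VCPU=4.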
> > From soulxu at gmail.com Mon Jun 17 09:47:05 2019 From: soulxu at gmail.com (Alex Xu) Date: Mon, 17 Jun 2019 17:47:05 +0800 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: <4650ab3c8a6e04d4e391325ddb7c2985146ce747.camel@redhat.com> References: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> <4650ab3c8a6e04d4e391325ddb7c2985146ce747.camel@redhat.com> Message-ID: Sean Mooney 于2019年6月17日周一 下午5:19写道: > On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote: > > I'm thinking we should have recommended upgrade follow. If we give a lot > of > > flexibility for the operator to have a lot combination of the value of > > vcpu_pin_set, dedicated_cpu_set and shared_cpu_set, then we have trouble > in > > this email and have to do a lot of checks this email introduced also. > we modified the spec intentionally to make upgradeing simple. > i don't be believe the concerns raised in the intial 2 emails are valid > if we follow what was detailed in the spec. > we did take some steps to restrict what values you can set. > for example dedicated_cpu_set cannot be set if vcpu pin set is set. > technicall i belive we relaxed that to say we would ignore vcpu pin set in > that case > be original i was pushing for it to be a hard error. > > > > > I'm thinking that the pre-request filter (which translates the > > cpu_policy=dedicated to PCPU request) should be enabled after all the > node > > upgrades to the Train release. Before that, all the cpu_policy=dedicated > > instance still using the VCPU. > it should be enabled after all node are upgraded but not nessisarily before > all compute nodes are updated to use dedicated_cpu_set. > If we enable the pre-request filter in the middle of upgrade, there will have the problem Bhagyashri said. Reporting PCPU and VCPU sametime doesn't resolve the concern from him as my understand. For example, we have 100 nodes for dedicated host in the cluster. The operator begins to upgrade the cluster. The controller plane upgrade first, and the pre-request filter enabled. For rolling upgrade, he begins to upgrade 10 nodes first. Then only those 10 nodes report PCPU and VCPU sametime. But any new request with dedicated cpu policy begins to request PCPU, all of those new instance only can be go to those 10 nodes. Also if the existed instances execute the resize and evacuate, and shelve/unshelve are going to those 10 nodes also. That is kind of nervious on the capacity at that time. > > > > > > Trying to image the upgrade as below: > > > > 1. Rolling upgrade the compute node. > > 2. The upgraded compute node begins to report both VCPU and PCPU, but > > reshape for the existed inventories. > > The upgraded node is still using the vcpu_pin_set config, or didn't > > set the vcpu_pin_config. Both in this two cases are reporting VCPU and > PCPU > > same time. And the request with cpu_policy=dedicated still uses the VCPU. > > Then it is worked same as Stein release. And existed instance can be > > shelved/unshelved, migration and evacuate. > +1 > > 3. Disable the new request and operation for the instance to the hosts > for > > dedicated instance. (it is kind of breaking our live-upgrade? I thought > > this will be a short interrupt for the control plane if that is > available) > im not sure why we need to do this unless you are thinging this will be > done by a cli? e.g. like nova-manage. > The inventories of existed instance still consumes VCPU. As we know the PCPU and VCPU reporting same time, that is kind of duplicated resources. 
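To make the duplication concrete, a minimal sketch of a 4-core transitional host (numbers illustrative):

    Host A: 4 physical cores, vcpu_pin_set=0-3, upgraded, not yet reshaped
        VCPU inventory: total=4   (existing pinned guests still allocate here)
        PCPU inventory: total=4   (translated pinned requests would allocate here)
    => 8 units of CPU inventory backed by 4 cores; consuming from both
       classes double-books the same cores until the reshape removes the
       duplicate VCPU inventory.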
If we begin to consume the PCPU, in the end, it will over consume the resource. yes, the disable request is done by CLI, probably disable the service. > > 4. reshape the inventories for existed instance for all the hosts. > should this not happen when the agent starts up? > > 5. Enable the instance's new request and operation, also enable the > > pre-request filter. > > 6. Operator copies the value of vcpu_pin_set to dedicated_cpu_set. > vcpu_pin_set is not the set of cpu used for pinning. > the operators should set dedicated_cpu_set and shared_cpu_set approprealy > at this point but in general they proably wont just copy it as host that > used vcpu_pin_set but were not used for pinned instances will be copied to > shared_cpu_set. > Yes, I should say this upgrade flow is for those dedicated instance host. For the host only running floating instance, they doesn't have trouble with those problem. > > For the > > case of vcpu_pin_set isn't set, the value of dedicated_cpu_set should be > > all the cpu ids exclude shared_cpu_set if set. > > > > Two rules at here: > > 1. The operator doesn't allow to change a different value for > > dedicated_cpu_set with vcpu_pin_set when any instance is running on the > > host. > > 2. The operator doesn't allow to change the value of dedicated_cpu_set > and > > shared_cpu_set when any instance is running on the host. > neither of these rule can be enforced. one of the requirements that dan > smith had > for edge computeing is that we need to supprot upgraes with instance > inplace. > > > > > > > > Shewale, Bhagyashri 于2019年6月14日周五 > 下午4:42写道: > > > > > > > that is incorrect both a and by will be returned. the spec states > that > > > > > > for host A we report an inventory of 4 VCPUs and > > > > > > > > an inventory of 4 PCPUs and host B will have 1 inventory of 4 > PCPUs so > > > > > > both host will be returned assuming > > > > > > > > $ <=4 > > > > > > > > > Means if ``vcpu_pin_set`` is set in previous release then report both > VCPU > > > and PCPU as inventory (in Train) but this seems contradictory for > example: > > > > > > > > > On Stein, > > > > > > > > > Configuration on compute node A: > > > > > > vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in placement > database) > > > > > > > > > On Train: > > > > > > vcpu_pin_set=0-3 > > > > > > > > > The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement > db > > > > > > > > > Now say user wants to create instances as below: > > > > > > 1. Flavor having extra specs (resources:PCPU=1), instance A > > > 2. Flavor having extra specs (resources:VCPU=1), instance B > > > > > > > > > For both instance requests, placement will return compute Node A. > > > > > > Instance A: will be pinned to say 0 CPU > > > > > > Instance B: will float on 0-3 > > > > > > > > > To resolve above issue, I think it’s possible to detect whether the > > > compute node was configured to be used for pinned instances if > > > ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that case, > > > vcpu_pin_set will be reported as PCPU otherwise VCPU. 
> > > > > > > > > Regards, > > > > > > -Bhagyashri Shewale- > > > > > > ------------------------------ > > > *From:* Sean Mooney > > > *Sent:* Thursday, June 13, 2019 8:32:02 PM > > > *To:* Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; > > > openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com > > > *Subject:* Re: [nova] Spec: Standardize CPU resource tracking > > > > > > On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote: > > > > Hi All, > > > > > > > > > > > > Currently I am working on implementation of cpu pinning upgrade part > as > > > > > > mentioned in the spec [1]. > > > > > > > > > > > > While implementing the scheduler pre-filter as mentioned in [1], I > have > > > > > > encountered one big issue: > > > > > > > > > > > > Proposed change in spec: In scheduler pre-filter we are going to > alias > > > > > > request_spec.flavor.extra_spec and > > > > request_spec.image.properties form ``hw:cpu_policy`` to > > > > > > ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > > > > > > > > > > So when user will create a new instance or execute instance actions > > > > > > like shelve, unshelve, resize, evacuate and > > > > migration post upgrade it will go through scheduler pre-filter which > > > > > > will set alias for `hw:cpu_policy` in > > > > request_spec flavor ``extra specs`` and image metadata properties. In > > > > > > below particular case, it won’t work:- > > > > > > > > > > > > For example: > > > > > > > > > > > > I have two compute nodes say A and B: > > > > > > > > > > > > On Stein: > > > > > > > > > > > > Compute node A configurations: > > > > > > > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in > aggregate > > > > > > which has “pinned” metadata) > > > vcpu_pin_set does not mean that the host was used for pinned instances > > > https://that.guru/blog/cpu-resources/ > > > > > > > > > > > > Compute node B Configuration: > > > > > > > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in > aggregate > > > > > > which has “pinned” metadata) > > > > > > > > > > > > On Train, two possible scenarios: > > > > > > > > Compute node A configurations: (Consider the new cpu pinning > > > > > > implementation is merged into Train) > > > > > > > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > > > > > > > > > > Compute node B Configuration: (Consider the new cpu pinning > > > > > > implementation is merged into Train) > > > > > > > > cpu_dedicated_set=0-3 (change to the new config option) > > > > > > > > 1. Consider that one instance say `test ` is created using flavor > > > > > > having old extra specs (hw:cpu_policy=dedicated, > > > > "aggregate_instance_extra_specs:pinned": "true") in Stein release and > > > > > > now upgraded Nova to Train with the above > > > > configuration. > > > > 2. Now when user will perform instance action say shelve/unshelve > > > > > > scheduler pre-filter will change the > > > > request_spec flavor extra spec from ``hw:cpu_policy`` to > > > > > > ``resources=PCPU:$`` > > > it wont remove hw:cpu_policy it will just change the > resouces=VCPU:$ > > of cpus> -> resources=PCPU:$ > > > > > > > which ultimately will return only compute node B from placement > service. > > > > > > that is incorrect both a and by will be returned. 
the spec states that > for > > > host A we report an inventory of 4 VCPUs and > > > an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so > > > both host will be returned assuming > > > $ <=4 > > > > > > > Here, we expect it should have retuned both Compute A and Compute B. > > > > > > it will > > > > 3. If user creates a new instance using old extra specs > > > > > > (hw:cpu_policy=dedicated, > > > > "aggregate_instance_extra_specs:pinned": "true") on Train release > with > > > > > > the above configuration then it will return > > > > only compute node B from placement service where as it should have > > > > > > returned both compute Node A and B. > > > that is what would have happend in the stien version of the spec and we > > > changed the spec specifically to ensure that > > > that wont happen. in the train version of the spec you will get both > host > > > as candates to prevent this upgrade impact. > > > > > > > > Problem: As Compute node A is still configured to be used to boot > > > > > > instances with dedicated CPUs same behavior as > > > > Stein, it will not be returned by placement service due to the > changes > > > > > > in the scheduler pre-filter logic. > > > > > > > > > > > > Propose changes: > > > > > > > > > > > > Earlier in the spec [2]: The online data migration was proposed to > > > > > > change flavor extra specs and image metadata > > > > properties of request_spec and instance object. Based on the instance > > > > > > host, we can get the NumaTopology of the host > > > > which will contain the new configuration options set on the compute > > > > > > host. Based on the NumaTopology of host, we can > > > > change instance and request_spec flavor extra specs. > > > > > > > > 1. Remove cpu_policy from extra specs > > > > 2. Add “resources:PCPU=” in extra specs > > > > > > > > > > > > We can also change the flavor extra specs and image metadata > properties > > > > > > of instance and request_spec object using the > > > > reshape functionality. > > > > > > > > > > > > Please give us your feedback on the proposed solution so that we can > > > > > > update specs accordingly. > > > i am fairly stongly opposed to useing an online data migration to > modify > > > the request spec to reflect the host they > > > landed on. this speficic problem is why the spec was changed in the > train > > > cycle to report dual inventoryis of VCPU and > > > PCPU if vcpu_pin_set is the only option set or of no options are set. > > > > > > > > > > > > [1]: > > > > > > > https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > > > > > > > [2]: > > > > > > > https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > > > > > > > > > > Thanks and Regards, > > > > > > > > -Bhagyashri Shewale- > > > > > > > > Disclaimer: This email and any attachments are sent in strictest > > > > > > confidence for the sole use of the addressee and may > > > > contain legally privileged, confidential, and proprietary data. If > you > > > > > > are not the intended recipient, please advise > > > > the sender by replying promptly to this email and then delete and > > > > > > destroy this email and any attachments without any > > > > further use, copying or forwarding. > > > > > > Disclaimer: This email and any attachments are sent in strictest > > > confidence for the sole use of the addressee and may contain legally > > > privileged, confidential, and proprietary data. 
If you are not the > intended > > > recipient, please advise the sender by replying promptly to this email > and > > > then delete and destroy this email and any attachments without any > further > > > use, copying or forwarding. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Mon Jun 17 10:02:30 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 17 Jun 2019 11:02:30 +0100 (BST) Subject: [placement] candidate logo Message-ID: https://burningchrome.com/magpie2.png Is the latest iteration of the candidate placement logo. Please shout out if it's not okay. It's been through a few iterations of feedback to get it to a form closer to what we discussed a couple of weeks ago. For those not aware: The image is of an Australian Magpie. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From sfinucan at redhat.com Mon Jun 17 10:10:28 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Mon, 17 Jun 2019 11:10:28 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: ,<3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> Message-ID: <394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> [Cleaning up the 'To' field since Jay isn't working on OpenStack anymore and everyone else is on openstack-discuss already] On Fri, 2019-06-14 at 08:35 +0000, Shewale, Bhagyashri wrote: > > cpu_share_set in stien was used for vm emulator thread and required > > the instnace to be pinned for it to take effect. i.e. the > > hw:emulator_thread_policy extra spcec currently only works if you > > had hw_cpu_policy=dedicated so we should not error if vcpu_pin_set > > and cpu_shared_set are defined, it was valid. what we can do is > > ignore teh cpu_shared_set for schduling and not report 0 VCPUs > > for this host and use vcpu_pinned_set as PCPUs. > Thinking of backward compatibility, I agree both of these > configuration options ``cpu_shared_set``, ``vcpu_pinned_set`` should > be allowed in Train release as well. > > Few possible combinations in train: > A) What if only ``cpu_shared_set`` is set on a new compute node? > Report VCPU inventory. I think this is _very_ unlikely to happen in the real world since the lack of a 'vcpu_pin_set' option means an instances pinned CPUs could co-exist on the same cores as the emulator threats, which defeats the whole point of placing emulator threads on a separate core. That said, it's possible so we do have to deal with it. Ignore 'cpu_shared_set' in this case and issue a warning saying that the user has to configure 'cpu_dedicated_set'. > B) what if ``cpu_shared_set`` and ``cpu_dedicated_set`` are set on > a new compute node? Report VCPU and PCPU inventory. In fact, we > want to support both these options so that instance can request both > VCPU and PCPU at the same time. If flavor requests VCPU or > hw:emulator_thread_policy=share, in both the cases, it will float on > CPUs set in ``cpu_shared_set`` config option. We should report both VCPU and PCPU inventory, yes. However, please don't add the ability to create a single instance with combined VCPU and PCPU inventory. I dropped this from the spec intentionally to make it easier for something (_anything_) to land. We can iterate on this once we have the basics done. > C) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a new > compute node? Ignore cpu_shared_set and report vcpu_pinned_set as > VCPU or PCPU? As above, ignore 'cpu_shared_set' but issue a warning. 
Use the value of 'vcpu_pin_set' to report both VCPU and PCPU inventory. Note that 'vcpu_pin_set' is already used to calculate VCPU inventory. https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L5808-L5811 > D) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a > upgraded compute node? As you have mentioned, ignore cpu_shared_set > and report vcpu_pinned_set as PCPUs provided ``NumaTopology`` > ,``pinned_cpus`` attribute is not empty otherwise VCPU. Ignore 'cpu_shared_set' but issue a warning. Use the value of 'vcpu_pin_set' to report both VCPU and PCPU inventory. Note that 'vcpu_pin_set' is already used to calculate VCPU inventory. > > we explctly do not want to have the behavior in 3 and 4 specificly > > the logic of checking the instances. > > Here we are checking Host ``NumaTopology`` ,``pinned_cpus`` > attribute and not directly instances ( if that attribute is not > empty that means some instance are running) and this logic will be > needed to address above #D case. You shouldn't need to do this. Rely solely on configuration options to determine inventory, even if it means reporting more inventory than we actually have (reporting of a host core as both units of VCPU and PCPU) and hope that operators have correctly used host aggregrates to isolate NUMA-based instances from non-NUMA-based instances. I realize this is very much in flux but could you please push what you have up for review, marked as WIP or such. Debating this stuff in the code might be easier. Stephen From sfinucan at redhat.com Mon Jun 17 10:19:57 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Mon, 17 Jun 2019 11:19:57 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: ,<57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> Message-ID: On Fri, 2019-06-14 at 08:37 +0000, Shewale, Bhagyashri wrote: > >> that is incorrect both a and by will be returned. the spec states > that for host A we report an inventory of 4 VCPUs and > >> an inventory of 4 PCPUs and host B will have 1 inventory of 4 > PCPUs so both host will be returned assuming > >> $ <=4 > > Means if ``vcpu_pin_set`` is set in previous release then report both > VCPU and PCPU as inventory (in Train) but this seems contradictory > for example: > > On Stein, > > Configuration on compute node A: > vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in placement > database) > > On Train: > vcpu_pin_set=0-3 > > The inventory will be reported as 4 VCPUs and 4 PCPUs in the > placement db > > Now say user wants to create instances as below: > Flavor having extra specs (resources:PCPU=1), instance A > Flavor having extra specs (resources:VCPU=1), instance B > > For both instance requests, placement will return compute Node A. > Instance A: will be pinned to say 0 CPU > Instance B: will float on 0-3 This is not a serious issue. This is very similar to what will happen today if you don't use host aggregrates to isolate NUMA-based instances from non-NUMA-based instances. If you can assume that operators are using host aggregates to separate pinned and unpinned instance, then the VCPU inventory of a host in the 'pinned' aggregrate will never be consumed and vice versa. > To resolve above issue, I think it’s possible to detect whether the > compute node was configured to be used for pinned instances if > ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that > case, vcpu_pin_set will be reported as PCPU otherwise VCPU. This only works if the host already has instances on it. 
If you've a deployment with 100 hosts and 82 of them have instances on there at the time of upgrade, then 82 will start reporting PCPU inventory and 18 will continue reporting just VCPU inventory. We thought long and hard about this and there is no good heuristic we can use to separate hosts that should report PCPUs from those that should report VCPUs. That's why we said we'll report both and hope that host aggregrates are configured correctly. If host aggregrates aren't configured, then things are no more broken than before but at least the operator will now get warnings (above missing 'cpu_dedicated_set' options). As before, please push some of this code up so we can start reviewing it. Stephen From alifshit at redhat.com Mon Jun 17 12:04:44 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Mon, 17 Jun 2019 08:04:44 -0400 Subject: [placement] candidate logo In-Reply-To: References: Message-ID: I have no horse in this race, but that magpie looks... evil. Was that intentional? On Mon, Jun 17, 2019, 06:08 Chris Dent, wrote: > > https://burningchrome.com/magpie2.png > > Is the latest iteration of the candidate placement logo. > Please shout out if it's not okay. It's been through a few > iterations of feedback to get it to a form closer to what we > discussed a couple of weeks ago. > > For those not aware: The image is of an Australian Magpie. > > -- > Chris Dent ٩◔̯◔۶ https://anticdent.org/ > freenode: cdent -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Jun 17 12:11:40 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 17 Jun 2019 13:11:40 +0100 Subject: [placement] candidate logo In-Reply-To: References: Message-ID: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> On Mon, 2019-06-17 at 11:02 +0100, Chris Dent wrote: > https://burningchrome.com/magpie2.png > > Is the latest iteration of the candidate placement logo. > Please shout out if it's not okay. It's been through a few > iterations of feedback to get it to a form closer to what we > discussed a couple of weeks ago. > > For those not aware: The image is of an Australian Magpie. i like the image in general but i think the red and the angle of the iris sets the wrong tone. it reads as slightly arragant/sinister its kind of hard to edit a 640*640 png without and get the quality you would want for a logo but i mocked up a slight change that i think makes it more cheerful and aprochable. i think the rotation of the iris is the main thing. the more hoizontal placement of the in the origial makes the image look like the magpie si look over its sholder at you where as the more vertical inclanation in my modifed version read less like its stearing at you. looking at the original because of its stance and the perception its watching you it read more like a raven giving me an edger alan po vibe but i think the modifed version could be a magpie. the cyan/blue tones also read much more like the magpies i am used to seeing https://download.ams.birds.cornell.edu/api/v1/asset/70580781/1800 https://www.audubon.org/sites/default/files/styles/grid_gallery_lightbox/public/apa_2015_michaellabarbera_282789_black-billed_magpie_kk-adult.jpg?itok=ceft_M1F granted the colouration is mostly on there wings not around there eyes but it is a log after all :) > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: magpie3.png Type: image/png Size: 23150 bytes Desc: not available URL: From sombrafam at gmail.com Mon Jun 17 12:28:59 2019 From: sombrafam at gmail.com (Erlon Cruz) Date: Mon, 17 Jun 2019 09:28:59 -0300 Subject: [placement] candidate logo In-Reply-To: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> References: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> Message-ID: Lol, had the same feeling about the evilness of the bird. Em seg, 17 de jun de 2019 às 09:16, Sean Mooney escreveu: > On Mon, 2019-06-17 at 11:02 +0100, Chris Dent wrote: > > https://burningchrome.com/magpie2.png > > > > Is the latest iteration of the candidate placement logo. > > Please shout out if it's not okay. It's been through a few > > iterations of feedback to get it to a form closer to what we > > discussed a couple of weeks ago. > > > > For those not aware: The image is of an Australian Magpie. > i like the image in general but i think the red and the angle of the > iris sets the wrong tone. it reads as slightly arragant/sinister > > its kind of hard to edit a 640*640 png without and get the quality you > would want > for a logo but i mocked up a slight change that i think makes it more > cheerful and > aprochable. i think the rotation of the iris is the main thing. > > the more hoizontal placement of the in the origial makes the image look > like the > magpie si look over its sholder at you where as the more vertical > inclanation > in my modifed version read less like its stearing at you. > > looking at the original because of its stance and the perception its > watching you it > read more like a raven giving me an edger alan po vibe but i think the > modifed version > could be a magpie. the cyan/blue tones also read much more like the > magpies i am used > to seeing > https://download.ams.birds.cornell.edu/api/v1/asset/70580781/1800 > > > https://www.audubon.org/sites/default/files/styles/grid_gallery_lightbox/public/apa_2015_michaellabarbera_282789_black-billed_magpie_kk-adult.jpg?itok=ceft_M1F > > granted the colouration is mostly on there wings not around there eyes but > it is a log after all :) > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfinucan at redhat.com Mon Jun 17 12:47:26 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Mon, 17 Jun 2019 13:47:26 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> <4650ab3c8a6e04d4e391325ddb7c2985146ce747.camel@redhat.com> Message-ID: On Mon, 2019-06-17 at 17:47 +0800, Alex Xu wrote: > Sean Mooney 于2019年6月17日周一 下午5:19写道: > > On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote: > > > I'm thinking we should have recommended upgrade follow. If we > > > give a lot of flexibility for the operator to have a lot > > > combination of the value of vcpu_pin_set, dedicated_cpu_set and > > > shared_cpu_set, then we have trouble in this email and have to do > > > a lot of checks this email introduced also. > > > > we modified the spec intentionally to make upgradeing simple. > > i don't be believe the concerns raised in the intial 2 emails are > > valid if we follow what was detailed in the spec. > > we did take some steps to restrict what values you can set. > > for example dedicated_cpu_set cannot be set if vcpu pin set is set. > > technicall i belive we relaxed that to say we would ignore vcpu pin > > set in that case be original i was pushing for it to be a hard > > error. 
> > > > > I'm thinking that the pre-request filter (which translates the > > > cpu_policy=dedicated to PCPU request) should be enabled after all > > > the node upgrades to the Train release. Before that, all the > > > cpu_policy=dedicated instance still using the VCPU. > > > > it should be enabled after all node are upgraded but not > > nessisarily before all compute nodes are updated to use > > dedicated_cpu_set. > > If we enable the pre-request filter in the middle of upgrade, there > will have the problem Bhagyashri said. Reporting PCPU and VCPU > sametime doesn't resolve the concern from him as my understand. > > For example, we have 100 nodes for dedicated host in the cluster. > > The operator begins to upgrade the cluster. The controller plane > upgrade first, and the pre-request filter enabled. > For rolling upgrade, he begins to upgrade 10 nodes first. Then only > those 10 nodes report PCPU and VCPU sametime. > But any new request with dedicated cpu policy begins to request PCPU, > all of those new instance only can be go to those 10 nodes. Also > if the existed instances execute the resize and evacuate, and > shelve/unshelve are going to those 10 nodes also. That is kind of > nervious on the capacity at that time. The exact same issue can happen the other way around. As an operator slowly starts upgrading, by setting the necessary configuration options, the compute nodes will reduce the VCPU inventory they report and start reporting PCPU inventory. Using the above example, if we upgraded 90 of the 100 compute nodes and didn't enable the prefilter, we would only be able to schedule to one of the remaining 10 nodes. This doesn't seem any better. At some point we're going to need to make a clean break from pinned instances consuming VCPU resources to them using PCPU resources. When that happens is up to us. I figured it was easiest to do this as soon as the controllers were updated because I had assumed compute nodes would be updated pretty soon after the controllers and therefore there would only be a short window where instances would start requesting PCPU resources but there wouldn't be any available. Maybe that doesn't make sense though. If not, I guess we need to make this configurable. I propose that as soon as compute nodes are upgraded then they will all start reporting PCPU inventory, as noted in the spec. However, the prefilter will initially be disabled and we will not reshape existing inventories. This means pinned instances will continue consuming VCPU resources as before but that is not an issue since this is the behavior we currently have. Once the operator is happy that all of the compute nodes have been upgraded, or at least enough that they care about, we will then need some way for us to switch on the prefilter and reshape existing instances. Perhaps this would require manual configuration changes, validated by an upgrade check, or perhaps we could add a workaround config option? In any case, at some point we need to have a switch from "use VCPUs for pinned instances" to "use PCPUs for pinned instances". Stephen > > > Trying to image the upgrade as below: > > > > > > 1. Rolling upgrade the compute node. > > > 2. The upgraded compute node begins to report both VCPU and PCPU, > > > but reshape for the existed inventories. > > > The upgraded node is still using the vcpu_pin_set config, or > > > didn't set the vcpu_pin_config. Both in this two cases are > > > reporting VCPU and PCPU same time. And the request with > > > cpu_policy=dedicated still uses the VCPU. 
> > > Then it is worked same as Stein release. And existed instance can > > > be shelved/unshelved, migration and evacuate. > > > > +1 > > > > > 3. Disable the new request and operation for the instance to the > > > hosts for dedicated instance. (it is kind of breaking our live- > > > upgrade? I thought this will be a short interrupt for the control > > > plane if that is available) > > > > im not sure why we need to do this unless you are thinging this > > will be done by a cli? e.g. like nova-manage. > > The inventories of existed instance still consumes VCPU. As we know > the PCPU and VCPU reporting same time, that is kind of duplicated > resources. If we begin to consume the PCPU, in the end, it will over > consume the resource. > > yes, the disable request is done by CLI, probably disable the > service. > > > > 4. reshape the inventories for existed instance for all the > > > hosts. > > > > should this not happen when the agent starts up? > > > > > 5. Enable the instance's new request and operation, also enable > > > the pre-request filter. > > > 6. Operator copies the value of vcpu_pin_set to > > > dedicated_cpu_set. > > > > vcpu_pin_set is not the set of cpu used for pinning. the operators > > should set dedicated_cpu_set and shared_cpu_set approprealy at this > > point but in general they proably wont just copy it as host that > > used vcpu_pin_set but were not used for pinned instances will be > > copied to shared_cpu_set. > > Yes, I should say this upgrade flow is for those dedicated instance > host. For the host only running floating instance, they doesn't have > trouble with those problem. > > > > For the case of vcpu_pin_set isn't set, the value of > > > dedicated_cpu_set should be all the cpu ids exclude > > > shared_cpu_set if set. > > > > > > Two rules at here: > > > 1. The operator doesn't allow to change a different value for > > > dedicated_cpu_set with vcpu_pin_set when any instance is running > > > on the host. > > > 2. The operator doesn't allow to change the value of > > > dedicated_cpu_set and shared_cpu_set when any instance is running > > > on the host. > > > > neither of these rule can be enforced. one of the requirements that dan smith had > > for edge computeing is that we need to supprot upgraes with instance inplace. From sombrafam at gmail.com Mon Jun 17 12:48:46 2019 From: sombrafam at gmail.com (Erlon Cruz) Date: Mon, 17 Jun 2019 09:48:46 -0300 Subject: [placement] candidate logo In-Reply-To: References: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> Message-ID: This one could render a nice logo: https://static.standard.co.uk/s3fs-public/thumbnails/image/2018/09/26/13/magpie-shane-miller.jpg?w968 Em seg, 17 de jun de 2019 às 09:28, Erlon Cruz escreveu: > Lol, had the same feeling about the evilness of the bird. > > Em seg, 17 de jun de 2019 às 09:16, Sean Mooney > escreveu: > >> On Mon, 2019-06-17 at 11:02 +0100, Chris Dent wrote: >> > https://burningchrome.com/magpie2.png >> > >> > Is the latest iteration of the candidate placement logo. >> > Please shout out if it's not okay. It's been through a few >> > iterations of feedback to get it to a form closer to what we >> > discussed a couple of weeks ago. >> > >> > For those not aware: The image is of an Australian Magpie. >> i like the image in general but i think the red and the angle of the >> iris sets the wrong tone. 
it reads as slightly arragant/sinister >> >> its kind of hard to edit a 640*640 png without and get the quality you >> would want >> for a logo but i mocked up a slight change that i think makes it more >> cheerful and >> aprochable. i think the rotation of the iris is the main thing. >> >> the more hoizontal placement of the in the origial makes the image look >> like the >> magpie si look over its sholder at you where as the more vertical >> inclanation >> in my modifed version read less like its stearing at you. >> >> looking at the original because of its stance and the perception its >> watching you it >> read more like a raven giving me an edger alan po vibe but i think the >> modifed version >> could be a magpie. the cyan/blue tones also read much more like the >> magpies i am used >> to seeing >> https://download.ams.birds.cornell.edu/api/v1/asset/70580781/1800 >> >> >> https://www.audubon.org/sites/default/files/styles/grid_gallery_lightbox/public/apa_2015_michaellabarbera_282789_black-billed_magpie_kk-adult.jpg?itok=ceft_M1F >> >> granted the colouration is mostly on there wings not around there eyes >> but it is a log after all :) >> >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Mon Jun 17 12:55:11 2019 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Mon, 17 Jun 2019 14:55:11 +0200 Subject: [ironic] To not have meetings? In-Reply-To: References: Message-ID: I think replacing the IRC meeting with an "ML meeting" + IRC office hours is an interesting idea to try, +1 On 6/10/19 4:01 PM, Julia Kreger wrote: > Last week the discussion came up of splitting the ironic meeting to > alternate time zones as we have increasing numbers of contributors in > the Asia/Pacific areas of the world[0]. With that discussion, an > additional interesting question came up posing the question of > shifting to the mailing list instead of our present IRC meeting[1]? > > It is definitely an interesting idea, one that I'm personally keen on > because of time zones and daylight savings time. > > I think before we do this, we should collect thoughts and also try to > determine how we would pull this off so we don't forget the weekly > checkpoint that the meeting serves. I think we need to do something, > so I guess now is a good time to provide input into what everyone > thinks would be best for the project and facilitating the weekly > check-in. > > What I think might work: > > By EOD UTC Monday: > > * Listed primary effort participants will be expected to update the > whiteboard[2] weekly before EOD Monday UTC > * Contributors propose patches to the whiteboard that they believe > would be important for reviewers to examine this coming week. > * PTL or designee sends weekly email to the mailing list to start an > update thread shortly after EOD Monday UTC or early Tuesday UTC. > ** Additional updates, questions, and topical discussion (new > features, RFEs) would ideally be wrapped up by EOD UTC Tuesday. > > With that, I think we would also need to go ahead and begin having > "office hours" as during the week we generally know some ironic > contributors will be in IRC and able to respond to questions. I think > this would initially consist of our meeting time and perhaps the other > time that seems to be most friendly to the contributors int he > Asia/Pacific area[3]. > > Thoughts/ideas/suggestions welcome! 
> > -Julia > > [0]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:31:33 > [1]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:43:16 > [2]: https://etherpad.openstack.org/p/IronicWhiteBoard > [3]: https://doodle.com/poll/bv9a4qyqy44wiq92 > From cdent+os at anticdent.org Mon Jun 17 12:55:40 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 17 Jun 2019 13:55:40 +0100 (BST) Subject: [placement] candidate logo In-Reply-To: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> References: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> Message-ID: On Mon, 17 Jun 2019, Sean Mooney wrote: > On Mon, 2019-06-17 at 11:02 +0100, Chris Dent wrote: >> https://burningchrome.com/magpie2.png >> >> Is the latest iteration of the candidate placement logo. >> Please shout out if it's not okay. It's been through a few >> iterations of feedback to get it to a form closer to what we >> discussed a couple of weeks ago. >> >> For those not aware: The image is of an Australian Magpie. > i like the image in general but i think the red and the angle of the > iris sets the wrong tone. it reads as slightly arragant/sinister It's red because it is an australian magpie, which has red eyes. It is supposed to be somewhat sinister/snarky/bossing-you-about-where-to-place-your-stuff. An earlier version was even more so, and a smile was added to make it look less so, and that was too much. So this tries to strike a bit of a balance. > its kind of hard to edit a 640*640 png without and get the quality you would want > for a logo but i mocked up a slight change that i think makes it more cheerful and > aprochable. i think the rotation of the iris is the main thing. If you view the image in a smaller size (which it will often be) the red goes more like a glint. > the more hoizontal placement of the in the origial makes the image look like the > magpie si look over its sholder at you where as the more vertical inclanation > in my modifed version read less like its stearing at you. It is supposed to be looking back at you a bit. Based on https://en.wikipedia.org/wiki/Australian_magpie#/media/File:Magpie_samcem05.jpg > looking at the original because of its stance and the perception its watching you it > read more like a raven giving me an edger alan po vibe but i think the modifed version > could be a magpie. the cyan/blue tones also read much more like the magpies i am used > to seeing https://download.ams.birds.cornell.edu/api/v1/asset/70580781/1800 That picture is a European Magpie, completely different bird. An australian magpie isn't really a magpie. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From openstack at fried.cc Mon Jun 17 13:08:33 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 17 Jun 2019 08:08:33 -0500 Subject: [placement] candidate logo In-Reply-To: References: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> Message-ID: I love it, as is, sinister and arrogant and snarky. Don't change a thing. efried . 
From smooney at redhat.com Mon Jun 17 13:42:57 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 17 Jun 2019 14:42:57 +0100 Subject: [placement] candidate logo In-Reply-To: References: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> Message-ID: <6c15148811d7884f6035c5e3932f6ab9c80afa4b.camel@redhat.com> On Mon, 2019-06-17 at 08:08 -0500, Eric Fried wrote: > I love it, as is, sinister and arrogant and snarky. Don't change a thing. well i actully like the proposed one too. just wanted to make sure that you wanted to capture that aspect of its personality in the depiction. sounds like it convayed those charateristic well so if people like it then its fine with me :) > > efried > . > > From smooney at redhat.com Mon Jun 17 12:55:46 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 17 Jun 2019 13:55:46 +0100 Subject: [placement] candidate logo In-Reply-To: References: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> Message-ID: <1a287f5053bdc8e147dfcf2b684cf3460397f256.camel@redhat.com> On Mon, 2019-06-17 at 09:28 -0300, Erlon Cruz wrote: > Lol, had the same feeling about the evilness of the bird. the ausie magpies do look a lot more sinister then the european ones https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Magpie_samcem05.jpg/1280px-Magpie_samcem05.jpg https://natgeo.imgix.net/factsheets/thumbnails/magpie.jpg?auto=compress,format&w=1024&h=560&fit=crop but i think the modifcation i made help reduce that. the logo does look quite like https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Magpie_samcem05.jpg/1280px-Magpie_samcem05.jpg so if the tone is intentional then i think its pretty accurate depection. unlike the australian magpie the European ones dont attack humans. they would not last long in ireland if they tried unless they evolved to withstand a hurly to the face. swans on the other hand can be vious as hell so if they start heading your direct you get out of there way :) /me gets teh feeling ausie magpies might be more like swans. > > Em seg, 17 de jun de 2019 às 09:16, Sean Mooney > escreveu: > > > On Mon, 2019-06-17 at 11:02 +0100, Chris Dent wrote: > > > https://burningchrome.com/magpie2.png > > > > > > Is the latest iteration of the candidate placement logo. > > > Please shout out if it's not okay. It's been through a few > > > iterations of feedback to get it to a form closer to what we > > > discussed a couple of weeks ago. > > > > > > For those not aware: The image is of an Australian Magpie. > > > > i like the image in general but i think the red and the angle of the > > iris sets the wrong tone. it reads as slightly arragant/sinister > > > > its kind of hard to edit a 640*640 png without and get the quality you > > would want > > for a logo but i mocked up a slight change that i think makes it more > > cheerful and > > aprochable. i think the rotation of the iris is the main thing. > > > > the more hoizontal placement of the in the origial makes the image look > > like the > > magpie si look over its sholder at you where as the more vertical > > inclanation > > in my modifed version read less like its stearing at you. > > > > looking at the original because of its stance and the perception its > > watching you it > > read more like a raven giving me an edger alan po vibe but i think the > > modifed version > > could be a magpie. 
the cyan/blue tones also read much more like the > > magpies i am used > > to seeing > > https://download.ams.birds.cornell.edu/api/v1/asset/70580781/1800 > > > > > > https://www.audubon.org/sites/default/files/styles/grid_gallery_lightbox/public/apa_2015_michaellabarbera_282789_black-billed_magpie_kk-adult.jpg?itok=ceft_M1F > > > > granted the colouration is mostly on there wings not around there eyes but > > it is a log after all :) > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: magpie2.png Type: image/png Size: 28597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: magpie3.png Type: image/png Size: 23150 bytes Desc: not available URL: From rafaelweingartner at gmail.com Mon Jun 17 14:44:25 2019 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Mon, 17 Jun 2019 11:44:25 -0300 Subject: [telemetry] volume_type_id stored instead of volume_type name In-Reply-To: <4081acb6-be89-3249-e535-67c192be3743@everyware.ch> References: <4081acb6-be89-3249-e535-67c192be3743@everyware.ch> Message-ID: Just to update this on a public list: When the event data is created, Cinder is setting the volume type as the volume type ID. On the other hand, the pollsters are always updating the volume_type value in Gnocchi, but for every event, this value is then updated back to the ID (it is like a "flight" between these two subsystems to see who updates the status last). The thing is that, deleted volumes are not polled. Therefore, the last update is executed by the event processing systems, which updates the value to the volume type ID. This explains why only deleted volumes were perceived with the volume type being set as the ID. So, to fix it. We need to decide which "standard" we want to use. At first sight, it looks more natural (when integrating system-system via API) to use an ID, but for humans when accessing Cinder API, it might be better to see the volume type name instead. Because the API is publicly available, and we never know what consumes it, I would say that changing from volume type name to volume type ID can break things that we are not aware of. On the other hand, fixing the event data should not break anything because we know (hopefully) where that message is pushed to. We decided to move on (internally), and fix the event creation code, and use the volume type name there. As soon as we finish internally, we will open a push upstream PR. 
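In the meantime, a resource that already carries the ID can be mapped back to a human-readable name via the volume type API, for example (illustrative only, using one of the IDs visible in the history output quoted below):

    openstack volume type show 8bd7e1b1-3396-49bf-802c-8c31a9444895 -c id -c name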
On Thu, Jun 6, 2019 at 5:22 AM Florian Engelmann < florian.engelmann at everyware.ch> wrote: > Hi, > > some volumes are stored with the volume_type Id instead of the > volume_type name: > > openstack metric resource history --details > b5496a42-c766-4267-9248-6149aa9dd483 -c id -c revision_start -c > revision_end -c instance_id -c volume_type > > +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ > | id | revision_start | revision_end > | instance_id | volume_type > | > > +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ > | b5496a42-c766-4267-9248-6149aa9dd483 | > 2019-05-08T07:21:35.354474+00:00 | 2019-05-21T09:18:32.767426+00:00 | > 662998da-c3d1-45c5-9120-2cff6240e3b6 | v-ssd-std | > | b5496a42-c766-4267-9248-6149aa9dd483 | > 2019-05-21T09:18:32.767426+00:00 | 2019-05-21T09:18:32.845700+00:00 | > 662998da-c3d1-45c5-9120-2cff6240e3b6 | v-ssd-std | > | b5496a42-c766-4267-9248-6149aa9dd483 | > 2019-05-21T09:18:32.845700+00:00 | None | > 662998da-c3d1-45c5-9120-2cff6240e3b6 | > 8bd7e1b1-3396-49bf-802c-8c31a9444895 | > > +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ > > > I was not able to find anything fishy in ceilometer. So I guess it could > be some event/notification with a wrong payload? > > Could anyone please verify this error is not uniq to our (rocky) > environment by running: > > openstack metric resource list --type volume -c id -c volume_type > > > All the best, > Florian > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Mon Jun 17 15:46:37 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 17 Jun 2019 10:46:37 -0500 Subject: [all][qinling] Please check your README files In-Reply-To: <76f8a665-20f9-eadd-0ba5-bcf0dd10c66d@linaro.org> References: <2d210f25-db54-5822-f54f-28283adbadbd@nemebean.com> <76f8a665-20f9-eadd-0ba5-bcf0dd10c66d@linaro.org> Message-ID: On 5/24/19 3:20 AM, Marcin Juszkiewicz wrote: > W dniu 23.05.2019 o 15:50, Ben Nemec pisze: > >> Hmm, is this because of https://bugs.launchpad.net/pbr/+bug/1704472 ? >> >> If so, we should just fix it in pbr. I have a patch up to do that >> (https://review.opendev.org/#/c/564874) but I haven't gotten around to >> writing tests for it. I'll try to get that done shortly. > > I provided better description example for that patch. Based on changes > done in some projects. > > u'UTF-8 description can contain misc Unicode “quotes”, ’apostrophes’, > multiple dots like “…“, misc dashes like “–“ for example. Some projects > also use IPA to show pronounciation of their name so chars like ”ʃŋ” can > happen.' > Okay, the fix for this should be available in pbr 5.3.0. If everyone who was affected by this bug could verify 5.3.0 fixes the problem that would be great! Thanks. 
-Ben From bcafarel at redhat.com Mon Jun 17 17:06:18 2019 From: bcafarel at redhat.com (Bernard Cafarelli) Date: Mon, 17 Jun 2019 19:06:18 +0200 Subject: [neutron] Zuul checks fail In-Reply-To: References: Message-ID: On Sun, 16 Jun 2019 at 20:53, Bernard Cafarelli wrote: > Hi neutrinos, > > a quick heads-up that currently Zuul will 100% give -1 on your reviews, > failing 404 Not Found when trying to download Ubuntu Xenial image. > > Thanks to Yulong for filling launchpad bug [0] and submitting fix [1]. > Once it is merged, we should be back in business > > [0] https://bugs.launchpad.net/neutron/+bug/1832968 > [1] https://review.opendev.org/#/c/665530/ > That fix is now merged, time to recheck your reviews! Note this also impacted stable branches, sorry I did not make it explicit -- Bernard Cafarelli -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jimrollenhagen.com Mon Jun 17 17:42:55 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Mon, 17 Jun 2019 13:42:55 -0400 Subject: [ironic] To not have meetings? In-Reply-To: References: Message-ID: On Mon, Jun 10, 2019 at 10:02 AM Julia Kreger wrote: > Last week the discussion came up of splitting the ironic meeting to > alternate time zones as we have increasing numbers of contributors in > the Asia/Pacific areas of the world[0]. With that discussion, an > additional interesting question came up posing the question of > shifting to the mailing list instead of our present IRC meeting[1]? > I like it overall, some comments below. > > It is definitely an interesting idea, one that I'm personally keen on > because of time zones and daylight savings time. > > I think before we do this, we should collect thoughts and also try to > determine how we would pull this off so we don't forget the weekly > checkpoint that the meeting serves. I think we need to do something, > so I guess now is a good time to provide input into what everyone > thinks would be best for the project and facilitating the weekly > check-in. > > What I think might work: > > By EOD UTC Monday: > > * Listed primary effort participants will be expected to update the > whiteboard[2] weekly before EOD Monday UTC > Good, but I hope they will just strive to keep it up to date. > * Contributors propose patches to the whiteboard that they believe > would be important for reviewers to examine this coming week. > Ditto! > * PTL or designee sends weekly email to the mailing list to start an > update thread shortly after EOD Monday UTC or early Tuesday UTC. > Awesome, this is useful with or without an IRC meeting. > ** Additional updates, questions, and topical discussion (new > features, RFEs) would ideally be wrapped up by EOD UTC Tuesday. > I'd much prefer these things don't wait for the weekly update. Just send the email! :) > With that, I think we would also need to go ahead and begin having > "office hours" as during the week we generally know some ironic > contributors will be in IRC and able to respond to questions. I think > this would initially consist of our meeting time and perhaps the other > time that seems to be most friendly to the contributors int he > Asia/Pacific area[3]. > ++ > > Thoughts/ideas/suggestions welcome! 
> > -Julia > > [0]: > http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:31:33 > [1]: > http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:43:16 > [2]: https://etherpad.openstack.org/p/IronicWhiteBoard > [3]: https://doodle.com/poll/bv9a4qyqy44wiq92 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergio.traldi at pd.infn.it Mon Jun 17 15:54:33 2019 From: sergio.traldi at pd.infn.it (Sergio Traldi) Date: Mon, 17 Jun 2019 17:54:33 +0200 Subject: ocatvia on rocky openstack environment does not work as expected Message-ID: <691a542e-154d-66fa-06cb-8cb454acb07b@pd.infn.it> Hi, I installed using packstack 2 node 1 controller and 1 compute with rocky release in CentOS 7 OS. I define an external network and configure openvswitch and I can assign floating IP. Everythings works fine. (VMs creation, images upload, security group, key pair, ...) I tested neutron, glance, nova, keystone, ... I followed the documentation to install ad configure octavia, so I create the user, the endpoint, the db, the network  lb-mgmt-net, the subnet, the neutron security group for amphorae and the rules, the image amphorae, I tagged the image,... I set the main values in /etc/octavia/octavia.conf for keystone for the bind, and so on ... and I started the services. I followed these documentations: https://docs.openstack.org/octavia/queens/contributor/guides/dev-quick-start.html http://sudomakeinstall.com/uncategorized/building-octavia-images-with-centos-7-and-haproxy https://blog.zufardhiyaulhaq.com/manual-instalation-octavia-openstack-queens At the end everything seems to work but I have two "problems" 1): I can not create a lb without a project from the openstack client if I do: [root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack loadbalancer create --name lb --vip-subnet-id c5887a52-bddb-4e8b-8023-cd7c341194fa Validation failure: Missing project ID in request where one is required. 
(HTTP 400) (Request-ID: req-1b9307c5-8aee-472d-ac56-44b6f34b05ce)

If I pass the project, the lb is created:

[root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack loadbalancer create --name lb1 --vip-subnet-id c5887a52-bddb-4e8b-8023-cd7c341194fa --project c86066dd95e345c386ef5e095b83918a
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2019-06-11T14:02:10                  |
| description         |                                      |
| flavor              |                                      |
| id                  | f740be24-edf1-459c-ac77-c93917cbca31 |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | c86066dd95e345c386ef5e095b83918a     |
| provider            | amphora                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 198.51.100.1                         |
| vip_network_id      | e4a02581-1d90-4ea2-9e73-681ff66a4328 |
| vip_port_id         | 1666d874-f1b5-437e-a989-ea49f65ba5a3 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | c5887a52-bddb-4e8b-8023-cd7c341194fa |
+---------------------+--------------------------------------+

But the project is not supposed to be a mandatory value to pass to the client.

2) Second problem: all the load balancers I created lie on the lb-mgmt subnet and not on the subnet I passed. In the example above,
my subnet is like this:

[root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack subnet list | grep c5887a52-bddb-4e8b-8023-cd7c341194fa
| c5887a52-bddb-4e8b-8023-cd7c341194fa | private_subnet | 8d37ca14-47b8-4ce8-aa27-fb4f8267d9ab | 10.0.0.0/24 |

But the VIP created is 198.51.100.1, and every load balancer I create has that same IP as its VIP, which I think is not correct:

[root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack loadbalancer list
+--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+
| id                                   | name     | project_id                       | vip_address  | provisioning_status | provider |
+--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+
| 21d121fd-799f-4244-bf33-622e2fcd0060 | lb-demo  | 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE              | amphora  |
| 94653251-849e-41c4-9071-f75382a46569 | lb-test  | 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE              | amphora  |
| a76ccc28-0242-46e4-8ac5-b1410a29cf4a | lb1      | 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE              | amphora  |
| f740be24-edf1-459c-ac77-c93917cbca31 | lb789    | c86066dd95e345c386ef5e095b83918a | 198.51.100.1 | ACTIVE              | amphora  |
| aeb6e964-fc24-4b04-aa94-9b5ccfe4eb09 | lb-test2 | 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE              | amphora  |
+--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+

I tried using different users, different projects and different networks, but the result is the same for every load balancer created. In the log files I did not find anything useful for either of the problems. If someone has some hints, that would be great.

For the first problem I found this ( http://www.codeha.us/openstack-discuss/msg00906.html ) but nobody answered. The second may be correct, but I expected a different VIP for each load balancer created, and maybe in the network I passed rather than in the lb management subnet.

Thanks in advance

Cheers

Sergio

From johnsomor at gmail.com Mon Jun 17 20:52:58 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Mon, 17 Jun 2019 13:52:58 -0700 Subject: ocatvia on rocky openstack environment does not work as expected In-Reply-To: <691a542e-154d-66fa-06cb-8cb454acb07b@pd.infn.it> References: <691a542e-154d-66fa-06cb-8cb454acb07b@pd.infn.it> Message-ID:

Hi Sergio,

If you include a [tag] in the subject of the messages you send to openstack-discuss it will highlight them for the project teams. For example, the octavia team uses [octavia].

For issue 1: OpenStack in general requires users to be associated with a project. The Octavia API requires this as well. If you can create resources using this account with other services, such as neutron or nova, I'm not 100% sure what is going on. I would check that you have the appropriate credentials for the openstack client by reading this section of the operations guide: https://docs.openstack.org/operations-guide/ops-lay-of-the-land.html#getting-credentials

If the user has the OS_PROJECT_ID in their environment, it would be helpful to us if you re-run the command with --debug and paste us the results at http://paste.openstack.org. Note that the output will contain security related content, so either scrub it or mark the paste private and reply only to me.
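Something along these lines would show which project the client is currently scoped to and capture the failing call with client-side debugging (illustrative commands only; substitute your own subnet ID):

    # check which project the openstack client is currently scoped to
    env | grep -E 'OS_PROJECT|OS_TENANT'
    openstack token issue -c project_id

    # re-run the failing request with client-side debugging enabled
    openstack loadbalancer create --name lb --vip-subnet-id c5887a52-bddb-4e8b-8023-cd7c341194fa --debug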
This might also be an issue with the keystone_authtoken section of the octavia.conf, but it seems unlikely. For issue 2: You are running the "noop drivers" which are used for testing instead of live code. Please check the [controller_worker] section of your octavia.conf and make sure you have enabled the "live" drivers as opposed to the no-op drivers. Our No-Op drivers simulate parts of OpenStack so that we don't have to allocate resources in a live cloud for some of our cases. It's live code testing, without the cloud. See the configuration reference here: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amphora_driver or see this section of the example configuration file: https://opendev.org/openstack/octavia/src/branch/master/etc/octavia.conf#L232 The amphora_driver, compute_driver, and network_driver sections need to be filled in. As a general reference for the configuration, you can look at the configuration file we use for our gate testing: http://logs.openstack.org/32/665732/1/check/octavia-v2-dsvm-scenario/70a1089/controller/logs/etc/octavia/octavia_conf.txt.gz Also note that the above link will expire in a week or two, but you can view another one by clicking on our test job links in gerrit. Note that not all of those timeout/retry values are appropriate for every deployment or production use, so just use it as a reference to the fields we configure and refer to the configuration reference and sample configuration file for more information. If you need more assistance, you can reply here or the team has a channel on Freenode IRC called #openstack-lbaas Good luck, Michael On Mon, Jun 17, 2019 at 10:49 AM Sergio Traldi wrote: > > Hi, > > I installed using packstack 2 node 1 controller and 1 compute with rocky > release in CentOS 7 OS. I define an external network and configure > openvswitch and I can assign floating IP. Everythings works fine. (VMs > creation, images upload, security group, key pair, ...) I tested > neutron, glance, nova, keystone, ... > > I followed the documentation to install ad configure octavia, so I > create the user, the endpoint, the db, the network lb-mgmt-net, the > subnet, the neutron security group for amphorae and the rules, the image > amphorae, I tagged the image,... > > I set the main values in /etc/octavia/octavia.conf for keystone for the > bind, and so on ... and I started the services. > > I followed these documentations: > > https://docs.openstack.org/octavia/queens/contributor/guides/dev-quick-start.html > > http://sudomakeinstall.com/uncategorized/building-octavia-images-with-centos-7-and-haproxy > > https://blog.zufardhiyaulhaq.com/manual-instalation-octavia-openstack-queens > > At the end everything seems to work but I have two "problems" > > 1): > > I can not create a lb without a project from the openstack client if I do: > > [root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack loadbalancer create > --name lb --vip-subnet-id c5887a52-bddb-4e8b-8023-cd7c341194fa > Validation failure: Missing project ID in request where one is required. 
> (HTTP 400) (Request-ID: req-1b9307c5-8aee-472d-ac56-44b6f34b05ce) > > > If I put the project the lb has been created: > > [root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack loadbalancer create > --name lb1 --vip-subnet-id c5887a52-bddb-4e8b-8023-cd7c341194fa > --project c86066dd95e345c386ef5e095b83918a > +---------------------+--------------------------------------+ > | Field | Value | > +---------------------+--------------------------------------+ > | admin_state_up | True | > | created_at | 2019-06-11T14:02:10 | > | description | | > | flavor | | > | id | f740be24-edf1-459c-ac77-c93917cbca31 | > | listeners | | > | name | lb1 | > | operating_status | OFFLINE | > | pools | | > | project_id | c86066dd95e345c386ef5e095b83918a | > | provider | amphora | > | provisioning_status | PENDING_CREATE | > | updated_at | None | > | vip_address | 198.51.100.1 | > | vip_network_id | e4a02581-1d90-4ea2-9e73-681ff66a4328 | > | vip_port_id | 1666d874-f1b5-437e-a989-ea49f65ba5a3 | > | vip_qos_policy_id | None | > | vip_subnet_id | c5887a52-bddb-4e8b-8023-cd7c341194fa | > +---------------------+--------------------------------------+ > > > But the project has been not a mandatory value to pass to the client. > > 2) > > Second problem: all the loadbalancers I created lay on the > lb-mgmt-subent and not in the subnet I passed. In the example above. My > subnet is like this: > > [root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack subnet list | grep > c5887a52-bddb-4e8b-8023-cd7c341194fa > | c5887a52-bddb-4e8b-8023-cd7c341194fa | private_subnet | > 8d37ca14-47b8-4ce8-aa27-fb4f8267d9ab | 10.0.0.0/24 | > > But the VIP created is 198.51.100.1 and each loadbalancer I create have > that IP as VIP and I think this is not correct: > > [root at cld-ctrl-pa-02 ~(keystone_admin)]# openstack loadbalancer list > +--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+ > | id | name | > project_id | vip_address | provisioning_status | > provider | > +--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+ > | 21d121fd-799f-4244-bf33-622e2fcd0060 | lb-demo | > 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE | > amphora | > | 94653251-849e-41c4-9071-f75382a46569 | lb-test | > 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE | > amphora | > | a76ccc28-0242-46e4-8ac5-b1410a29cf4a | lb1 | > 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE | > amphora | > | f740be24-edf1-459c-ac77-c93917cbca31 | lb789 | > c86066dd95e345c386ef5e095b83918a | 198.51.100.1 | ACTIVE | > amphora | > | aeb6e964-fc24-4b04-aa94-9b5ccfe4eb09 | lb-test2 | > 78932a05499d4916bfd1895f7017cac1 | 198.51.100.1 | ACTIVE | > amphora | > +--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+ > > > I tried using different users and different projects and different > networks but the result is the same in all loadbalancer created. > > In log file I dis not find anything useful for all of the problems. > > If someone has got some hints it could be great. > > For the first problem I found this ( > http://www.codeha.us/openstack-discuss/msg00906.html ) but nobody answer > to this problem. > > The second may be is correct but I expect different VIP for each > loadbalancer crerated and may be in the network I passed not int eh lb > management subenet. 
> > Thanks in advance > > Cheers > > Sergio > > From miguel at mlavalle.com Mon Jun 17 21:56:17 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Mon, 17 Jun 2019 16:56:17 -0500 Subject: [neutron] Bug deputy report for the week of June 10th Message-ID: Critical: https://bugs.launchpad.net/neutron/+bug/1832307 Functional test neutron.tests.functional.agent.linux.test_ip_lib.IpMonitorTestCase. test_add_remove_ip_address_and_interface is failing. Proposed fix: https://review.opendev.org/#/c/664889/ https://bugs.launchpad.net/neutron/+bug/1832968 neutron tempest test fails 100% due to abandoned ubuntu image release. Fix proposed and merged: https://review.opendev.org/665530 https://bugs.launchpad.net/bgpvpn/+bug/1833051 Unit tests in networking-bgpvpn are broken. Fix proposed: https://review.opendev.org/#/c/665637/ High: https://bugs.launchpad.net/neutron/+bug/1832985 Update of segmentation id for network don't works. Fix proposed: https://review.opendev.org/#/c/665548/, https://review.opendev.org/#/c/665623/ Medium: https://bugs.launchpad.net/neutron/+bug/1832636 Error creating IPv6 subnet on routed network segment https://bugs.launchpad.net/neutron/+bug/1832021 Checksum drop of metadata traffic on isolated networks with DPDK (needs further triaging) https://bugs.launchpad.net/neutron/+bug/1832210 incorrect decode of log prefix under python 3 https://bugs.launchpad.net/neutron/+bug/1832278 Reference policies related to 'update_port' don't work properly with shared network. Proposed fix: https://review.opendev.org/664470 https://bugs.launchpad.net/neutron/+bug/1832745 _update_network_segmentation_id KeyError: 'provider:network_type' https://bugs.launchpad.net/neutron/+bug/1815762 you can end up in a state where qvo* interfaces aren't owned by ovs which results in a dangling connection https://bugs.launchpad.net/neutron/+bug/1832925 Class neutron.common.utils.Timer is not thread safe Low: https://bugs.launchpad.net/neutron/+bug/1832603 BGP dynamic routing config doc should use openstack client in examples. Assigned to tidwellr https://bugs.launchpad.net/neutron/+bug/1832743 delete_dvr_dst_mac_for_arp uses wrong table id. Proposed fix: https://review.opendev.org/#/c/665175/ (approved) https://bugs.launchpad.net/neutron/+bug/1833122 FWaaS admin documentation is outdated. Fix proposed: https://review.opendev.org/#/c/665749/ https://bugs.launchpad.net/neutron/+bug/1833125 Remaining neutron-lbaas relevant code and documentation RFEs: https://bugs.launchpad.net/neutron/+bug/1832526 [RFE] Network segmentation support for edge deployments https://bugs.launchpad.net/neutron/+bug/1832758 [RFE] Allow/deny custom ethertypes in security groups Incomplete: https://bugs.launchpad.net/neutron/+bug/1832769 designate client requires region name to fetch PTR records Invalid / Won't fix: https://bugs.launchpad.net/neutron/+bug/1832574 Able to update the http-method-type for ICMP type healthmonoitor -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Tue Jun 18 06:41:26 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Tue, 18 Jun 2019 06:41:26 +0000 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: <394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> References: ,<3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> , <394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> Message-ID: >> As above, ignore 'cpu_shared_set' but issue a warning. 
Use the value of >> ‘vcpu_pin_set' to report both VCPU and PCPU inventory. Note that >> ‘vcpu_pin_set' is already used to calculate VCPU inventory. As mentioned in the spec, If operator sets the ``vcpu_pin_set`` in the Stein and upgrade to Train then both VCPU and PCPU inventory should be reported in placement. As on current master (Stein) if operator sets ``vpcu_pin_set=0-3`` on Compute node A and adds that node A into the host aggregate say “agg1” having metadata ``pinned=true``, then it allows to create both pinned and non-pinned instances which is known big issue. 1. Create instance A having flavor extra specs ("aggregate_instance_extra_specs:pinned": "true") then instance A will float on cpus 0-3 2. Create the instance B having flavor extra specs ("aggregate_instance_extra_specs:pinned": "true", "hw:cpu_policy": "dedicated") then instance B will be pinned to one of the cpu say 0. Now, operator will do the upgrade (Stein to Train), nova compute will report both VCPU and PCPU inventory. In this case if cpu_allocation_ratio is 1, then total PCPU available will be 4 (vpcu_pin_set=0-3) and VCPU will also be 4. And this will allow user to create maximum of 4 instances with flavor extra specs ``resources:PCPU=1`` and 4 instances with flavor extra specs ``resources:VCPU=1``. With current master code, it’s possible to create only 4 instances where now, by reporting both VCPU and PCPU, it will allow user to create total of 8 instances which is adding another level of problem along with the existing known issue. Is this acceptable? because this is decorating the problems. If not acceptable, then we can report only PCPU in this case which will solve two problems:- 1. The existing known issue on current master (allowing both pinned and non-pinned instances) on the compute host meant for pinning. 2. Above issue of allowing 8 instances to be created on the host. But there is one problem in taking this decision, if no instances are running on the compute node in case only ``vcpu_pinned_set`` is set, how do you find out this compute node is configured to create pinned or non-pinned instances? If instances are running, based on the Host numa_topology.pinned_cpus, it’s possible to detect that. Regards, Bhagyashri Shewale ________________________________ From: Stephen Finucane Sent: Monday, June 17, 2019 7:10:28 PM To: Shewale, Bhagyashri; openstack-discuss at lists.openstack.org Subject: Re: [nova] Spec: Standardize CPU resource tracking [Cleaning up the 'To' field since Jay isn't working on OpenStack anymore and everyone else is on openstack-discuss already] On Fri, 2019-06-14 at 08:35 +0000, Shewale, Bhagyashri wrote: > > cpu_share_set in stien was used for vm emulator thread and required > > the instnace to be pinned for it to take effect. i.e. the > > hw:emulator_thread_policy extra spcec currently only works if you > > had hw_cpu_policy=dedicated so we should not error if vcpu_pin_set > > and cpu_shared_set are defined, it was valid. what we can do is > > ignore teh cpu_shared_set for schduling and not report 0 VCPUs > > for this host and use vcpu_pinned_set as PCPUs. > Thinking of backward compatibility, I agree both of these > configuration options ``cpu_shared_set``, ``vcpu_pinned_set`` should > be allowed in Train release as well. > > Few possible combinations in train: > A) What if only ``cpu_shared_set`` is set on a new compute node? > Report VCPU inventory. 
I think this is _very_ unlikely to happen in the real world since the lack of a 'vcpu_pin_set' option means an instances pinned CPUs could co-exist on the same cores as the emulator threats, which defeats the whole point of placing emulator threads on a separate core. That said, it's possible so we do have to deal with it. Ignore 'cpu_shared_set' in this case and issue a warning saying that the user has to configure 'cpu_dedicated_set'. > B) what if ``cpu_shared_set`` and ``cpu_dedicated_set`` are set on > a new compute node? Report VCPU and PCPU inventory. In fact, we > want to support both these options so that instance can request both > VCPU and PCPU at the same time. If flavor requests VCPU or > hw:emulator_thread_policy=share, in both the cases, it will float on > CPUs set in ``cpu_shared_set`` config option. We should report both VCPU and PCPU inventory, yes. However, please don't add the ability to create a single instance with combined VCPU and PCPU inventory. I dropped this from the spec intentionally to make it easier for something (_anything_) to land. We can iterate on this once we have the basics done. > C) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a new > compute node? Ignore cpu_shared_set and report vcpu_pinned_set as > VCPU or PCPU? As above, ignore 'cpu_shared_set' but issue a warning. Use the value of 'vcpu_pin_set' to report both VCPU and PCPU inventory. Note that 'vcpu_pin_set' is already used to calculate VCPU inventory. https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L5808-L5811 > D) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a > upgraded compute node? As you have mentioned, ignore cpu_shared_set > and report vcpu_pinned_set as PCPUs provided ``NumaTopology`` > ,``pinned_cpus`` attribute is not empty otherwise VCPU. Ignore 'cpu_shared_set' but issue a warning. Use the value of 'vcpu_pin_set' to report both VCPU and PCPU inventory. Note that 'vcpu_pin_set' is already used to calculate VCPU inventory. > > we explctly do not want to have the behavior in 3 and 4 specificly > > the logic of checking the instances. > > Here we are checking Host ``NumaTopology`` ,``pinned_cpus`` > attribute and not directly instances ( if that attribute is not > empty that means some instance are running) and this logic will be > needed to address above #D case. You shouldn't need to do this. Rely solely on configuration options to determine inventory, even if it means reporting more inventory than we actually have (reporting of a host core as both units of VCPU and PCPU) and hope that operators have correctly used host aggregrates to isolate NUMA-based instances from non-NUMA-based instances. I realize this is very much in flux but could you please push what you have up for review, marked as WIP or such. Debating this stuff in the code might be easier. Stephen Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... 
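For readers following this thread, here is a minimal sketch of the configuration change being debated, assuming the option names and sections land as proposed in the spec (the CPU ranges are made-up example values, not taken from any message above):

    # Stein-style host dedicated to pinned instances
    [DEFAULT]
    vcpu_pin_set = 4-15

    # Train-style equivalent once the operator switches over
    [compute]
    cpu_dedicated_set = 4-15   # reported to placement as PCPU inventory
    cpu_shared_set = 0-3       # reported as VCPU inventory, also used for emulator threads

During the upgrade window discussed here, a host that still only has vcpu_pin_set would report both VCPU and PCPU for the same cores, which is why the debate centres on when the scheduler prefilter should start translating hw:cpu_policy=dedicated into a resources:PCPU request.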
URL: From soulxu at gmail.com Tue Jun 18 07:57:19 2019 From: soulxu at gmail.com (Alex Xu) Date: Tue, 18 Jun 2019 15:57:19 +0800 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> <4650ab3c8a6e04d4e391325ddb7c2985146ce747.camel@redhat.com> Message-ID: Stephen Finucane 于2019年6月17日周一 下午8:47写道: > On Mon, 2019-06-17 at 17:47 +0800, Alex Xu wrote: > > Sean Mooney 于2019年6月17日周一 下午5:19写道: > > > On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote: > > > > I'm thinking we should have recommended upgrade follow. If we > > > > give a lot of flexibility for the operator to have a lot > > > > combination of the value of vcpu_pin_set, dedicated_cpu_set and > > > > shared_cpu_set, then we have trouble in this email and have to do > > > > a lot of checks this email introduced also. > > > > > > we modified the spec intentionally to make upgradeing simple. > > > i don't be believe the concerns raised in the intial 2 emails are > > > valid if we follow what was detailed in the spec. > > > we did take some steps to restrict what values you can set. > > > for example dedicated_cpu_set cannot be set if vcpu pin set is set. > > > technicall i belive we relaxed that to say we would ignore vcpu pin > > > set in that case be original i was pushing for it to be a hard > > > error. > > > > > > > I'm thinking that the pre-request filter (which translates the > > > > cpu_policy=dedicated to PCPU request) should be enabled after all > > > > the node upgrades to the Train release. Before that, all the > > > > cpu_policy=dedicated instance still using the VCPU. > > > > > > it should be enabled after all node are upgraded but not > > > nessisarily before all compute nodes are updated to use > > > dedicated_cpu_set. > > > > If we enable the pre-request filter in the middle of upgrade, there > > will have the problem Bhagyashri said. Reporting PCPU and VCPU > > sametime doesn't resolve the concern from him as my understand. > > > > For example, we have 100 nodes for dedicated host in the cluster. > > > > The operator begins to upgrade the cluster. The controller plane > > upgrade first, and the pre-request filter enabled. > > For rolling upgrade, he begins to upgrade 10 nodes first. Then only > > those 10 nodes report PCPU and VCPU sametime. > > But any new request with dedicated cpu policy begins to request PCPU, > > all of those new instance only can be go to those 10 nodes. Also > > if the existed instances execute the resize and evacuate, and > > shelve/unshelve are going to those 10 nodes also. That is kind of > > nervious on the capacity at that time. > > The exact same issue can happen the other way around. As an operator > slowly starts upgrading, by setting the necessary configuration > options, the compute nodes will reduce the VCPU inventory they report > and start reporting PCPU inventory. Using the above example, if we > upgraded 90 of the 100 compute nodes and didn't enable the prefilter, > we would only be able to schedule to one of the remaining 10 nodes. > This doesn't seem any better. > > At some point we're going to need to make a clean break from pinned > instances consuming VCPU resources to them using PCPU resources. When > that happens is up to us. 
I figured it was easiest to do this as soon > as the controllers were updated because I had assumed compute nodes > would be updated pretty soon after the controllers and therefore there > would only be a short window where instances would start requesting > PCPU resources but there wouldn't be any available. Maybe that doesn't > make sense though. If not, I guess we need to make this configurable. > > I propose that as soon as compute nodes are upgraded then they will all > start reporting PCPU inventory, as noted in the spec. However, the > prefilter will initially be disabled and we will not reshape existing > inventories. This means pinned instances will continue consuming VCPU > resources as before but that is not an issue since this is the behavior > we currently have. Once the operator is happy that all of the compute > nodes have been upgraded, or at least enough that they care about, we > will then need some way for us to switch on the prefilter and reshape > existing instances. Perhaps this would require manual configuration > changes, validated by an upgrade check, or perhaps we could add a > workaround config option? > > In any case, at some point we need to have a switch from "use VCPUs for > pinned instances" to "use PCPUs for pinned instances". > All agree, we are talking about the same thing. This is the upgrade step I write below. I didn't see the spec describe those steps clearly or I miss something. > Stephen > > > > > Trying to image the upgrade as below: > > > > > > > > 1. Rolling upgrade the compute node. > > > > 2. The upgraded compute node begins to report both VCPU and PCPU, > > > > but reshape for the existed inventories. > > > > The upgraded node is still using the vcpu_pin_set config, or > > > > didn't set the vcpu_pin_config. Both in this two cases are > > > > reporting VCPU and PCPU same time. And the request with > > > > cpu_policy=dedicated still uses the VCPU. > > > > Then it is worked same as Stein release. And existed instance can > > > > be shelved/unshelved, migration and evacuate. > > > > > > +1 > > > > > > > 3. Disable the new request and operation for the instance to the > > > > hosts for dedicated instance. (it is kind of breaking our live- > > > > upgrade? I thought this will be a short interrupt for the control > > > > plane if that is available) > > > > > > im not sure why we need to do this unless you are thinging this > > > will be done by a cli? e.g. like nova-manage. > > > > The inventories of existed instance still consumes VCPU. As we know > > the PCPU and VCPU reporting same time, that is kind of duplicated > > resources. If we begin to consume the PCPU, in the end, it will over > > consume the resource. > > > > yes, the disable request is done by CLI, probably disable the > > service. > > > > > > 4. reshape the inventories for existed instance for all the > > > > hosts. > > > > > > should this not happen when the agent starts up? > > > > > > > 5. Enable the instance's new request and operation, also enable > > > > the pre-request filter. > > > > 6. Operator copies the value of vcpu_pin_set to > > > > dedicated_cpu_set. > > > > > > vcpu_pin_set is not the set of cpu used for pinning. the operators > > > should set dedicated_cpu_set and shared_cpu_set approprealy at this > > > point but in general they proably wont just copy it as host that > > > used vcpu_pin_set but were not used for pinned instances will be > > > copied to shared_cpu_set. > > > > Yes, I should say this upgrade flow is for those dedicated instance > > host. 
For the host only running floating instance, they doesn't have > > trouble with those problem. > > > > > > For the case of vcpu_pin_set isn't set, the value of > > > > dedicated_cpu_set should be all the cpu ids exclude > > > > shared_cpu_set if set. > > > > > > > > Two rules at here: > > > > 1. The operator doesn't allow to change a different value for > > > > dedicated_cpu_set with vcpu_pin_set when any instance is running > > > > on the host. > > > > 2. The operator doesn't allow to change the value of > > > > dedicated_cpu_set and shared_cpu_set when any instance is running > > > > on the host. > > > > > > neither of these rule can be enforced. one of the requirements that > dan smith had > > > for edge computeing is that we need to supprot upgraes with instance > inplace. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Tue Jun 18 08:16:13 2019 From: lyarwood at redhat.com (Lee Yarwood) Date: Tue, 18 Jun 2019 09:16:13 +0100 Subject: [nova][tempest] Should we make image certificate validation a configurable feature? Message-ID: <20190618081613.lek5vwzbln7anfwh@lyarwood.usersys.redhat.com> Hello all, $subject, this came up downstream after we noticed failures due to a race within the ServerShowV263Test test that I've documented in the following upstream bug: tempest.api.compute.servers.test_servers.ServerShowV263Test only passing when image has already been cached https://bugs.launchpad.net/tempest/+bug/1831866 As the test itself is pretty useless without valid certs, signed images etc I'd like to make the overall feature optional within Tempest and wire up some additional configurables for anyone looking to actually test this correctly. I've received some feedback in the review but this has stalled so I wanted to see if I could get things moving again by posting to the ML. Thanks in advance, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 From lyarwood at redhat.com Tue Jun 18 08:59:11 2019 From: lyarwood at redhat.com (Lee Yarwood) Date: Tue, 18 Jun 2019 09:59:11 +0100 Subject: [nova][tempest] Should we make image certificate validation a configurable feature? In-Reply-To: <20190618081613.lek5vwzbln7anfwh@lyarwood.usersys.redhat.com> References: <20190618081613.lek5vwzbln7anfwh@lyarwood.usersys.redhat.com> Message-ID: <20190618085911.llr56vzxcezqk2za@lyarwood.usersys.redhat.com> On 18-06-19 09:16:13, Lee Yarwood wrote: > Hello all, > > $subject, this came up downstream after we noticed failures due to a > race within the ServerShowV263Test test that I've documented in the > following upstream bug: > > tempest.api.compute.servers.test_servers.ServerShowV263Test only passing > when image has already been cached > https://bugs.launchpad.net/tempest/+bug/1831866 > > As the test itself is pretty useless without valid certs, signed images > etc I'd like to make the overall feature optional within Tempest and > wire up some additional configurables for anyone looking to actually > test this correctly. I've received some feedback in the review but this > has stalled so I wanted to see if I could get things moving again by > posting to the ML. aaaaaaaaaand I didn't include a link to the actual review, doh! compute: Make image certificate validation a configurable feature https://review.opendev.org/#/c/663596/ Thanks again, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 455 bytes Desc: not available URL: From balazs.gibizer at est.tech Tue Jun 18 09:23:35 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Tue, 18 Jun 2019 09:23:35 +0000 Subject: [nova] [placement] [ops] visualizing placement data as a graph Message-ID: <1560849806.7124.2@smtp.office365.com> Hi, Have you ever wondered how the placement resource provider tree looks like in your deployment or in your functional test? I did, so I put together a tool that can dump these trees as a dot file so it can be visualized as a graph. The osc-placement-tree [1][2] works as an openstack CLI plugin and also gives adapters to dump the trees from nova and placement functional test envs. Cheers, gibi [1] https://pypi.org/project/osc-placement-tree [2] https://github.com/gibizer/osc-placement-tree From sfinucan at redhat.com Tue Jun 18 09:51:15 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Tue, 18 Jun 2019 10:51:15 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: ,<3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> ,<394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> Message-ID: <9e2d014ec8b2c3e749d0362b8d814128ee3321e4.camel@redhat.com> On Tue, 2019-06-18 at 06:41 +0000, Shewale, Bhagyashri wrote: > > As above, ignore 'cpu_shared_set' but issue a warning. Use the value of > > ‘vcpu_pin_set' to report both VCPU and PCPU inventory. Note that > > ‘vcpu_pin_set' is already used to calculate VCPU inventory. > > As mentioned in the spec, If operator sets the ``vcpu_pin_set`` in > the Stein and upgrade to Train then both VCPU and PCPU inventory > should be reported in placement. > > As on current master (Stein) if operator sets ``vpcu_pin_set=0-3`` on > Compute node A and adds that node A into the host aggregate say > “agg1” having metadata ``pinned=true``, then it allows to create > both pinned and non-pinned instances which is known big issue. > Create instance A having flavor extra specs > ("aggregate_instance_extra_specs:pinned": "true") then instance A > will float on cpus 0-3 > Create the instance B having flavor extra specs > ("aggregate_instance_extra_specs:pinned": "true", "hw:cpu_policy": > "dedicated") then instance B will be pinned to one of the cpu say 0. > Now, operator will do the upgrade (Stein to Train), nova compute will > report both VCPU and PCPU inventory. In this case if > cpu_allocation_ratio is 1, then total PCPU available will be 4 > (vpcu_pin_set=0-3) and VCPU will also be 4. And this will allow user > to create maximum of 4 instances with flavor extra specs > ``resources:PCPU=1`` and 4 instances with flavor extra specs > ``resources:VCPU=1``. If the cpu_allocation_ratio is 1.0 then yes, this is correct. However, if it's any greater (and remember, the default is 16.0) then the gap is much smaller, though still broken. > With current master code, it’s possible to create only 4 instances > where now, by reporting both VCPU and PCPU, it will allow user to > create total of 8 instances which is adding another level of problem > along with the existing known issue. Is this acceptable? because > this is decorating the problems. I think is acceptable, yes. As we've said, this is broken behavior and things are just slightly more broken here, though not horribly so. As it stands, if you don't isolate pinned instances from non-pinned instances, you don't get any of the guarantees pinning is supposed to provide. 
Using the above example, if you booted two pinned and two unpinned instances on the same host, the unpinned instances would float over the pinned instances' cores [*] and impact their performance. If performance is an issue, host aggregrates will have been used. [*] They'll actually float over the entire range of host cores since instnace without a NUMA topology don't respect the 'vcpu_pin_set' value. > If not acceptable, then we can report only PCPU in this case which > will solve two problems:- > The existing known issue on current master (allowing both pinned and > non-pinned instances) on the compute host meant for pinning. > Above issue of allowing 8 instances to be created on the host. > But there is one problem in taking this decision, if no instances are > running on the compute node in case only ``vcpu_pinned_set`` is set, > how do you find out this compute node is configured to create pinned > or non-pinned instances? If instances are running, based on the Host > numa_topology.pinned_cpus, it’s possible to detect that. As noted previously, this is too complex and too error prone. Let's just suffer the potential additional impact on performance for those who haven't correctly configured their deployment, knowing that as soon as they get to U, where we can require the 'cpu_dedicated_set' and 'cpu_shared_set' options if you want to use pinned instances, things will be fixed. Stephen From gmann at ghanshyammann.com Tue Jun 18 10:42:09 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 18 Jun 2019 19:42:09 +0900 Subject: [nova][tempest] Should we make image certificate validation a configurable feature? In-Reply-To: <20190618085911.llr56vzxcezqk2za@lyarwood.usersys.redhat.com> References: <20190618081613.lek5vwzbln7anfwh@lyarwood.usersys.redhat.com> <20190618085911.llr56vzxcezqk2za@lyarwood.usersys.redhat.com> Message-ID: <16b6a2e0661.11aca0bff94381.8415538147020804012@ghanshyammann.com> ---- On Tue, 18 Jun 2019 17:59:11 +0900 Lee Yarwood wrote ---- > On 18-06-19 09:16:13, Lee Yarwood wrote: > > Hello all, > > > > $subject, this came up downstream after we noticed failures due to a > > race within the ServerShowV263Test test that I've documented in the > > following upstream bug: > > > > tempest.api.compute.servers.test_servers.ServerShowV263Test only passing > > when image has already been cached > > https://bugs.launchpad.net/tempest/+bug/1831866 > > > > As the test itself is pretty useless without valid certs, signed images > > etc I'd like to make the overall feature optional within Tempest and > > wire up some additional configurables for anyone looking to actually > > test this correctly. I've received some feedback in the review but this > > has stalled so I wanted to see if I could get things moving again by > > posting to the ML. > > aaaaaaaaaand I didn't include a link to the actual review, doh! > > compute: Make image certificate validation a configurable feature > https://review.opendev.org/#/c/663596/ Thanks for proposing the fix. I left few comments on review otherwise lgtm. 
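As an aside for anyone wanting to exercise this locally once the change lands, the end result is simply a new tempest.conf toggle; the group and option name below are placeholders to illustrate the shape of the change, not the final names (those are whatever merges in review 663596):

    [compute-feature-enabled]
    # hypothetical option name standing in for whatever lands in 663596
    certificate_validation = False

Deployments with real signing certificates and signed images would flip the toggle on; everyone else would leave it off and have the test skip cleanly.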
> > Thanks again, > > -- > Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 > From a.settle at outlook.com Tue Jun 18 10:45:59 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Tue, 18 Jun 2019 10:45:59 +0000 Subject: [placement] candidate logo In-Reply-To: References: <3857aa6b8dffb621d881c0d60c99091d190e5f1c.camel@redhat.com> Message-ID: For those who have never spent an Australian spring getting chased by the damn things, this is exactly what a Magpie looks like. I give you: https://www.theguardian.com/environment/2017/jun/28/when-magpies-attack-the-swooping-dive-bombing-menace-and-how-to-avoid-them Although I could understand why this may send the wrong message. Any thoughts towards the Cockatoo? https://www.abc.net.au/news/image/6874144-3x2-700x467.jpg On 17/06/2019 13:55, Chris Dent wrote: > On Mon, 17 Jun 2019, Sean Mooney wrote: > >> On Mon, 2019-06-17 at 11:02 +0100, Chris Dent wrote: >>> https://burningchrome.com/magpie2.png >>> >>> Is the latest iteration of the candidate placement logo. >>> Please shout out if it's not okay. It's been through a few >>> iterations of feedback to get it to a form closer to what we >>> discussed a couple of weeks ago. >>> >>> For those not aware: The image is of an Australian Magpie. >> i like the image in general but i think the red and the angle of the >> iris sets the wrong tone. it reads as slightly arragant/sinister > > It's red because it is an australian magpie, which has red eyes. It > is supposed to be somewhat > sinister/snarky/bossing-you-about-where-to-place-your-stuff. > > An earlier version was even more so, and a smile was added to make > it look less so, and that was too much. So this tries to strike a > bit of a balance. > >> its kind of hard to edit a 640*640 png without and get the quality >> you would want >> for a logo but i mocked up a slight change that i think makes it more >> cheerful and >> aprochable. i think the rotation of the iris is the main thing. > > If you view the image in a smaller size (which it will often be) the > red goes more like a glint. > >> the more hoizontal placement of the in the origial makes the image >> look like the >> magpie si look over its sholder at you where as the more vertical >> inclanation >> in my modifed version read less like its stearing at you. > > It is supposed to be looking back at you a bit. Based on > https://en.wikipedia.org/wiki/Australian_magpie#/media/File:Magpie_samcem05.jpg > > >> looking at the original because of its stance and the perception its >> watching you it >> read more like a raven giving me an edger alan po vibe but i think >> the modifed version >> could be a magpie. the cyan/blue tones also read much more like the >> magpies i am used >> to seeing >> https://download.ams.birds.cornell.edu/api/v1/asset/70580781/1800 > > That picture is a European Magpie, completely different bird. An > australian magpie isn't really a magpie. 
> From soulxu at gmail.com Tue Jun 18 13:33:00 2019 From: soulxu at gmail.com (Alex Xu) Date: Tue, 18 Jun 2019 21:33:00 +0800 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: <9e2d014ec8b2c3e749d0362b8d814128ee3321e4.camel@redhat.com> References: <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> <394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> <9e2d014ec8b2c3e749d0362b8d814128ee3321e4.camel@redhat.com> Message-ID: Stephen Finucane 于2019年6月18日周二 下午5:55写道: > On Tue, 2019-06-18 at 06:41 +0000, Shewale, Bhagyashri wrote: > > > As above, ignore 'cpu_shared_set' but issue a warning. Use the value of > > > ‘vcpu_pin_set' to report both VCPU and PCPU inventory. Note that > > > ‘vcpu_pin_set' is already used to calculate VCPU inventory. > > > > As mentioned in the spec, If operator sets the ``vcpu_pin_set`` in > > the Stein and upgrade to Train then both VCPU and PCPU inventory > > should be reported in placement. > > > > As on current master (Stein) if operator sets ``vpcu_pin_set=0-3`` on > > Compute node A and adds that node A into the host aggregate say > > “agg1” having metadata ``pinned=true``, then it allows to create > > both pinned and non-pinned instances which is known big issue. > > Create instance A having flavor extra specs > > ("aggregate_instance_extra_specs:pinned": "true") then instance A > > will float on cpus 0-3 > > Create the instance B having flavor extra specs > > ("aggregate_instance_extra_specs:pinned": "true", "hw:cpu_policy": > > "dedicated") then instance B will be pinned to one of the cpu say 0. > > Now, operator will do the upgrade (Stein to Train), nova compute will > > report both VCPU and PCPU inventory. In this case if > > cpu_allocation_ratio is 1, then total PCPU available will be 4 > > (vpcu_pin_set=0-3) and VCPU will also be 4. And this will allow user > > to create maximum of 4 instances with flavor extra specs > > ``resources:PCPU=1`` and 4 instances with flavor extra specs > > ``resources:VCPU=1``. > > If the cpu_allocation_ratio is 1.0 then yes, this is correct. However, > if it's any greater (and remember, the default is 16.0) then the gap is > much smaller, though still broken. > > > With current master code, it’s possible to create only 4 instances > > where now, by reporting both VCPU and PCPU, it will allow user to > > create total of 8 instances which is adding another level of problem > > along with the existing known issue. Is this acceptable? because > > this is decorating the problems. > > I think is acceptable, yes. As we've said, this is broken behavior and > things are just slightly more broken here, though not horribly so. As > it stands, if you don't isolate pinned instances from non-pinned > instances, you don't get any of the guarantees pinning is supposed to > provide. Using the above example, if you booted two pinned and two > unpinned instances on the same host, the unpinned instances would float > over the pinned instances' cores [*] and impact their performance. If > performance is an issue, host aggregrates will have been used. > > [*] They'll actually float over the entire range of host cores since > instnace without a NUMA topology don't respect the 'vcpu_pin_set' > value. > Yes, agree with Stephen, we don't suggest the user mix the pin and non-pin instance on the same host with current master. If user want to mix pin and non-pin instance, the user need update his configuration to use dedicated_cpu_set and shared_cpu_set. 
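To make that end state concrete, here is a minimal sketch of the compute node configuration and flavor involved; the option spelling follows the spec discussion in this thread (cpu_dedicated_set/cpu_shared_set under [compute]), and the CPU ranges and flavor name are illustrative assumptions only:

    # nova.conf before the switch (Stein style)
    [DEFAULT]
    vcpu_pin_set = 0-7

    # nova.conf after the switch (proposed Train style)
    [compute]
    cpu_dedicated_set = 0-3   # reported to placement as PCPU inventory
    cpu_shared_set = 4-7      # reported to placement as VCPU inventory

    # guests keep requesting pinning through the abstract flavor syntax
    openstack flavor set pinned.small --property hw:cpu_policy=dedicated

Once both cpu_dedicated_set and cpu_shared_set are defined, pinned and floating guests can share a host without the pinned=True/False aggregate separation described earlier in the thread.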
The vcpu_pin_set reports VCPU and PCPU inventories is the intermediate status. In that intermediate status, the operator still need to separate the pin and non-pin instance into different host. > > > If not acceptable, then we can report only PCPU in this case which > > will solve two problems:- > > The existing known issue on current master (allowing both pinned and > > non-pinned instances) on the compute host meant for pinning. > > Above issue of allowing 8 instances to be created on the host. > > But there is one problem in taking this decision, if no instances are > > running on the compute node in case only ``vcpu_pinned_set`` is set, > > how do you find out this compute node is configured to create pinned > > or non-pinned instances? If instances are running, based on the Host > > numa_topology.pinned_cpus, it’s possible to detect that. > > As noted previously, this is too complex and too error prone. Let's > just suffer the potential additional impact on performance for those > who haven't correctly configured their deployment, knowing that as soon > as they get to U, where we can require the 'cpu_dedicated_set' and > 'cpu_shared_set' options if you want to use pinned instances, things > will be fixed. > > Stephen > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.settle at outlook.com Tue Jun 18 13:44:35 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Tue, 18 Jun 2019 13:44:35 +0000 Subject: [tc] July meeting host Message-ID: Hi team, The next TC meeting is set to be on the 5th of July, 2019. As I have every intention of leaving my laptop in the UK whilst I vacay on the sunny shores of US, I need a volunteer to host the meeting. I am currently going through the process of updating the meeting agenda, so minimum effort required :) just turn up, and ensure your gif game is strong, and run the commands. Any takers? I'll return for the following meeting on the 1st of August. Cheers, Alex From fungi at yuggoth.org Tue Jun 18 13:52:00 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jun 2019 13:52:00 +0000 Subject: [tc] July meeting host In-Reply-To: References: Message-ID: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> On 2019-06-18 13:44:35 +0000 (+0000), Alexandra Settle wrote: > The next TC meeting is set to be on the 5th of July, 2019. July 5 is a Friday and we usually meet on Thursdays at 15:00 UTC. Is the change of weekday intentional? Are we keeping the same time but on Friday instead? > As I have every intention of leaving my laptop in the UK whilst I vacay > on the sunny shores of US, I need a volunteer to host the meeting. > > I am currently going through the process of updating the meeting agenda, > so minimum effort required :) just turn up, and ensure your gif game is > strong, and run the commands. > > Any takers? I'll return for the following meeting on the 1st of August. Expect it to be a lightly-attended meeting as many USA-based TC members may also be travelling for the July 4th holiday there. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Tue Jun 18 13:53:53 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jun 2019 13:53:53 +0000 Subject: [tc] July meeting host In-Reply-To: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> Message-ID: <20190618135352.kkwyqm6whpogutp4@yuggoth.org> On 2019-06-18 13:52:00 +0000 (+0000), Jeremy Stanley wrote: [...] > we usually meet on Thursdays at 15:00 UTC [...] Sorry, I meant 14:00 UTC of course. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From MM9745 at att.com Tue Jun 18 14:08:00 2019 From: MM9745 at att.com (MCEUEN, MATT) Date: Tue, 18 Jun 2019 14:08:00 +0000 Subject: [openstack-helm] Voting for OpenStack-Helm IRC Meeting Times Message-ID: <7C64A75C21BB8D43BD75BB18635E4D89709D1E91@MOSTLS1MSGUSRFF.ITServices.sbc.com> Hi team, As discussed in the Denver PTG, OpenStack-Helm has a number of contributors across Europe, Asia, and the US, and there's no single weekly meeting time that works well for everyone. There are also different approaches that could be used to better accommodate team members in different time zones. Let's vote as a team to get the Times/Approaches that work for the majority of contributors Tuesday meeting times: - Time A: [1pm UTC] https://everytimezone.com/s/d147ddd2 - Time B: [3pm UTC] https://everytimezone.com/s/d5289c0a - Time C: [9pm UTC] https://everytimezone.com/s/c5ee2f54 - Time D: [10pm UTC] https://everytimezone.com/s/c428653b Time B is our current meeting time. We also have three different ways we can approach those meeting times: - Uniform: same meeting time every week - Alternating: switch between two different times, every other week - Doubled: have two meetings every Tuesday on opposite ends of the day If we did "Doubled", portdirect would generally facilitate both meetings, along with anyone who wanted to attend both; in general team members wouldn't be expected to be at both. INSTRUCTIONS: Please use the following etherpad: https://etherpad.openstack.org/p/airship-meeting-vote-2019 Please vote for your top five preferred meeting schedules, one per line, including a Time and an Approach. Please include your name/nick. Example: example_person: 1. Time A Uniform 2. Time D Uniform 3. Time A and D Alternating 4. Time A and C Doubled 5. Time C Uniform Portdirect will abstain from voting, do a by-hand Condorcet tally, and share the results. I'm helping him facilitate. Voting is open to all OpenStack-Helm Contributors, will conclude EOD Tuesday the 25th, so we get two shots at reminding folks in the team meetings. Thanks! Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From gr at ham.ie Tue Jun 18 15:13:39 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 18 Jun 2019 16:13:39 +0100 Subject: [tc] July meeting host In-Reply-To: <20190618135352.kkwyqm6whpogutp4@yuggoth.org> References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: On 18/06/2019 14:53, Jeremy Stanley wrote: > On 2019-06-18 13:52:00 +0000 (+0000), Jeremy Stanley wrote: > [...] >> we usually meet on Thursdays at 15:00 UTC > [...] > > Sorry, I meant 14:00 UTC of course. > I would suggest we move it ot the following week, unless a lot of the US based folks think they can make it? 
- Graham -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From a.settle at outlook.com Tue Jun 18 15:40:17 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Tue, 18 Jun 2019 15:40:17 +0000 Subject: [tc] July meeting host In-Reply-To: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> Message-ID: On 18/06/2019 14:52, Jeremy Stanley wrote: > On 2019-06-18 13:44:35 +0000 (+0000), Alexandra Settle wrote: >> The next TC meeting is set to be on the 5th of July, 2019. > July 5 is a Friday and we usually meet on Thursdays at 15:00 UTC. Is > the change of weekday intentional? Are we keeping the same time but > on Friday instead? I don't know what it is, but writing emails to this ML I *always* make a mistake. Sigh. Yes, I meant the 4th. > >> As I have every intention of leaving my laptop in the UK whilst I vacay >> on the sunny shores of US, I need a volunteer to host the meeting. >> >> I am currently going through the process of updating the meeting agenda, >> so minimum effort required :) just turn up, and ensure your gif game is >> strong, and run the commands. >> >> Any takers? I'll return for the following meeting on the 1st of August. > Expect it to be a lightly-attended meeting as many USA-based TC > members may also be travelling for the July 4th holiday there. Of course. So perhaps we change to the following week. From a.settle at outlook.com Tue Jun 18 15:40:50 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Tue, 18 Jun 2019 15:40:50 +0000 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: On 18/06/2019 16:13, Graham Hayes wrote: > On 18/06/2019 14:53, Jeremy Stanley wrote: >> On 2019-06-18 13:52:00 +0000 (+0000), Jeremy Stanley wrote: >> [...] >>> we usually meet on Thursdays at 15:00 UTC >> [...] >> >> Sorry, I meant 14:00 UTC of course. >> > I would suggest we move it ot the following week, unless a lot of > the US based folks think they can make it? Does anyone have an issue with Thursday the 11th? > > - Graham > From fungi at yuggoth.org Tue Jun 18 15:44:29 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Jun 2019 15:44:29 +0000 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: <20190618154429.qg3v47bdn75ealh2@yuggoth.org> On 2019-06-18 15:40:50 +0000 (+0000), Alexandra Settle wrote: [...] > Does anyone have an issue with Thursday the 11th? That works fine for me. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mnaser at vexxhost.com Tue Jun 18 15:46:21 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 18 Jun 2019 11:46:21 -0400 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes wrote: > I would suggest we move it ot the following week, unless a lot of > the US based folks think they can make it? 
While I think that we shouldn't necessarily just build our community to accommodate a specific region, I think in this case, the majority of the members involved in this meeting are in that region. I guess it would be nice if TC members can chime in and mention if it is easier to meet the week after which would be on the 11th of July. -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From jim at jimrollenhagen.com Tue Jun 18 15:52:51 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Tue, 18 Jun 2019 11:52:51 -0400 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: On Tue, Jun 18, 2019 at 11:47 AM Mohammed Naser wrote: > On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes wrote: > > I would suggest we move it ot the following week, unless a lot of > > the US based folks think they can make it? > > While I think that we shouldn't necessarily just build our community > to accommodate a specific region, I think in this case, the majority > of the members involved in this meeting are in that region. > > I guess it would be nice if TC members can chime in and mention if it > is easier to meet the week after which would be on the 11th of July. > I won't be working on the 4th, so yes for me. // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Tue Jun 18 16:53:35 2019 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 18 Jun 2019 11:53:35 -0500 Subject: [oslo] Courtesy ping changes In-Reply-To: <4d50450b-61fe-ac98-5144-d360de91020c@nemebean.com> References: <4d50450b-61fe-ac98-5144-d360de91020c@nemebean.com> Message-ID: <6a269ff2-2827-541e-639b-5c80273fd2cd@nemebean.com> Friendly reminder that this week we moved to the new agenda-based ping list. If you want to continue receiving pings and haven't already added yourself to the new list you should do so ASAP. Thanks. On 5/20/19 11:48 AM, Ben Nemec wrote: > Important: Action is required if you want to continue receiving courtesy > pings. Read on for details. > > This is an oslo-specific followup to [0]. There's a lot of good > discussion there if you're interested in the background for this email. > > The TLDR version is that we're going to keep the Oslo courtesy ping list > because a number of Oslo contributors have expressed their preference > for it. However, we are making some changes. > > First, the ping list will be cleared at the start of each cycle. This > should prevent us from pinging people who no longer work on Oslo (which > is almost certainly happening right now). Everyone who wants a courtesy > ping will need to re-opt-in each cycle. We'll work out a transition > process so people don't just stop receiving pings. > > Second, the ping list is going to move from the script in oslo.tools to > the meeting agenda[1] on the wiki. There's no need for Oslo core signoff > on ping list changes and that makes it a waste of time for both the > cores and the people looking to make changes to the list. This does mean > we'll lose the automatic wrapping of the list, but once we clean up the > stale entries I suspect we won't need to wrap it as much anyway. > > I will continue to use the existing ping list for the next two weeks to > give everyone a chance to add their name to the new list. 
I've seeded > the new list with a couple of people who had expressed interest in > continuing to receive pings, but if anyone else wants to continue > getting them please add yourself to the list in [1] (see the Agenda > Template section). > > I'm intentionally _not_ adding all of the active Oslo cores on the > assumption that you may prefer to set up your own notification method. I > might automatically carry over cores from cycle to cycle since they are > presumably still interested in Oslo and chose that notification method, > but we'll worry about that at the start of next cycle. > > I think that covers the current plan for courtesy pings in Oslo. If you > have any comments or concerns please let me know. Otherwise expect this > new system to take effect in 3 weeks. > > Thanks. > > -Ben > > 0: > http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006235.html > 1: https://wiki.openstack.org/wiki/Meetings/Oslo > From zbitter at redhat.com Tue Jun 18 19:51:31 2019 From: zbitter at redhat.com (Zane Bitter) Date: Tue, 18 Jun 2019 15:51:31 -0400 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: <0aa1b541-a6ff-bb03-20db-d55ab72274a3@redhat.com> On 18/06/19 11:52 AM, Jim Rollenhagen wrote: > On Tue, Jun 18, 2019 at 11:47 AM Mohammed Naser > wrote: > > On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes > wrote: > > I would suggest we move it ot the following week, unless a lot of > > the US based folks think they can make it? > > While I think that we shouldn't necessarily just build our community > to accommodate a specific region, I think in this case, the majority > of the members involved in this meeting are in that region. > > I guess it would be nice if TC members can chime in and mention if it > is easier to meet the week after which would be on the 11th of July. > > > I won't be working on the 4th, so yes for me. +1 From doug at doughellmann.com Tue Jun 18 21:49:59 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 18 Jun 2019 17:49:59 -0400 Subject: [tc] July meeting host In-Reply-To: <0aa1b541-a6ff-bb03-20db-d55ab72274a3@redhat.com> References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> <0aa1b541-a6ff-bb03-20db-d55ab72274a3@redhat.com> Message-ID: Zane Bitter writes: > On 18/06/19 11:52 AM, Jim Rollenhagen wrote: >> On Tue, Jun 18, 2019 at 11:47 AM Mohammed Naser > > wrote: >> >> On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes > > wrote: >> > I would suggest we move it ot the following week, unless a lot of >> > the US based folks think they can make it? >> >> While I think that we shouldn't necessarily just build our community >> to accommodate a specific region, I think in this case, the majority >> of the members involved in this meeting are in that region. >> >> I guess it would be nice if TC members can chime in and mention if it >> is easier to meet the week after which would be on the 11th of July. >> >> >> I won't be working on the 4th, so yes for me. > > +1 > I will also be offline on the 4th. 
-- Doug From ed at leafe.com Tue Jun 18 22:03:38 2019 From: ed at leafe.com (Ed Leafe) Date: Tue, 18 Jun 2019 17:03:38 -0500 Subject: [nova] [placement] [ops] visualizing placement data as a graph In-Reply-To: <1560849806.7124.2@smtp.office365.com> References: <1560849806.7124.2@smtp.office365.com> Message-ID: <2D03106F-FBBC-4348-A9BE-1C7E465EB220@leafe.com> On Jun 18, 2019, at 4:23 AM, Balázs Gibizer wrote: > > Have you ever wondered how the placement resource provider tree looks > like in your deployment or in your functional test? I did, so I put > together a tool that can dump these trees as a dot file so it can be > visualized as a graph. > > The osc-placement-tree [1][2] works as an openstack CLI plugin and also > gives adapters to dump the trees from nova and placement functional > test envs. I looked around at that, and it’s pretty impressive! Now imagine if the data actually *were* a graph… :) -- Ed Leafe From melwittt at gmail.com Tue Jun 18 22:40:13 2019 From: melwittt at gmail.com (melanie witt) Date: Tue, 18 Jun 2019 15:40:13 -0700 Subject: [nova][dev][ops] server status when compute host is down In-Reply-To: <3c74024f-aa0d-12a6-b5bb-54ceebc07c64@gmail.com> References: <065da98d-300d-00ca-83ee-f6e9dc458277@gmail.com> <3c74024f-aa0d-12a6-b5bb-54ceebc07c64@gmail.com> Message-ID: On 5/23/19 1:08 PM, melanie witt wrote: > On Thu, 23 May 2019 11:56:34 -0700, Iain Macdonnell > wrote: >> >> >> On 5/23/19 11:32 AM, Matt Riedemann wrote: >>> As I said elsewhere in this thread, if you're proposing to add a new >>> policy rule to change the 'status' field based on host_status, why not >>> just tell people to open up the policy rule we already have for the >>> host_status field so non-admins can see it in their server details? This >>> sounds like an education problem more than a technical problem to me. >> >> Because *that* implies revealing infrastructure status details to >> end-users, which is probably not desirable in a lot of cases. > > This is a good point. If an operator were to enable 'host_status' via > policy, end users would also get to see host_status UP and DOWN, which > is typically not desired by cloud admins. There's currently no option > for exposing only UNKNOWN, as a small but helpful bit of info for end > users. > >> Isn't this as simple as not lying to the user about the *server* status >> when it cannot be ascertained for any reason? In that case, the user >> should be given (only) that information, but not any "dirty laundry" >> about what caused it.... >> >> Even if the admin doesn't care about revealing infrastructure status, >> the end-user shouldn't have to know that server_status can't be trusted, >> and that they have to check other fields to figure out if it's reliable >> or not at any given time. > > And yes, I was thinking about it more simply, and the replies on this > thread have led me to think that if we could show the cosmetic-only > status of UNKNOWN for nova-compute communication interruptions, similar > to what we do for down cells, we would not put a policy control on it > (since UNKNOWN is not leaking infra details). And not make any changes > to notifications etc, just a cosmetic-only UNKNOWN status implemented at > the REST API layer if host_status is UNKNOWN. I was thinking maybe we'd > leave server status alone if host_status is UP or DOWN since its status > should be reflected in those cases as-is. 
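For readers following along, the policy rule being referred to is the one guarding the host_status field in server show responses; a policy.yaml-style sketch of the default and of the relaxed form discussed above (rule names assume nova's standard defaults):

    # default: only admins see host_status
    "os_compute_api:servers:show:host_status": "rule:admin_api"
    # relaxed alternative: owners see it too, but then UP and DOWN are
    # exposed along with UNKNOWN, which is the infrastructure detail
    # most operators do not want to leak
    # "os_compute_api:servers:show:host_status": "rule:admin_or_owner"

That trade-off is why the proposal below avoids a policy control for the UNKNOWN-only case.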
> > Assuming we could move forward without a policy control on it, I think > the only remaining concern would be the collision of UNKNOWN status with > down cells where for down cells, some server attributes are not > available. Personally, this doesn't seem like a major problem to me > since UNKNOWN implies an uncertain state, in general. But maybe I'm > wrong. How important is the difference? > > Finally, it sounds like the consensus is that if we do decide to make > this change, we would need a new microversion to account for server > status being able to be UNKNOWN if host_status is UNKNOWN. FYI, I've proposed a spec here: https://review.opendev.org/666181 -melanie From gmann at ghanshyammann.com Wed Jun 19 00:59:52 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 19 Jun 2019 09:59:52 +0900 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> <0aa1b541-a6ff-bb03-20db-d55ab72274a3@redhat.com> Message-ID: <16b6d3f4a41.12164a6a0120328.8841081913769246802@ghanshyammann.com> ---- On Wed, 19 Jun 2019 06:49:59 +0900 Doug Hellmann wrote ---- > Zane Bitter writes: > > > On 18/06/19 11:52 AM, Jim Rollenhagen wrote: > >> On Tue, Jun 18, 2019 at 11:47 AM Mohammed Naser >> > wrote: > >> > >> On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes >> > wrote: > >> > I would suggest we move it ot the following week, unless a lot of > >> > the US based folks think they can make it? > >> > >> While I think that we shouldn't necessarily just build our community > >> to accommodate a specific region, I think in this case, the majority > >> of the members involved in this meeting are in that region. > >> > >> I guess it would be nice if TC members can chime in and mention if it > >> is easier to meet the week after which would be on the 11th of July. > >> > >> > >> I won't be working on the 4th, so yes for me. > > > > +1 > > > > I will also be offline on the 4th. Both dates are ok for me. -gmann > > -- > Doug > > From Bhagyashri.Shewale at nttdata.com Wed Jun 19 02:56:44 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Wed, 19 Jun 2019 02:56:44 +0000 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: <9e2d014ec8b2c3e749d0362b8d814128ee3321e4.camel@redhat.com> References: ,<3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> ,<394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> , <9e2d014ec8b2c3e749d0362b8d814128ee3321e4.camel@redhat.com> Message-ID: Hi All, After all discussions on mailing thread, I would like to summarize concluded points as under:- 1. If operator sets ``vcpu_pin_set`` in Stein and upgrade it to Train or ``vcpu_pin_set`` is set on a new compute node, then both VCPU and PCPU inventory should be reported to placement. 2. User can’t request both ``resources:PCPU`` and ``resources:VCPU`` in a single request for Train release. And in future ‘U’ release, user can request both ``resources:PCPU`` and ``resources:VCPU`` in a single request. 3. In “U” release, “vcpu_pin_set” config option will be removed. In this case, operator will either need to set “cpu_shared_set” or “cpu_dedicated_set” accordingly on old compute nodes and on new compute nodes, operator can set both the config option “cpu_shared_set” and “cpu_dedicated_set” if required. 4. 
In Train release, operator will also need to continue retaining the same behavior of host aggregates as that of Stein to differentiate between Numa-awared compute host: * Hosts meant for pinned instances should be part of the aggregate with metadata “pinned=True” * Hosts meant for non-pinned instances should be part of the aggregate with metadata “pinned=False” 5. In Train release, old flavor can be used as is in which case scheduler pre-filter will map it to the next syntax “resources:PCPU” in case cpu_policy=dedicated. 6. In Train release, new flavor syntax “resources:PCPU=1 will be accepted in flavor extra specs but in this case we expect operator will set “aggregate_instance_extra_specs:pinned=True” in flavor extra specs and the hosts are part of the aggregate which has metadata “pinned=True”. Regards, Bhagyashri Shewale ________________________________ From: Stephen Finucane Sent: Tuesday, June 18, 2019 6:51:15 PM To: Shewale, Bhagyashri; openstack-discuss at lists.openstack.org Subject: Re: [nova] Spec: Standardize CPU resource tracking On Tue, 2019-06-18 at 06:41 +0000, Shewale, Bhagyashri wrote: > > As above, ignore 'cpu_shared_set' but issue a warning. Use the value of > > ‘vcpu_pin_set' to report both VCPU and PCPU inventory. Note that > > ‘vcpu_pin_set' is already used to calculate VCPU inventory. > > As mentioned in the spec, If operator sets the ``vcpu_pin_set`` in > the Stein and upgrade to Train then both VCPU and PCPU inventory > should be reported in placement. > > As on current master (Stein) if operator sets ``vpcu_pin_set=0-3`` on > Compute node A and adds that node A into the host aggregate say > “agg1” having metadata ``pinned=true``, then it allows to create > both pinned and non-pinned instances which is known big issue. > Create instance A having flavor extra specs > ("aggregate_instance_extra_specs:pinned": "true") then instance A > will float on cpus 0-3 > Create the instance B having flavor extra specs > ("aggregate_instance_extra_specs:pinned": "true", "hw:cpu_policy": > "dedicated") then instance B will be pinned to one of the cpu say 0. > Now, operator will do the upgrade (Stein to Train), nova compute will > report both VCPU and PCPU inventory. In this case if > cpu_allocation_ratio is 1, then total PCPU available will be 4 > (vpcu_pin_set=0-3) and VCPU will also be 4. And this will allow user > to create maximum of 4 instances with flavor extra specs > ``resources:PCPU=1`` and 4 instances with flavor extra specs > ``resources:VCPU=1``. If the cpu_allocation_ratio is 1.0 then yes, this is correct. However, if it's any greater (and remember, the default is 16.0) then the gap is much smaller, though still broken. > With current master code, it’s possible to create only 4 instances > where now, by reporting both VCPU and PCPU, it will allow user to > create total of 8 instances which is adding another level of problem > along with the existing known issue. Is this acceptable? because > this is decorating the problems. I think is acceptable, yes. As we've said, this is broken behavior and things are just slightly more broken here, though not horribly so. As it stands, if you don't isolate pinned instances from non-pinned instances, you don't get any of the guarantees pinning is supposed to provide. Using the above example, if you booted two pinned and two unpinned instances on the same host, the unpinned instances would float over the pinned instances' cores [*] and impact their performance. If performance is an issue, host aggregrates will have been used. 
[*] They'll actually float over the entire range of host cores since instnace without a NUMA topology don't respect the 'vcpu_pin_set' value. > If not acceptable, then we can report only PCPU in this case which > will solve two problems:- > The existing known issue on current master (allowing both pinned and > non-pinned instances) on the compute host meant for pinning. > Above issue of allowing 8 instances to be created on the host. > But there is one problem in taking this decision, if no instances are > running on the compute node in case only ``vcpu_pinned_set`` is set, > how do you find out this compute node is configured to create pinned > or non-pinned instances? If instances are running, based on the Host > numa_topology.pinned_cpus, it’s possible to detect that. As noted previously, this is too complex and too error prone. Let's just suffer the potential additional impact on performance for those who haven't correctly configured their deployment, knowing that as soon as they get to U, where we can require the 'cpu_dedicated_set' and 'cpu_shared_set' options if you want to use pinned instances, things will be fixed. Stephen Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... URL: From li.canwei2 at zte.com.cn Wed Jun 19 03:56:02 2019 From: li.canwei2 at zte.com.cn (li.canwei2 at zte.com.cn) Date: Wed, 19 Jun 2019 11:56:02 +0800 (CST) Subject: =?UTF-8?B?W1dhdGNoZXJdIHRlYW0gbWVldGluZyBhbmQgYWdlbmRh?= Message-ID: <201906191156028743737@zte.com.cn> Hi, Watcher will have a meeting at 08:00 UTC today in the #openstack-meeting-alt channel. The agenda is available on https://wiki.openstack.org/wiki/Watcher_Meeting_Agenda feel free to add any additional items. Thanks! Canwei Li -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Wed Jun 19 03:59:44 2019 From: soulxu at gmail.com (Alex Xu) Date: Wed, 19 Jun 2019 11:59:44 +0800 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> <394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> <9e2d014ec8b2c3e749d0362b8d814128ee3321e4.camel@redhat.com> Message-ID: Shewale, Bhagyashri 于2019年6月19日周三 上午11:01写道: > Hi All, > > > After all discussions on mailing thread, I would like to summarize > concluded points as under:- > > 1. If operator sets ``vcpu_pin_set`` in Stein and upgrade it to Train > or ``vcpu_pin_set`` is set on a new compute node, then both VCPU and PCPU > inventory should be reported to placement. > 2. User can’t request both ``resources:PCPU`` and ``resources:VCPU`` > in a single request for Train release. And in future ‘U’ release, > user can request both ``resources:PCPU`` and ``resources:VCPU`` in a > single request. > 3. In “U” release, “vcpu_pin_set” config option will be removed. In > this case, operator will either need to set “cpu_shared_set” or > “cpu_dedicated_set” accordingly on old compute nodes and on new compute > nodes, operator can set both the config option “cpu_shared_set” and “cpu_dedicated_set” > if required. > 4. 
In Train release, operator will also need to continue retaining the > same behavior of host aggregates as that of Stein to differentiate between > Numa-awared compute host: > - Hosts meant for pinned instances should be part of the aggregate > with metadata “pinned=True” > - Hosts meant for non-pinned instances should be part of the > aggregate with metadata “pinned=False” > 5. In Train release, old flavor can be used as is in which case > scheduler pre-filter will map it to the next syntax “resources:PCPU” in > case cpu_policy=dedicated. > > +1 all above > > 1. In Train release, new flavor syntax “resources:PCPU=1 will be > accepted in flavor extra specs but in this case we expect operator > will set “aggregate_instance_extra_specs:pinned=True” in flavor extra specs > and the hosts are part of the aggregate which has metadata “pinned=True”. > > If the user finished the upgrade, and switch to dedicated_cpu_set and shared_cpu_set, then he needn't aggregate anymore. For using resources:PCPU directly, I'm not sure. I have talk with Sean few days ago, we both think that we shouldn't allow the user using "resources" extra spec directly. Thinking about the future, we have numa on placement, the resources and traits extra spec can't express all the guest numa info. Like which numa node is the first one. And it is hard to parse the guest numa topo from those extra spec, and it isn't human readable. Also "hw:" provides some abstraction than resources/traits extra spec, allow us to do some upgrade without asking user update their flavor. But this isn't critical problem for now, until we have numa on placement. > Regards, > > Bhagyashri Shewale > > ------------------------------ > *From:* Stephen Finucane > *Sent:* Tuesday, June 18, 2019 6:51:15 PM > *To:* Shewale, Bhagyashri; openstack-discuss at lists.openstack.org > *Subject:* Re: [nova] Spec: Standardize CPU resource tracking > > On Tue, 2019-06-18 at 06:41 +0000, Shewale, Bhagyashri wrote: > > > As above, ignore 'cpu_shared_set' but issue a warning. Use the value of > > > ‘vcpu_pin_set' to report both VCPU and PCPU inventory. Note that > > > ‘vcpu_pin_set' is already used to calculate VCPU inventory. > > > > As mentioned in the spec, If operator sets the ``vcpu_pin_set`` in > > the Stein and upgrade to Train then both VCPU and PCPU inventory > > should be reported in placement. > > > > As on current master (Stein) if operator sets ``vpcu_pin_set=0-3`` on > > Compute node A and adds that node A into the host aggregate say > > “agg1” having metadata ``pinned=true``, then it allows to create > > both pinned and non-pinned instances which is known big issue. > > Create instance A having flavor extra specs > > ("aggregate_instance_extra_specs:pinned": "true") then instance A > > will float on cpus 0-3 > > Create the instance B having flavor extra specs > > ("aggregate_instance_extra_specs:pinned": "true", "hw:cpu_policy": > > "dedicated") then instance B will be pinned to one of the cpu say 0. > > Now, operator will do the upgrade (Stein to Train), nova compute will > > report both VCPU and PCPU inventory. In this case if > > cpu_allocation_ratio is 1, then total PCPU available will be 4 > > (vpcu_pin_set=0-3) and VCPU will also be 4. And this will allow user > > to create maximum of 4 instances with flavor extra specs > > ``resources:PCPU=1`` and 4 instances with flavor extra specs > > ``resources:VCPU=1``. > > If the cpu_allocation_ratio is 1.0 then yes, this is correct. 
However, > if it's any greater (and remember, the default is 16.0) then the gap is > much smaller, though still broken. > > > With current master code, it’s possible to create only 4 instances > > where now, by reporting both VCPU and PCPU, it will allow user to > > create total of 8 instances which is adding another level of problem > > along with the existing known issue. Is this acceptable? because > > this is decorating the problems. > > I think is acceptable, yes. As we've said, this is broken behavior and > things are just slightly more broken here, though not horribly so. As > it stands, if you don't isolate pinned instances from non-pinned > instances, you don't get any of the guarantees pinning is supposed to > provide. Using the above example, if you booted two pinned and two > unpinned instances on the same host, the unpinned instances would float > over the pinned instances' cores [*] and impact their performance. If > performance is an issue, host aggregrates will have been used. > > [*] They'll actually float over the entire range of host cores since > instnace without a NUMA topology don't respect the 'vcpu_pin_set' > value. > > > If not acceptable, then we can report only PCPU in this case which > > will solve two problems:- > > The existing known issue on current master (allowing both pinned and > > non-pinned instances) on the compute host meant for pinning. > > Above issue of allowing 8 instances to be created on the host. > > But there is one problem in taking this decision, if no instances are > > running on the compute node in case only ``vcpu_pinned_set`` is set, > > how do you find out this compute node is configured to create pinned > > or non-pinned instances? If instances are running, based on the Host > > numa_topology.pinned_cpus, it’s possible to detect that. > > As noted previously, this is too complex and too error prone. Let's > just suffer the potential additional impact on performance for those > who haven't correctly configured their deployment, knowing that as soon > as they get to U, where we can require the 'cpu_dedicated_set' and > 'cpu_shared_set' options if you want to use pinned instances, things > will be fixed. > > Stephen > > Disclaimer: This email and any attachments are sent in strictest > confidence for the sole use of the addressee and may contain legally > privileged, confidential, and proprietary data. If you are not the intended > recipient, please advise the sender by replying promptly to this email and > then delete and destroy this email and any attachments without any further > use, copying or forwarding. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Wed Jun 19 04:25:01 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Wed, 19 Jun 2019 12:25:01 +0800 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: On Tue, Jun 18, 2019 at 11:52 PM Mohammed Naser wrote: > > On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes wrote: > > I would suggest we move it ot the following week, unless a lot of > > the US based folks think they can make it? > > While I think that we shouldn't necessarily just build our community > to accommodate a specific region, I think in this case, the majority > of the members involved in this meeting are in that region. 
> > I guess it would be nice if TC members can chime in and mention if it > is easier to meet the week after which would be on the 11th of July. > Both dates work for me -- May The Force of OpenStack Be With You, Rico Lin irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Wed Jun 19 05:20:41 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Wed, 19 Jun 2019 14:20:41 +0900 Subject: [telemetry] Cancel team meeting tomorrow Message-ID: Hi team, Unfortunately, I am on a business trip tomorrow so we cannot hold the team meeting. Please continue any conversation on the IRC channel if you want. For anything else, please let me know. Thank you, -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Jun 19 08:07:12 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 19 Jun 2019 10:07:12 +0200 Subject: [tc] July meeting host In-Reply-To: <16b6d3f4a41.12164a6a0120328.8841081913769246802@ghanshyammann.com> References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> <0aa1b541-a6ff-bb03-20db-d55ab72274a3@redhat.com> <16b6d3f4a41.12164a6a0120328.8841081913769246802@ghanshyammann.com> Message-ID: <2edafd0b-f3e3-53f6-31fb-74568ec1409b@openstack.org> Ghanshyam Mann wrote: > ---- On Wed, 19 Jun 2019 06:49:59 +0900 Doug Hellmann wrote ---- > > Zane Bitter writes: > > > > > On 18/06/19 11:52 AM, Jim Rollenhagen wrote: > > >> On Tue, Jun 18, 2019 at 11:47 AM Mohammed Naser > >> > wrote: > > >> > > >> On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes > >> > wrote: > > >> > I would suggest we move it ot the following week, unless a lot of > > >> > the US based folks think they can make it? > > >> > > >> While I think that we shouldn't necessarily just build our community > > >> to accommodate a specific region, I think in this case, the majority > > >> of the members involved in this meeting are in that region. > > >> > > >> I guess it would be nice if TC members can chime in and mention if it > > >> is easier to meet the week after which would be on the 11th of July. > > >> > > >> > > >> I won't be working on the 4th, so yes for me. > > > > > > +1 > > > > > > > I will also be offline on the 4th. > > Both dates are ok for me. Both OK for me -- Thierry Carrez (ttx) From stig.openstack at telfer.org Wed Jun 19 10:09:30 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 19 Jun 2019 11:09:30 +0100 Subject: [scientific-sig] IRC meeting today - ISC roundup, HPC containers, Shanghai CFP Message-ID: <08D87F3F-D690-4D4A-AFE6-3D0B428D5E48@telfer.org> Hi all - We have a Scientific SIG IRC meeting at 1100 UTC (just under an hour’s time). Everyone is welcome. Today’s agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_19th_2019 Today we’ll do a roundup of the HPC/cloud activity from the International Supercomputer Conference currently underway in Frankfurt. We’ll also have trip reports from the HPC containers workshop (also in Frankfurt) and the Sanger OpenStack day. Finally, we’d like to kick-off the planning for coordinated activities at the Shanghai Open Infrastructure Summit. Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From a.settle at outlook.com Wed Jun 19 10:58:51 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Wed, 19 Jun 2019 10:58:51 +0000 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: Thanks all. Top posting on purpose as I still need a volunteer for the July 11 meeting? :) Cheers, Alex On 19/06/2019 05:25, Rico Lin wrote: On Tue, Jun 18, 2019 at 11:52 PM Mohammed Naser > wrote: > > On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes > wrote: > > I would suggest we move it ot the following week, unless a lot of > > the US based folks think they can make it? > > While I think that we shouldn't necessarily just build our community > to accommodate a specific region, I think in this case, the majority > of the members involved in this meeting are in that region. > > I guess it would be nice if TC members can chime in and mention if it > is easier to meet the week after which would be on the 11th of July. > Both dates work for me -- May The Force of OpenStack Be With You, Rico Lin irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Jun 19 11:29:47 2019 From: smooney at redhat.com (Sean Mooney) Date: Wed, 19 Jun 2019 12:29:47 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> <394fd3b198d72ccc533b504b20cc86d2da1a5210.camel@redhat.com> <9e2d014ec8b2c3e749d0362b8d814128ee3321e4.camel@redhat.com> Message-ID: <30c123a8834a49e112a5832ec197d4ec220561d1.camel@redhat.com> On Wed, 2019-06-19 at 11:59 +0800, Alex Xu wrote: > Shewale, Bhagyashri 于2019年6月19日周三 > 上午11:01写道: > > > Hi All, > > > > > > After all discussions on mailing thread, I would like to summarize > > concluded points as under:- > > > > 1. If operator sets ``vcpu_pin_set`` in Stein and upgrade it to Train > > or ``vcpu_pin_set`` is set on a new compute node, then both VCPU and PCPU > > inventory should be reported to placement. > > 2. User can’t request both ``resources:PCPU`` and ``resources:VCPU`` > > in a single request for Train release. And in future ‘U’ release, > > user can request both ``resources:PCPU`` and ``resources:VCPU`` in a > > single request. > > 3. In “U” release, “vcpu_pin_set” config option will be removed. In > > this case, operator will either need to set “cpu_shared_set” or > > “cpu_dedicated_set” accordingly on old compute nodes and on new compute > > nodes, operator can set both the config option “cpu_shared_set” and “cpu_dedicated_set” > > if required. > > 4. In Train release, operator will also need to continue retaining the > > same behavior of host aggregates as that of Stein to differentiate between > > Numa-awared compute host: > > - Hosts meant for pinned instances should be part of the aggregate > > with metadata “pinned=True” > > - Hosts meant for non-pinned instances should be part of the > > aggregate with metadata “pinned=False” > > 5. In Train release, old flavor can be used as is in which case > > scheduler pre-filter will map it to the next syntax “resources:PCPU” in > > case cpu_policy=dedicated. > > > > +1 all above > > > > 1. In Train release, new flavor syntax “resources:PCPU=1 will be > > accepted in flavor extra specs but in this case we expect operator > > will set “aggregate_instance_extra_specs:pinned=True” in flavor extra specs > > and the hosts are part of the aggregate which has metadata “pinned=True”. 
> > > > If the user finished the upgrade, and switch to dedicated_cpu_set and > > shared_cpu_set, then he needn't aggregate anymore. yes once they remove vcpu_pin_set and define share_cpu_set or dedicated_cpu_set they no longer need agggates. the aggates are just needed to cover the time after they have upgraded but before the reconfigure as that shoudl be done as two discreet actions and can be seperated by a long time period. > > For using resources:PCPU directly, I'm not sure. I have talk with Sean few > days ago, we both think that we shouldn't allow the user using "resources" > extra spec directly. Thinking about the future, we have numa on placement, > the resources and traits extra spec can't express all the guest numa info. > Like which numa node is the first one. And it is hard to parse the guest > numa topo from those extra spec, and it isn't human readable. Also "hw:" > provides some abstraction than resources/traits extra spec, allow us to do > some upgrade without asking user update their flavor. But this isn't > critical problem for now, until we have numa on placement. yep i agree with ^ using "resouces:" directly is something i would discourage as it a leaky abstration that is directly mapped to the placmeent api and as a result if you use it it means you will need to update your flavors as new features are added. so for cpu pinning i would personally recommend only using hw:cpu_policy > > > > Regards, > > > > Bhagyashri Shewale > > > > ------------------------------ > > *From:* Stephen Finucane > > *Sent:* Tuesday, June 18, 2019 6:51:15 PM > > *To:* Shewale, Bhagyashri; openstack-discuss at lists.openstack.org > > *Subject:* Re: [nova] Spec: Standardize CPU resource tracking > > > > On Tue, 2019-06-18 at 06:41 +0000, Shewale, Bhagyashri wrote: > > > > As above, ignore 'cpu_shared_set' but issue a warning. Use the value of > > > > ‘vcpu_pin_set' to report both VCPU and PCPU inventory. Note that > > > > ‘vcpu_pin_set' is already used to calculate VCPU inventory. > > > > > > As mentioned in the spec, If operator sets the ``vcpu_pin_set`` in > > > the Stein and upgrade to Train then both VCPU and PCPU inventory > > > should be reported in placement. > > > > > > As on current master (Stein) if operator sets ``vpcu_pin_set=0-3`` on > > > Compute node A and adds that node A into the host aggregate say > > > “agg1” having metadata ``pinned=true``, then it allows to create > > > both pinned and non-pinned instances which is known big issue. > > > Create instance A having flavor extra specs > > > ("aggregate_instance_extra_specs:pinned": "true") then instance A > > > will float on cpus 0-3 > > > Create the instance B having flavor extra specs > > > ("aggregate_instance_extra_specs:pinned": "true", "hw:cpu_policy": > > > "dedicated") then instance B will be pinned to one of the cpu say 0. > > > Now, operator will do the upgrade (Stein to Train), nova compute will > > > report both VCPU and PCPU inventory. In this case if > > > cpu_allocation_ratio is 1, then total PCPU available will be 4 > > > (vpcu_pin_set=0-3) and VCPU will also be 4. And this will allow user > > > to create maximum of 4 instances with flavor extra specs > > > ``resources:PCPU=1`` and 4 instances with flavor extra specs > > > ``resources:VCPU=1``. > > > > If the cpu_allocation_ratio is 1.0 then yes, this is correct. However, > > if it's any greater (and remember, the default is 16.0) then the gap is > > much smaller, though still broken. 
> > > > > With current master code, it’s possible to create only 4 instances > > > where now, by reporting both VCPU and PCPU, it will allow user to > > > create total of 8 instances which is adding another level of problem > > > along with the existing known issue. Is this acceptable? because > > > this is decorating the problems. > > > > I think is acceptable, yes. As we've said, this is broken behavior and > > things are just slightly more broken here, though not horribly so. As > > it stands, if you don't isolate pinned instances from non-pinned > > instances, you don't get any of the guarantees pinning is supposed to > > provide. Using the above example, if you booted two pinned and two > > unpinned instances on the same host, the unpinned instances would float > > over the pinned instances' cores [*] and impact their performance. If > > performance is an issue, host aggregrates will have been used. > > > > [*] They'll actually float over the entire range of host cores since > > instnace without a NUMA topology don't respect the 'vcpu_pin_set' > > value. > > > > > If not acceptable, then we can report only PCPU in this case which > > > will solve two problems:- > > > The existing known issue on current master (allowing both pinned and > > > non-pinned instances) on the compute host meant for pinning. > > > Above issue of allowing 8 instances to be created on the host. > > > But there is one problem in taking this decision, if no instances are > > > running on the compute node in case only ``vcpu_pinned_set`` is set, > > > how do you find out this compute node is configured to create pinned > > > or non-pinned instances? If instances are running, based on the Host > > > numa_topology.pinned_cpus, it’s possible to detect that. > > > > As noted previously, this is too complex and too error prone. Let's > > just suffer the potential additional impact on performance for those > > who haven't correctly configured their deployment, knowing that as soon > > as they get to U, where we can require the 'cpu_dedicated_set' and > > 'cpu_shared_set' options if you want to use pinned instances, things > > will be fixed. > > > > Stephen > > > > Disclaimer: This email and any attachments are sent in strictest > > confidence for the sole use of the addressee and may contain legally > > privileged, confidential, and proprietary data. If you are not the intended > > recipient, please advise the sender by replying promptly to this email and > > then delete and destroy this email and any attachments without any further > > use, copying or forwarding. > > From cdent+os at anticdent.org Wed Jun 19 12:39:59 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Wed, 19 Jun 2019 13:39:59 +0100 (BST) Subject: [placement] db query analysis Message-ID: One of the queries that has come up recently with placement performance is whether there may be opportunities to gain some improvement by making fewer queries to the db. That is: * are there redundant queries * are there places where data is gathered in multiple queries that could be in one (or at least fewer) * are there queries that are not doing what we think to that end I've done some analysis of logs produced when [placement_database]/connection_debug is set to 50 (which dumps SQL queries to the INFO log). 
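For anyone who wants to reproduce the capture, that is a single option in placement.conf, along these lines (the section and option names are the ones mentioned above; the verbosity comment follows the usual oslo.db semantics):

    [placement_database]
    # 0 disables SQL logging, 50 logs statements at INFO, 100 is maximum verbosity
    connection_debug = 50
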
The collection of queries made during a single request to GET /allocation_candidates is at http://paste.openstack.org/show/753183/ The data set is a single resource resource provider of the same form as used in the placement-perfload job (where 1000 providers are used). Only 1 is used in this case as several queries use 'IN' statements that list all the resource provider ids currently in play, and that gets dumped to the log making it inscrutable. I've noted in the paste where this happens. Each block of SQL is associated with the method that calls it. The queries are in the order they happen. One query that happens three times (once for each resource class requested) is listed once. Observations: * The way we use IN could be improved using a bindparam: https://docs.sqlalchemy.org/en/13/core/sqlelement.html?highlight=expanding%20bindparam#sqlalchemy.sql.operators.ColumnOperators.in_ * That we use IN in that fashion at all, where we are carrying lists of rp ids around and making multiple queries, instead of one giant one, might be an area worth exploring. * There are a couple of places where get get a trait id (via name) in a separate query from using the trait id. * What can you see? Please have a look to see if anything looks odd, wrong, etc. Basically what we're after is trying to find things that violate our expectations. Note that this is just one of several paths through the database. When there are sharing or nested providers things change. I didn't bother to do a more complex set of queries at this time as it seemed starting simple would help us tease out how best to communicate these sorts of things. Related to that, I've started working on a nested-perfload at https://review.opendev.org/665695 -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From thierry at openstack.org Wed Jun 19 12:43:55 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 19 Jun 2019 14:43:55 +0200 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> Message-ID: <3eb837f5-5866-f7d6-f7de-1ed2cfc4ef86@openstack.org> Alexandra Settle wrote: > Thanks all. Top posting on purpose as I still need a volunteer for the > July 11 meeting? :) I can run it if nobody else wants to take that opportunity. My gif game is pretty weak though. -- Thierry From thierry at openstack.org Wed Jun 19 13:05:23 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 19 Jun 2019 15:05:23 +0200 Subject: [all] "Popup teams" proposal about to merge Message-ID: Hi everyone, Following the discussion at [1], a governance change formalizing "popup teams" (short-lived, objective-driven cross-project groups) is about to be approved by the TC at: https://review.opendev.org/#/c/661356/ Please have a look and comment if you have strong feelings about that. [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006225.html -- Thierry Carrez (ttx) From ekultails at gmail.com Wed Jun 19 13:35:22 2019 From: ekultails at gmail.com (Luke Short) Date: Wed, 19 Jun 2019 09:35:22 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: Hey folks, The meeting time results are in! For transparency, these are the full results: - 6 anonymous responses. Each person could vote for multiple times that best fit their schedule. - 13:00 UTC = 3 - 14:00 UTC = 3 - 15:00 UTC = 2 - 16:00 UTC = 1 Since we had a tie for UTC 13 and 14, I am referring to our Etherpad for the tie breaker. 
The most votes were for 14:00 UTC. I look forward to meeting with everyone during that time on IRC via the #tripleo channel tomorrow! Sincerely, Luke Short On Fri, Jun 14, 2019 at 12:14 PM Luke Short wrote: > Hey folks, > > Since we have not gotten a lot of feedback on the times to meet-up, I have > created a Google Forms survey to help us figure it out. When you get a > chance, please answer this simple 1-question survey. > > > https://docs.google.com/forms/d/e/1FAIpQLSfHkN_T7T-W4Dhc17Pf6VHm1oUzKLJYz0u9ORAJYafrIGooZQ/viewform?usp=sf_link > > Try to complete this by end-of-day Tuesday so we have an idea of when we > should meet on this upcoming Thursday. Thanks for your help! > > Sincerely, > Luke Short > > On Wed, Jun 12, 2019 at 9:43 AM Kevin Carter wrote: > >> I've submitted reviews under the topic "retire-role" to truncate all of >> the ansible-role-tripleo-* repos, that set can be seen here [0]. When folks >> get a chance, I'd greatly appreciate folks have a look at these reviews. >> >> [0] - https://review.opendev.org/#/q/topic:retire-role+status:open >> >> -- >> >> Kevin Carter >> IRC: kecarter >> >> >> On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: >> >>> Hey everyone, >>> >>> For the upcoming work on focusing on more Ansible automation and >>> testing, I have created a dedicated #tripleo-transformation channel for our >>> new squad. Feel free to join if you are interested in joining and helping >>> out! >>> >>> +1 to removing repositories we don't use, especially if they have no >>> working code. I'd like to see the consolidation of TripleO specific things >>> into the tripleo-ansible repository and then using upstream Ansible roles >>> for all of the different services (nova, glance, cinder, etc.). >>> >>> Sincerely, >>> >>> Luke Short, RHCE >>> Software Engineer, OpenStack Deployment Framework >>> Red Hat, Inc. >>> >>> >>> On Wed, Jun 5, 2019 at 8:53 AM David Peacock >>> wrote: >>> >>>> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter >>>> wrote: >>>> >>>>> So the questions at hand are: what, if anything, should we do with >>>>> these repositories? Should we retire them or just ignore them? Is there >>>>> anyone using any of the roles? >>>>> >>>> >>>> My initial reaction was to suggest we just ignore them, but on second >>>> thought I'm wondering if there is anything negative if we leave them lying >>>> around. Unless we're going to benefit from them in the future if we start >>>> actively working in these repos, they represent obfuscation and debt, so it >>>> might be best to retire / dispose of them. >>>> >>>> David >>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 19 13:42:43 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jun 2019 14:42:43 +0100 Subject: [cinder] nfs in kolla In-Reply-To: References: Message-ID: On Thu, 13 Jun 2019 at 13:24, Ignazio Cassano wrote: > > Hello, I' just deployed ocata with kolla and my cinder backend is nfs. > Volumes are created successfully but live migration does not work. > While cinder_volume container mounts the cinder nfs backend, the cinder api not and during live migration the cinder api logs reports errors accessing volumes : > > Stderr: u"qemu-img: Could not open '/var/lib/cinder/mnt/451bacc11bd88b51ce7bdf31aa97cf39/volume-4889a547-0a0d-440e-8b50-413285b5979c' Hi Ignazio, Could you try adding 'cinder:/var/lib/cinder' to the list of Docker volumes for the cinder-api container in kolla-ansible? 
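Roughly speaking that means appending the named volume to the cinder-api entry in ansible/roles/cinder/defaults/main.yml, something like the following (a sketch of the ocata-era layout, trimmed to the relevant keys):

    cinder_services:
      cinder-api:
        container_name: cinder_api
        volumes:
          - "{{ node_config_directory }}/cinder-api/:{{ container_config_directory }}/:ro"
          - "/etc/localtime:/etc/localtime:ro"
          - "kolla_logs:/var/log/kolla/"
          - "cinder:/var/lib/cinder"
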
https://opendev.org/openstack/kolla-ansible/src/branch/stable/ocata/ansible/roles/cinder/defaults/main.yml#L10 > > Any help, please ? > Regards > Ignazio From mark at stackhpc.com Wed Jun 19 13:51:17 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jun 2019 14:51:17 +0100 Subject: [Kolla] ansible compatibility In-Reply-To: References: Message-ID: On Wed, 12 Jun 2019 at 19:50, Ignazio Cassano wrote: > > Hello All, > I'd like to know if there is a Matrix for kolla-ansible and ansible version ...in other words, which ansible version must be used for a kolla-ansible version. > For example ocata used kolla-ansible 4.0.5 but I do not know which version of ansible must be used. > Installing kolla-ansible with pip it does not install ansible. Hi, For ocata the minimum version of Ansible is 2.0.0. In Pike it bumped to 2.2, and in Rocky it bumped to 2.4. See the ansible_version_min variable in ansible/roles/prechecks/vars/main.yml. It would be good if we documented this. Mark > Reverse > Ignazio From mark at stackhpc.com Wed Jun 19 13:53:13 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jun 2019 14:53:13 +0100 Subject: [nova][kolla][openstack-ansible][tripleo] Cells v2 upgrades In-Reply-To: References: Message-ID: On Fri, 7 Jun 2019 at 18:52, Mohammed Naser wrote: > > On Wed, Jun 5, 2019 at 5:16 AM Mark Goddard wrote: > > > > Hi, > > > > At the recent kolla virtual PTG [1] we had a good discussion about > > adding support for multiple nova cells in kolla-ansible. We agreed a > > key requirement is to be able to perform operations on one or more > > cells without affecting the rest for damage limitation. This also > > seems like it would apply to upgrades. We're seeking input on > > ordering. Looking at the nova upgrade guide [2] I might propose > > something like this: > > > > 1. DB syncs > > 2. Upgrade API, super conductor > > > > For each cell: > > 3a. Upgrade cell conductor > > 3b. Upgrade cell computes > > > > 4. SIGHUP all services > > Unfortunately, this is a problem right now: > > https://review.opendev.org/#/c/641907/ > > I sat down at the PTG to settle this down, I was going to finish this > patch up but I didn't get around to it. That might be an action item > to be able to do this successfully. > Ah, that's true. Thanks for reminding me. s/SIGHUP/restart/, > > 5. Run online migrations > > > > At some point in here we also need to run the upgrade check. > > Presumably between steps 1 and 2? > > > > It would be great to get feedback both from the nova team and anyone > > running cells > > Thanks, > > Mark > > > > [1] https://etherpad.openstack.org/p/kolla-train-ptg > > [2] https://docs.openstack.org/nova/latest/user/upgrade.html > > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com From a.settle at outlook.com Wed Jun 19 13:56:30 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Wed, 19 Jun 2019 13:56:30 +0000 Subject: [tc] July meeting host In-Reply-To: <3eb837f5-5866-f7d6-f7de-1ed2cfc4ef86@openstack.org> References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> <3eb837f5-5866-f7d6-f7de-1ed2cfc4ef86@openstack.org> Message-ID: On 19/06/2019 13:43, Thierry Carrez wrote: > Alexandra Settle wrote: >> Thanks all. Top posting on purpose as I still need a volunteer for >> the July 11 meeting? :) > > I can run it if nobody else wants to take that opportunity. 
> My gif game is pretty weak though. Much appreciated :D Mugs has already offered to help the gif game ;) The agenda is already up-to-date as of *now* [1]. If there's anything you need to change. Please do! Thank you! Alex [1] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee From emccormick at cirrusseven.com Wed Jun 19 14:01:24 2019 From: emccormick at cirrusseven.com (Erik McCormick) Date: Wed, 19 Jun 2019 10:01:24 -0400 Subject: [Kolla] ansible compatibility In-Reply-To: References: Message-ID: On Wed, Jun 19, 2019 at 9:53 AM Mark Goddard wrote: > > On Wed, 12 Jun 2019 at 19:50, Ignazio Cassano wrote: > > > > Hello All, > > I'd like to know if there is a Matrix for kolla-ansible and ansible version ...in other words, which ansible version must be used for a kolla-ansible version. > > For example ocata used kolla-ansible 4.0.5 but I do not know which version of ansible must be used. > > Installing kolla-ansible with pip it does not install ansible. > Hi, > For ocata the minimum version of Ansible is 2.0.0. In Pike it bumped > to 2.2, and in Rocky it bumped to 2.4. > See the ansible_version_min variable in ansible/roles/prechecks/vars/main.yml. > It would be good if we documented this. > Mark It would be super if we could also have a max version. I've had to downgrade from latest almost every time I've had to do a fresh install. > > Reverse > > Ignazio > From mark at stackhpc.com Wed Jun 19 14:04:38 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 19 Jun 2019 15:04:38 +0100 Subject: [Kolla] ansible compatibility In-Reply-To: References: Message-ID: On Wed, 19 Jun 2019 at 15:01, Erik McCormick wrote: > > On Wed, Jun 19, 2019 at 9:53 AM Mark Goddard wrote: > > > > On Wed, 12 Jun 2019 at 19:50, Ignazio Cassano wrote: > > > > > > Hello All, > > > I'd like to know if there is a Matrix for kolla-ansible and ansible version ...in other words, which ansible version must be used for a kolla-ansible version. > > > For example ocata used kolla-ansible 4.0.5 but I do not know which version of ansible must be used. > > > Installing kolla-ansible with pip it does not install ansible. > > Hi, > > For ocata the minimum version of Ansible is 2.0.0. In Pike it bumped > > to 2.2, and in Rocky it bumped to 2.4. > > See the ansible_version_min variable in ansible/roles/prechecks/vars/main.yml. > > It would be good if we documented this. > > Mark > > It would be super if we could also have a max version. I've had to > downgrade from latest almost every time I've had to do a fresh > install. That's a fair point. With kayobe we just include a cap in requirements.txt so as not to worry about it. > > > > Reverse > > > Ignazio > > From thierry at openstack.org Wed Jun 19 14:27:46 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 19 Jun 2019 16:27:46 +0200 Subject: [tc][tripleo][charms][helm][kolla][ansible][puppet][chef] Deployment tools capabilities In-Reply-To: <569964f6-ead3-a983-76d2-ffa59c753dbe@openstack.org> References: <569964f6-ead3-a983-76d2-ffa59c753dbe@openstack.org> Message-ID: <8a4b672a-1398-6aba-9a38-7e868cbdeae6@openstack.org> Thierry Carrez wrote: > [...] > The next steps are: > > - commit the detailed list of tags (action:ttx) Now posted for your reviewing pleasure: https://review.opendev.org/666303 In this first version, I took most of the non-controversial and simple tags from https://etherpad.openstack.org/p/deployment-tools-tags ... The idea is to quickly merge the first version and then iterate on it. 
-- Thierry Carrez (ttx) From kecarter at redhat.com Wed Jun 19 14:33:32 2019 From: kecarter at redhat.com (Kevin Carter) Date: Wed, 19 Jun 2019 09:33:32 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: As it turns out the edge squad has a regularly scheduled meeting on Thursday's at 14:00 UTC. So to ensure we're not creating conflict we'll select the runner up time slot and shoot to have our meetings on Thursday's at 13:00 UTC. I've updated the wiki with our new squad [0] and linked to our running etherpad [1]. I hope this time slot works for most everyone and I look forward to meeting with you all soon. [0] https://wiki.openstack.org/wiki/Meetings/TripleO#Squads [1] https://etherpad.openstack.org/p/tripleo-ansible-agenda -- Kevin Carter IRC: cloudnull On Wed, Jun 19, 2019 at 8:35 AM Luke Short wrote: > Hey folks, > > The meeting time results are in! For transparency, these are the full > results: > > > - 6 anonymous responses. Each person could vote for multiple times > that best fit their schedule. > - 13:00 UTC = 3 > - 14:00 UTC = 3 > - 15:00 UTC = 2 > - 16:00 UTC = 1 > > > Since we had a tie for UTC 13 and 14, I am referring to our Etherpad > for the tie > breaker. The most votes were for 14:00 UTC. I look forward to meeting with > everyone during that time on IRC via the #tripleo channel tomorrow! > > Sincerely, > Luke Short > > On Fri, Jun 14, 2019 at 12:14 PM Luke Short wrote: > >> Hey folks, >> >> Since we have not gotten a lot of feedback on the times to meet-up, I >> have created a Google Forms survey to help us figure it out. When you get a >> chance, please answer this simple 1-question survey. >> >> >> https://docs.google.com/forms/d/e/1FAIpQLSfHkN_T7T-W4Dhc17Pf6VHm1oUzKLJYz0u9ORAJYafrIGooZQ/viewform?usp=sf_link >> >> Try to complete this by end-of-day Tuesday so we have an idea of when we >> should meet on this upcoming Thursday. Thanks for your help! >> >> Sincerely, >> Luke Short >> >> On Wed, Jun 12, 2019 at 9:43 AM Kevin Carter wrote: >> >>> I've submitted reviews under the topic "retire-role" to truncate all of >>> the ansible-role-tripleo-* repos, that set can be seen here [0]. When folks >>> get a chance, I'd greatly appreciate folks have a look at these reviews. >>> >>> [0] - https://review.opendev.org/#/q/topic:retire-role+status:open >>> >>> -- >>> >>> Kevin Carter >>> IRC: kecarter >>> >>> >>> On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: >>> >>>> Hey everyone, >>>> >>>> For the upcoming work on focusing on more Ansible automation and >>>> testing, I have created a dedicated #tripleo-transformation channel for our >>>> new squad. Feel free to join if you are interested in joining and helping >>>> out! >>>> >>>> +1 to removing repositories we don't use, especially if they have no >>>> working code. I'd like to see the consolidation of TripleO specific things >>>> into the tripleo-ansible repository and then using upstream Ansible roles >>>> for all of the different services (nova, glance, cinder, etc.). >>>> >>>> Sincerely, >>>> >>>> Luke Short, RHCE >>>> Software Engineer, OpenStack Deployment Framework >>>> Red Hat, Inc. >>>> >>>> >>>> On Wed, Jun 5, 2019 at 8:53 AM David Peacock >>>> wrote: >>>> >>>>> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter >>>>> wrote: >>>>> >>>>>> So the questions at hand are: what, if anything, should we do with >>>>>> these repositories? Should we retire them or just ignore them? Is there >>>>>> anyone using any of the roles? 
>>>>>> >>>>> >>>>> My initial reaction was to suggest we just ignore them, but on second >>>>> thought I'm wondering if there is anything negative if we leave them lying >>>>> around. Unless we're going to benefit from them in the future if we start >>>>> actively working in these repos, they represent obfuscation and debt, so it >>>>> might be best to retire / dispose of them. >>>>> >>>>> David >>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 19 15:18:37 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 19 Jun 2019 17:18:37 +0200 Subject: [Kolla] ansible compatibility In-Reply-To: References: Message-ID: Many thanks, I found the minversion in files. Ignazio Il giorno mer 19 giu 2019 alle ore 15:51 Mark Goddard ha scritto: > On Wed, 12 Jun 2019 at 19:50, Ignazio Cassano > wrote: > > > > Hello All, > > I'd like to know if there is a Matrix for kolla-ansible and ansible > version ...in other words, which ansible version must be used for a > kolla-ansible version. > > For example ocata used kolla-ansible 4.0.5 but I do not know which > version of ansible must be used. > > Installing kolla-ansible with pip it does not install ansible. > Hi, > For ocata the minimum version of Ansible is 2.0.0. In Pike it bumped > to 2.2, and in Rocky it bumped to 2.4. > See the ansible_version_min variable in > ansible/roles/prechecks/vars/main.yml. > It would be good if we documented this. > Mark > > Reverse > > Ignazio > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 19 15:22:47 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 19 Jun 2019 17:22:47 +0200 Subject: [cinder] nfs in kolla In-Reply-To: References: Message-ID: I will try it. Thanks Ignazio Il giorno mer 19 giu 2019 alle ore 15:42 Mark Goddard ha scritto: > On Thu, 13 Jun 2019 at 13:24, Ignazio Cassano > wrote: > > > > Hello, I' just deployed ocata with kolla and my cinder backend is nfs. > > Volumes are created successfully but live migration does not work. > > While cinder_volume container mounts the cinder nfs backend, the cinder > api not and during live migration the cinder api logs reports errors > accessing volumes : > > > > Stderr: u"qemu-img: Could not open > '/var/lib/cinder/mnt/451bacc11bd88b51ce7bdf31aa97cf39/volume-4889a547-0a0d-440e-8b50-413285b5979c' > > Hi Ignazio, > > Could you try adding 'cinder:/var/lib/cinder' to the list of Docker > volumes for the cinder-api container in kolla-ansible? > > > https://opendev.org/openstack/kolla-ansible/src/branch/stable/ocata/ansible/roles/cinder/defaults/main.yml#L10 > > > > > Any help, please ? > > Regards > > Ignazio > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lbragstad at gmail.com Wed Jun 19 15:37:56 2019 From: lbragstad at gmail.com (Lance Bragstad) Date: Wed, 19 Jun 2019 10:37:56 -0500 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> <0aa1b541-a6ff-bb03-20db-d55ab72274a3@redhat.com> Message-ID: On 6/18/19 4:49 PM, Doug Hellmann wrote: > Zane Bitter writes: > >> On 18/06/19 11:52 AM, Jim Rollenhagen wrote: >>> On Tue, Jun 18, 2019 at 11:47 AM Mohammed Naser >> > wrote: >>> >>> On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes >> > wrote: >>> > I would suggest we move it ot the following week, unless a lot of >>> > the US based folks think they can make it? 
>>> >>> While I think that we shouldn't necessarily just build our community >>> to accommodate a specific region, I think in this case, the majority >>> of the members involved in this meeting are in that region. >>> >>> I guess it would be nice if TC members can chime in and mention if it >>> is easier to meet the week after which would be on the 11th of July. >>> >>> >>> I won't be working on the 4th, so yes for me. >> +1 >> > I will also be offline on the 4th. >  +1 - I won't be working on the 4th. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From Rajini.Karthik at Dell.com Wed Jun 19 15:39:08 2019 From: Rajini.Karthik at Dell.com (Rajini.Karthik at Dell.com) Date: Wed, 19 Jun 2019 15:39:08 +0000 Subject: [Thirdparty CI] CI Watch is stuck Message-ID: Hi all, Is there any issue with CI-WATCH, it seems to be stuck from June 14th http://ciwatch.mmedvede.net/project?project=cinder&time=24+hours Thanks Rajini -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Wed Jun 19 15:57:11 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Wed, 19 Jun 2019 16:57:11 +0100 (BST) Subject: [placement][docs][packaging] rpm- and deb- related placement install docs Message-ID: The placement install docs [1] have the following warning plastered on top: These installation documents are a work in progress. Some of the distribution packages mentioned are not yet available so the instructions will not work. It is likely that packages are now available, but the docs have not been updated to reflect that, nor have the docs been verified against those packages. None of the regular and active contributors to placement are involved with distro-related packaging. In addition they are all already fully booked. Are there members of the community who are involved with the distros, and who can update and verify these docs? If not, we will likely need to remove the distro related install documents before the end of the Train release and solely maintain the install-from-pypi docs [2] (which are up to date). This is because the existing docs are now misleading, and thus potentially dangerous and extra-work-inducing, as suggested by a recent bug [3]. If you are interested in helping out, please follow up. If you're unable to help out, but know people involved with packaging who should know about this concern, please let them know. Thanks for your help. [1] https://docs.openstack.org/placement/latest/install/index.htmlk [2] https://docs.openstack.org/placement/latest/install/from-pypi.html [3] https://storyboard.openstack.org/#!/story/2005910 -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From sbauza at redhat.com Wed Jun 19 15:57:41 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 19 Jun 2019 17:57:41 +0200 Subject: [nova][placement][ops] Need for a nova-manage placement audit command In-Reply-To: References: Message-ID: On Mon, Mar 25, 2019 at 2:43 PM Matt Riedemann wrote: > Re-posting this since it came up again in IRC today. > > ... and we discussed yet again today on IRC. Adding ops tho because I'd love their feedback about what they'd really need (or like). On 9/20/2018 6:09 PM, Matt Riedemann wrote: > > mnaser wrote a simple placement audit tool today and dumped his script > > into a bug report [1]. It turns out several operators have already > > written a variant of this same tool. 
The idea is to compare what's in > > placement for allocations and (compute node) resource providers against > > what's in nova for instances and compute nodes, look for orphans in > > placement and then report them, or maybe also clean them up. > > > Yeah, and also see whether we would have allocations against migration records that aren't currenty in progress, as https://review.opendev.org/#/c/661349/ tries the fix the problem. > I think this could go into a "nova-manage placement audit" command and > > should be pretty easy to write for minimum functionality (start with > > just the report). > > > Yup, I'd consider checking Resource Providers that have inventories of resource classes that Nova handles and checks whether they are related to somehow a compute node (hint: this isn't gonna be trivial as it could be nested resource providers - but the root node would be a compute node) I'd also consider allocations (of the same nova-ish resource classes) that aren't against instances or in-progress migrations and bail them out. All of that would be resulting in some sort of textual output that could be parsable if needed. WFY, folks ? > I'm advertising the need here in case someone wants to work on this. I'd > > like to myself, but just can't justify the time right now. > > > *I* can justify my time on it so I'm gladly volunteering on it. Thanks Matt for raising up the case. -Sylvain > [1] https://bugs.launchpad.net/nova/+bug/1793569 > > > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Jun 19 17:14:01 2019 From: smooney at redhat.com (Sean Mooney) Date: Wed, 19 Jun 2019 18:14:01 +0100 Subject: [placement][docs][packaging] rpm- and deb- related placement install docs In-Reply-To: References: Message-ID: On Wed, 2019-06-19 at 16:57 +0100, Chris Dent wrote: > The placement install docs [1] have the following warning plastered > on top: > > These installation documents are a work in progress. Some of the > distribution packages mentioned are not yet available so the > instructions will not work. > > It is likely that packages are now available, but the docs have not > been updated to reflect that, nor have the docs been verified > against those packages. > > None of the regular and active contributors to placement are > involved with distro-related packaging. In addition they are all > already fully booked. Are there members of the community who are > involved with the distros, and who can update and verify these docs? from an RPM perspective the validation should like be done with RDO rpms and centos/fedora. aproching this from a product perspective while redhat OSP will contain RPM for placment OSP from 13+ is only supporot if the installation is done via OSP Director (tripleo) which uses docker contaienr that are prebuilt and hosted on a redhat registry so from a RHEL/OSP pserspectinv direcly inalling placment via the RPM is not a supported way of using of deploying placmenet. that said we will be using the rpm to build the container image so the bits will be identicaly so it should work but for the docs to be useful i think the RPM docs shoudl not refernece rhel or osp and focus just on the cenots/fedora based install and the RDO rpms. assuming we keep the distro docs that is. > > If not, we will likely need to remove the distro related install > documents before the end of the Train release and solely maintain > the install-from-pypi docs [2] (which are up to date). 
This is > because the existing docs are now misleading, and thus potentially > dangerous and extra-work-inducing, as suggested by a recent bug [3]. im not really invovled in any of the packaging but in theory placement would fall under the remit of the Compute DFG which im in so ill bring this up internally and see if we can help with this but i also think its perfectly fair to just document the pip/git installation workflow and leave the distor stuff to distors to document downstream if they supprot that. > > If you are interested in helping out, please follow up. If you're > unable to help out, but know people involved with packaging who > should know about this concern, please let them know. > > Thanks for your help. > > [1] https://docs.openstack.org/placement/latest/install/index.htmlk > [2] https://docs.openstack.org/placement/latest/install/from-pypi.html > [3] https://storyboard.openstack.org/#!/story/2005910 > From colleen at gazlene.net Wed Jun 19 17:31:14 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Wed, 19 Jun 2019 10:31:14 -0700 Subject: =?UTF-8?Q?Re:_[placement][docs][packaging]_rpm-_and_deb-_related_placeme?= =?UTF-8?Q?nt_install_docs?= In-Reply-To: References: Message-ID: <970d9e09-f0d2-4792-b0b3-2c73f30f6d13@www.fastmail.com> Hello, On Wed, Jun 19, 2019, at 08:57, Chris Dent wrote: > > The placement install docs [1] have the following warning plastered > on top: > > These installation documents are a work in progress. Some of the > distribution packages mentioned are not yet available so the > instructions will not work. > > It is likely that packages are now available, but the docs have not > been updated to reflect that, nor have the docs been verified > against those packages. > > None of the regular and active contributors to placement are > involved with distro-related packaging. In addition they are all > already fully booked. Are there members of the community who are > involved with the distros, and who can update and verify these docs? SUSE has packages for Placement now. I've gone ahead and made the adjustments to the docs: https://review.opendev.org/666408 Thanks for pointing it out. Colleen > > If not, we will likely need to remove the distro related install > documents before the end of the Train release and solely maintain > the install-from-pypi docs [2] (which are up to date). This is > because the existing docs are now misleading, and thus potentially > dangerous and extra-work-inducing, as suggested by a recent bug [3]. > > If you are interested in helping out, please follow up. If you're > unable to help out, but know people involved with packaging who > should know about this concern, please let them know. > > Thanks for your help. > > [1] https://docs.openstack.org/placement/latest/install/index.htmlk > [2] https://docs.openstack.org/placement/latest/install/from-pypi.html > [3] https://storyboard.openstack.org/#!/story/2005910 > > -- > Chris Dent ٩◔̯◔۶ https://anticdent.org/ > freenode: cdent From juliaashleykreger at gmail.com Wed Jun 19 20:05:12 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 19 Jun 2019 13:05:12 -0700 Subject: [ironic] cancelling the meeting for June 24th Message-ID: Greetings everyone! Next Monday, June 24th, I will be largely unavailable due to business travel. On top of that, The person who I would normally ask to run the meeting in my absence, will also be traveling that day. With that being said, I think it is best to cancel the meeting. 
Now, if there is someone willing to be the ad-hoc meeting organizer and run the meeting or if everyone were to just treat it like an office hours that would be a good thing. Thanks everyone! -Julia From tobias.urdin at binero.se Thu Jun 20 08:04:05 2019 From: tobias.urdin at binero.se (Tobias Urdin) Date: Thu, 20 Jun 2019 10:04:05 +0200 Subject: [placement][docs][packaging] rpm- and deb- related placement install docs In-Reply-To: References: Message-ID: <9e639aac-3846-06fb-85c4-4ef2fd44cde4@binero.se> Hello, The Ubuntu package placement-api is valid from Train, earlier the package was named nova-placement-api so the documentation should be correct for the Train release. When thinking about it; there might be duplicates right now and maybe the nova-placement-api hasn't been removed for Train yet. Best regards Tobias On 06/19/2019 06:00 PM, Chris Dent wrote: > The placement install docs [1] have the following warning plastered > on top: > > These installation documents are a work in progress. Some of the > distribution packages mentioned are not yet available so the > instructions will not work. > > It is likely that packages are now available, but the docs have not > been updated to reflect that, nor have the docs been verified > against those packages. > > None of the regular and active contributors to placement are > involved with distro-related packaging. In addition they are all > already fully booked. Are there members of the community who are > involved with the distros, and who can update and verify these docs? > > If not, we will likely need to remove the distro related install > documents before the end of the Train release and solely maintain > the install-from-pypi docs [2] (which are up to date). This is > because the existing docs are now misleading, and thus potentially > dangerous and extra-work-inducing, as suggested by a recent bug [3]. > > If you are interested in helping out, please follow up. If you're > unable to help out, but know people involved with packaging who > should know about this concern, please let them know. > > Thanks for your help. > > [1] https://docs.openstack.org/placement/latest/install/index.htmlk > [2] https://docs.openstack.org/placement/latest/install/from-pypi.html > [3] https://storyboard.openstack.org/#!/story/2005910 > From tobias.rydberg at citynetwork.eu Thu Jun 20 10:03:12 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 20 Jun 2019 12:03:12 +0200 Subject: [sigs][publiccloud][publiccloud-wg][publiccloud-sig][billing] Cancellation todays meeting for the Public Cloud SIG Message-ID: <264067be-fbad-4660-5631-9a40d0a5bab8@citynetwork.eu> Hi all, Unfortunate I need to cancel today meeting for the Public Cloud SIG since I can't manage to make it. Feel free to use the time if you will to continue our ongoing discussions regarding the billing initiative. I'm happy to schedule a meeting next week at the same time, please respond to this email or drop a message in the channel if that i suitable for you. I also want to bring the new User Survey to attention to you all if you have missed that and haven't taken it. Please also spread the word to fellow colleagues, customers and what not. 
URL to survey: https://www.openstack.org/user-survey/survey-2019/ Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From jacob.anders.au at gmail.com Thu Jun 20 10:17:27 2019 From: jacob.anders.au at gmail.com (Jacob Anders) Date: Thu, 20 Jun 2019 20:17:27 +1000 Subject: [ironic] To not have meetings? In-Reply-To: References: Message-ID: I think this is a great idea (or should I say - set of ideas) which goes beyond making the weekly meeting work for us APAC peeps. I think with this approach we will likely achieve better responsiveness and flexibility overall. I look forward to trying this out. Thank you Julia. On Tue, Jun 11, 2019 at 12:06 AM Julia Kreger wrote: > Last week the discussion came up of splitting the ironic meeting to > alternate time zones as we have increasing numbers of contributors in > the Asia/Pacific areas of the world[0]. With that discussion, an > additional interesting question came up posing the question of > shifting to the mailing list instead of our present IRC meeting[1]? > > It is definitely an interesting idea, one that I'm personally keen on > because of time zones and daylight savings time. > > I think before we do this, we should collect thoughts and also try to > determine how we would pull this off so we don't forget the weekly > checkpoint that the meeting serves. I think we need to do something, > so I guess now is a good time to provide input into what everyone > thinks would be best for the project and facilitating the weekly > check-in. > > What I think might work: > > By EOD UTC Monday: > > * Listed primary effort participants will be expected to update the > whiteboard[2] weekly before EOD Monday UTC > * Contributors propose patches to the whiteboard that they believe > would be important for reviewers to examine this coming week. > * PTL or designee sends weekly email to the mailing list to start an > update thread shortly after EOD Monday UTC or early Tuesday UTC. > ** Additional updates, questions, and topical discussion (new > features, RFEs) would ideally be wrapped up by EOD UTC Tuesday. > > With that, I think we would also need to go ahead and begin having > "office hours" as during the week we generally know some ironic > contributors will be in IRC and able to respond to questions. I think > this would initially consist of our meeting time and perhaps the other > time that seems to be most friendly to the contributors int he > Asia/Pacific area[3]. > > Thoughts/ideas/suggestions welcome! > > -Julia > > [0]: > http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:31:33 > [1]: > http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:43:16 > [2]: https://etherpad.openstack.org/p/IronicWhiteBoard > [3]: https://doodle.com/poll/bv9a4qyqy44wiq92 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rony.khan at brilliant.com.bd Thu Jun 20 11:04:59 2019 From: rony.khan at brilliant.com.bd (Md. Farhad Hasan Khan) Date: Thu, 20 Jun 2019 17:04:59 +0600 Subject: Openstack cinder HA Message-ID: <003001d52758$007e2510$017a6f30$@brilliant.com.bd> Hi, In my environment openstack volume service running on compute node. 
When the openstack-cinder-volume service goes down on, the Block Storage volumes which were created using the openstack-cinder-volume service cannot be managed until the service comes up again. I need help how to configure openstack cinder volume HA. Thanks & B'Rgds, Rony -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jun 20 11:10:59 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 20 Jun 2019 20:10:59 +0900 Subject: [nova] API updates week 19-25 Message-ID: <16b74952439.bb87f1c1171676.830759036844712076@ghanshyammann.com> Hi All, Please find the Nova API updates of this week. API Related BP : ============ Code Ready for Review: ------------------------------ 1. Support adding description while locking an instance: - Topic: https://review.opendev.org/#/q/topic:bp/add-locked-reason+(status:open+OR+status:merged) - Weekly Progress: OSC patch has been updated by tssurya. mriedem is +1. 2. Add host and hypervisor_hostname flag to create server - Topic: https://review.opendev.org/#/q/topic:bp/add-host-and-hypervisor-hostname-flag-to-create-server+(status:open+OR+status:merged) - Weekly Progress: patch is updated with review comment. ready for re-review. 3. Specifying az when restore shelved server - Spec: https://review.opendev.org/#/q/topic:bp/support-specifying-az-when-restore-shelved-server+(status:open+OR+status:merged) - Weekly Progress: Spec is merged and code is up for review. 4. Nova API cleanup - Spec: https://review.openstack.org/#/c/603969/ - Weekly Progress: working on code, seems like a lot of tests changes. Should be able to push code by next report. 5. Detach and attach boot volumes: - Topic: https://review.openstack.org/#/q/topic:bp/detach-boot-volume+(status:open+OR+status:merged) - Weekly Progress: No Progress Spec Ready for Review: ----------------------------- 1. Nova API policy improvement - Spec: https://review.openstack.org/#/c/547850/ - PoC: https://review.openstack.org/#/q/topic:bp/policy-default-refresh+(status:open+OR+status:merged) - Weekly Progress: Under review and updates. 2. Support for changing deleted_on_termination after boot -Spec: https://review.openstack.org/#/c/580336/ - Weekly Progress: No update this week. 3. Support delete_on_termination in volume attach api -Spec: https://review.openstack.org/#/c/612949/ - Weekly Progress: No updates this week. 4. Add API ref guideline for body text - ~8 api-ref are left to fix. Previously approved Spec needs to be re-proposed for Train: --------------------------------------------------------------------------- 1. Servers Ips non-unique network names : - https://blueprints.launchpad.net/nova/+spec/servers-ips-non-unique-network-names - https://review.openstack.org/#/q/topic:bp/servers-ips-non-unique-network-names+(status:open+OR+status:merged) 2. Volume multiattach enhancements: - https://blueprints.launchpad.net/nova/+spec/volume-multiattach-enhancements - https://review.openstack.org/#/q/topic:bp/volume-multiattach-enhancements+(status:open+OR+status:merged) Bugs: ==== No progress report in this week. NOTE- There might be some bug which is not tagged as 'api' or 'api-ref', those are not in the above list. Tag such bugs so that we can keep our eyes. 
-gmann From mark at stackhpc.com Thu Jun 20 13:40:15 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jun 2019 14:40:15 +0100 Subject: [kolla][kayobe] vote: kayobe as a kolla deliverable Message-ID: Hi, In the most recent kolla meeting [1] we discussed the possibility of kayobe becoming a deliverable of the kolla project. This follows on from discussion at the PTG and then on here [3]. The two options discussed are: 1. become a deliverable of the Kolla project 2. become an official top level OpenStack project There has been some positive feedback about option 1 and no negative feedback that I am aware of. I would therefore like to ask the kolla community to vote on whether to include kayobe as a deliverable of the kolla project. The electorate is the kolla-core and kolla-ansible core teams, excluding me. The opinion of others in the community is also welcome. If you have questions or feedback, please respond to this email. Once you have made a decision, please respond with your answer to the following question: "Should kayobe become a deliverable of the kolla project?" (yes/no) Thanks, Mark [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-120 [2] https://etherpad.openstack.org/p/kolla-train-ptg [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006901.html From mark at stackhpc.com Thu Jun 20 13:51:48 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jun 2019 14:51:48 +0100 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible Message-ID: Hi, In the recent kolla meeting [1] we discussed the usability of octavia in kolla ansible. We had feedback at the Denver summit [2] that this service is difficult to deploy and requires a number of manual steps. Certificates are one of the main headaches. It was stated that OSA [3] may have some useful code we could look into. As a starting point to improving this support, I'd like to gather information from people who are using octavia in kolla ansible, and what they have had to do to make it work. Please respond to this email. I've also tagged openstack-ansible and Tripleo - if there is any useful information those teams have to share about this topic, it is most welcome. Alternatively if your support for octavia also falls short perhaps we could collaborate on improvements. Thanks, Mark [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback [3] https://opendev.org/openstack/openstack-ansible-os_octavia From cgoncalves at redhat.com Thu Jun 20 14:01:51 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Thu, 20 Jun 2019 16:01:51 +0200 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: On Thu, Jun 20, 2019 at 3:53 PM Mark Goddard wrote: > > Hi, > > In the recent kolla meeting [1] we discussed the usability of octavia > in kolla ansible. We had feedback at the Denver summit [2] that this > service is difficult to deploy and requires a number of manual steps. > Certificates are one of the main headaches. It was stated that OSA [3] > may have some useful code we could look into. > > As a starting point to improving this support, I'd like to gather > information from people who are using octavia in kolla ansible, and > what they have had to do to make it work. Please respond to this > email. 
> > I've also tagged openstack-ansible and Tripleo - if there is any > useful information those teams have to share about this topic, it is > most welcome. Alternatively if your support for octavia also falls > short perhaps we could collaborate on improvements. TripleO has a few Ansible roles that perform several post-deployment configurations like handling certificates (auto-generated or passed in by operator), amphora image upload to Glance, LB management network and interfaces provisioning, keypair create and setting in config file, etc. Have a look and feel free to ping the team if you need further info. > > Thanks, > Mark > > [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 > [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback > [3] https://opendev.org/openstack/openstack-ansible-os_octavia > From cgoncalves at redhat.com Thu Jun 20 14:05:37 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Thu, 20 Jun 2019 16:05:37 +0200 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: On Thu, Jun 20, 2019 at 4:01 PM Carlos Goncalves wrote: > > On Thu, Jun 20, 2019 at 3:53 PM Mark Goddard wrote: > > > > Hi, > > > > In the recent kolla meeting [1] we discussed the usability of octavia > > in kolla ansible. We had feedback at the Denver summit [2] that this > > service is difficult to deploy and requires a number of manual steps. > > Certificates are one of the main headaches. It was stated that OSA [3] > > may have some useful code we could look into. > > > > As a starting point to improving this support, I'd like to gather > > information from people who are using octavia in kolla ansible, and > > what they have had to do to make it work. Please respond to this > > email. > > > > I've also tagged openstack-ansible and Tripleo - if there is any > > useful information those teams have to share about this topic, it is > > most welcome. Alternatively if your support for octavia also falls > > short perhaps we could collaborate on improvements. > > TripleO has a few Ansible roles that perform several post-deployment > configurations like handling certificates (auto-generated or passed in > by operator), amphora image upload to Glance, LB management network > and interfaces provisioning, keypair create and setting in config > file, etc. > > Have a look and feel free to ping the team if you need further info. Oops! Here: https://github.com/openstack/tripleo-common/tree/master/playbooks > > > > > > Thanks, > > Mark > > > > [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 > > [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback > > [3] https://opendev.org/openstack/openstack-ansible-os_octavia > > From qianxi416 at foxmail.com Thu Jun 20 14:06:11 2019 From: qianxi416 at foxmail.com (=?gb18030?B?x67O9Q==?=) Date: Thu, 20 Jun 2019 22:06:11 +0800 Subject: [nova]Bug #1829696: qemu-kvm process takes 100% CPU usage when running redhat/centos 7.6 guest OS Message-ID: Hi there, I am struggling with qemu-kvm 100% CPU usage problem. When running redhat or centos 7.6 guest os on vm, the cpu usage is very low on vm(almost 100% idle when no tasks run), but on the host, qemu-kvm reports 100% cpu busy usage. https://bugs.launchpad.net/nova/+bug/1829696 I opened the bug above, however it did not find some interest. After searching some related bugs report, I suspect that it is due to the clock settings in vm's domain xml. 
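For reference, the clock section nova's libvirt driver normally generates for an x86 kvm guest looks roughly like the one below; treat it as an illustrative default rather than the exact XML from my hosts:

    <clock offset='utc'>
      <!-- illustrative defaults, not copied from the affected guests -->
      <timer name='pit' tickpolicy='delay'/>
      <timer name='rtc' tickpolicy='catchup'/>
      <timer name='hpet' present='no'/>
    </clock>
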
My settings are as follows: And details about the bug, please see https://bugs.launchpad.net/nova/+bug/1829696 It shows that only the version 7.6 of redhat or centos affected by this bug behavior. In my cluster, it is OK for versions from redhat or centos 6.8 to 7.5. Any clue or suggestion? ------------------ Thanks a lot! Best regards QianXi -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Thu Jun 20 14:07:56 2019 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 20 Jun 2019 08:07:56 -0600 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard wrote: > Hi, > > In the recent kolla meeting [1] we discussed the usability of octavia > in kolla ansible. We had feedback at the Denver summit [2] that this > service is difficult to deploy and requires a number of manual steps. > Certificates are one of the main headaches. It was stated that OSA [3] > may have some useful code we could look into. > I second that Octavia is very painful to install. There is a requirement of a bunch of openstack cloud configurations (flavors/images/etc) that must be handled prior to actually configuring the service which means it's complex to deploy. IMHO it would have been beneficial for some of these items to actually have been rolled into the service itself (ie dynamically querying the services for flavor information rather than expecting an ID put into a configuration file). That being said, we have managed to get it integrated into tripleo but it's rather complex. It does use ansible if you want to borrow some of the concepts for os_octavia. https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles Additionally we're still leveraging some of the puppet-octavia code to manage configs/nova flavors. Thanks, -Alex > > As a starting point to improving this support, I'd like to gather > information from people who are using octavia in kolla ansible, and > what they have had to do to make it work. Please respond to this > email. > > I've also tagged openstack-ansible and Tripleo - if there is any > useful information those teams have to share about this topic, it is > most welcome. Alternatively if your support for octavia also falls > short perhaps we could collaborate on improvements. > > Thanks, > Mark > > [1] > http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 > [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback > [3] https://opendev.org/openstack/openstack-ansible-os_octavia > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jun 20 14:16:14 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 20 Jun 2019 16:16:14 +0200 Subject: [kolla][kayobe] vote: kayobe as a kolla deliverable In-Reply-To: References: Message-ID: Hi Mark, non-core voice here. :-) I had the chance to express my opinion already during yesterday's meeting (as yoctozepto) but I want to expand it. For me it's natural that kayobe becomes a deliverable of kolla ("yes" aka "option 1") since it extends its goals. As kolla-ansible makes kolla images deployable, kayobe makes kolla-ansible apply to bare metal. 
Time will tell if someone comes up with an idea to take this even further (whatever that might be). ;-) Kind regards, Radek czw., 20 cze 2019 o 15:53 Mark Goddard napisał(a): > Hi, > > In the most recent kolla meeting [1] we discussed the possibility of > kayobe becoming a deliverable of the kolla project. This follows on > from discussion at the PTG and then on here [3]. > > The two options discussed are: > > 1. become a deliverable of the Kolla project > 2. become an official top level OpenStack project > > There has been some positive feedback about option 1 and no negative > feedback that I am aware of. I would therefore like to ask the kolla > community to vote on whether to include kayobe as a deliverable of the > kolla project. The electorate is the kolla-core and kolla-ansible core > teams, excluding me. The opinion of others in the community is also > welcome. > > If you have questions or feedback, please respond to this email. > > Once you have made a decision, please respond with your answer to the > following question: > > "Should kayobe become a deliverable of the kolla project?" (yes/no) > > Thanks, > Mark > > [1] > http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-120 > [2] https://etherpad.openstack.org/p/kolla-train-ptg > [3] > http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006901.html > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Thu Jun 20 14:27:46 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jun 2019 15:27:46 +0100 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: On Thu, 20 Jun 2019 at 15:08, Alex Schultz wrote: > > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard wrote: >> >> Hi, >> >> In the recent kolla meeting [1] we discussed the usability of octavia >> in kolla ansible. We had feedback at the Denver summit [2] that this >> service is difficult to deploy and requires a number of manual steps. >> Certificates are one of the main headaches. It was stated that OSA [3] >> may have some useful code we could look into. > > > I second that Octavia is very painful to install. There is a requirement of a bunch of openstack cloud configurations (flavors/images/etc) that must be handled prior to actually configuring the service which means it's complex to deploy. IMHO it would have been beneficial for some of these items to actually have been rolled into the service itself (ie dynamically querying the services for flavor information rather than expecting an ID put into a configuration file). That being said, we have managed to get it integrated into tripleo but it's rather complex. It does use ansible if you want to borrow some of the concepts for os_octavia. > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles > > Additionally we're still leveraging some of the puppet-octavia code to manage configs/nova flavors. Thanks for sharing Alex & Carlos - useful source material. > > Thanks, > -Alex > >> >> >> As a starting point to improving this support, I'd like to gather >> information from people who are using octavia in kolla ansible, and >> what they have had to do to make it work. Please respond to this >> email. 
>> >> I've also tagged openstack-ansible and Tripleo - if there is any >> useful information those teams have to share about this topic, it is >> most welcome. Alternatively if your support for octavia also falls >> short perhaps we could collaborate on improvements. >> >> Thanks, >> Mark >> >> [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 >> [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback >> [3] https://opendev.org/openstack/openstack-ansible-os_octavia >> From jim at jimrollenhagen.com Thu Jun 20 14:34:24 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Thu, 20 Jun 2019 10:34:24 -0400 Subject: [nova] TPM passthrough Message-ID: Hey y'all, We have an internal use case which requires a VM with a TPM, to be used to store a private key. Libvirt has two ways to present a TPM to a VM: passthrough or emulated. Per kashyap and the #qemu IRC channel, libvirt stores the TPM's state on disk, unencrypted. Our risk profile includes "someone walks away with a disk", so this won't work for our use case. The QEMU devs have asked for RFEs to implement vTPMs where the state never touches the disk, so I have hopes that this will be done eventually. However, I suspect that this will still take some time, especially as nobody has volunteered to actually do the work yet. So, I'd like to propose we implement TPM passthrough in Nova. My team is happy to do the work, but I'd love some guidance as to the best way to implement this so we can get a spec done (I assume it's "just another resource class"?). If Nova doesn't want this feature in, and would rather just wait for the features in QEMU, we'll carry it downstream, I guess. :) Thoughts? // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Thu Jun 20 14:44:47 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 20 Jun 2019 10:44:47 -0400 Subject: [nova] TPM passthrough In-Reply-To: References: Message-ID: On Thu, Jun 20, 2019 at 10:40 AM Jim Rollenhagen wrote: > > Hey y'all, > > We have an internal use case which requires a VM with a TPM, to be used to > store a private key. Libvirt has two ways to present a TPM to a VM: passthrough > or emulated. Per kashyap and the #qemu IRC channel, libvirt stores the TPM's > state on disk, unencrypted. Our risk profile includes "someone walks away with > a disk", so this won't work for our use case. > > The QEMU devs have asked for RFEs to implement vTPMs where the state never > touches the disk, so I have hopes that this will be done eventually. > > However, I suspect that this will still take some time, especially as nobody > has volunteered to actually do the work yet. So, I'd like to propose we > implement TPM passthrough in Nova. My team is happy to do the work, but I'd > love some guidance as to the best way to implement this so we can get a spec > done (I assume it's "just another resource class"?). https://wiki.qemu.org/Features/TPM Would it be using this? I'm just trying to gauge out what TPM passthrough involves out of personal curiosity. > If Nova doesn't want this feature in, and would rather just wait for the > features in QEMU, we'll carry it downstream, I guess. :) > > Thoughts? > > // jim -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
http://vexxhost.com From jim at jimrollenhagen.com Thu Jun 20 14:57:23 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Thu, 20 Jun 2019 10:57:23 -0400 Subject: [nova] TPM passthrough In-Reply-To: References: Message-ID: // jim On Thu, Jun 20, 2019 at 10:44 AM Mohammed Naser wrote: > On Thu, Jun 20, 2019 at 10:40 AM Jim Rollenhagen > wrote: > > > > Hey y'all, > > > > We have an internal use case which requires a VM with a TPM, to be used > to > > store a private key. Libvirt has two ways to present a TPM to a VM: > passthrough > > or emulated. Per kashyap and the #qemu IRC channel, libvirt stores the > TPM's > > state on disk, unencrypted. Our risk profile includes "someone walks > away with > > a disk", so this won't work for our use case. > > > > The QEMU devs have asked for RFEs to implement vTPMs where the state > never > > touches the disk, so I have hopes that this will be done eventually. > > > > However, I suspect that this will still take some time, especially as > nobody > > has volunteered to actually do the work yet. So, I'd like to propose we > > implement TPM passthrough in Nova. My team is happy to do the work, but > I'd > > love some guidance as to the best way to implement this so we can get a > spec > > done (I assume it's "just another resource class"?). > > https://wiki.qemu.org/Features/TPM > > Would it be using this? I'm just trying to gauge out what TPM passthrough > involves out of personal curiosity. > Yes, though I think those notes are from before it was implemented. Here's the libvirt XML to make it work: https://libvirt.org/formatdomain.html#elementsTpm I assume we'd just translate a TPM resource class in the flavor to this XML, but I'm hoping a nova developer can confirm this. :) // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Thu Jun 20 15:08:35 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 20 Jun 2019 10:08:35 -0500 Subject: [nova] TPM passthrough In-Reply-To: References: Message-ID: <9382a273-c393-5de1-48d2-43f775a0f1d2@fried.cc> Jim- > So, I'd like to propose we > implement TPM passthrough in Nova. My team is happy to do the work, but I'd > love some guidance as to the best way to implement this so we can get a spec > done (I assume it's "just another resource class"?). And by "just another resource class" you mean: - Add TPM to os-resource-classes (exact name subject to bikeshedding). - Virt driver's update_provider_tree() looks at the guts of the host to figure out how many TPM devices exist and, if nonzero, tacks an inventory of that many TPM onto the root provider (max_unit 1 presumably; all others default). - Flavor desiring this thingy is authored with extra spec resources:TPM=1. - Scheduler lands instance on host with TPM inventory, and allocates one. (This is free, no additional code changes necessary.) - Virt driver's spawn() looks at the allocation, sees TPM:1, and augments the guest's domain XML to attach the thingy. Is it any more complicated than that? I'm fine with this. efried . 
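To make that outline a little more concrete, here is a rough sketch (not actual Nova code) of what the update_provider_tree() step could look like in a virt driver, assuming a new "TPM" resource class whose exact name is still to be agreed; the _count_host_tpms() helper is a made-up placeholder:

    # Sketch only: advertise host TPM devices as inventory of a
    # (hypothetical) "TPM" resource class via Nova's ProviderTree API.
    # In real life this would be a method of the libvirt driver class.
    import glob

    TPM_RC = 'TPM'  # would live in os-resource-classes once the name is agreed

    def _count_host_tpms():
        # e.g. /dev/tpm0, /dev/tpm1 on the hypervisor
        return len(glob.glob('/dev/tpm[0-9]*'))

    def update_provider_tree(self, provider_tree, nodename, allocations=None):
        total = _count_host_tpms()
        if total:
            inv = provider_tree.data(nodename).inventory
            inv[TPM_RC] = {'total': total, 'min_unit': 1, 'max_unit': 1,
                           'step_size': 1, 'reserved': 0,
                           'allocation_ratio': 1.0}
            provider_tree.update_inventory(nodename, inv)

A flavor would then opt in with an extra spec along the lines of resources:TPM=1, and as noted above the scheduler/placement side needs no further changes.
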
From johnsomor at gmail.com Thu Jun 20 16:27:03 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 20 Jun 2019 09:27:03 -0700 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: Hi Mark, I wanted to highlight that I wrote a detailed certificate configuration guide for Octavia here: https://docs.openstack.org/octavia/latest/admin/guides/certificates.html This could be used to automate the certificate generation for Kolla deployments. Let me know if you have any questions about the guide or steps, Michael On Thu, Jun 20, 2019 at 7:31 AM Mark Goddard wrote: > > On Thu, 20 Jun 2019 at 15:08, Alex Schultz wrote: > > > > > > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard wrote: > >> > >> Hi, > >> > >> In the recent kolla meeting [1] we discussed the usability of octavia > >> in kolla ansible. We had feedback at the Denver summit [2] that this > >> service is difficult to deploy and requires a number of manual steps. > >> Certificates are one of the main headaches. It was stated that OSA [3] > >> may have some useful code we could look into. > > > > > > I second that Octavia is very painful to install. There is a requirement of a bunch of openstack cloud configurations (flavors/images/etc) that must be handled prior to actually configuring the service which means it's complex to deploy. IMHO it would have been beneficial for some of these items to actually have been rolled into the service itself (ie dynamically querying the services for flavor information rather than expecting an ID put into a configuration file). That being said, we have managed to get it integrated into tripleo but it's rather complex. It does use ansible if you want to borrow some of the concepts for os_octavia. > > > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles > > > > Additionally we're still leveraging some of the puppet-octavia code to manage configs/nova flavors. > > Thanks for sharing Alex & Carlos - useful source material. > > > > > Thanks, > > -Alex > > > >> > >> > >> As a starting point to improving this support, I'd like to gather > >> information from people who are using octavia in kolla ansible, and > >> what they have had to do to make it work. Please respond to this > >> email. > >> > >> I've also tagged openstack-ansible and Tripleo - if there is any > >> useful information those teams have to share about this topic, it is > >> most welcome. Alternatively if your support for octavia also falls > >> short perhaps we could collaborate on improvements. > >> > >> Thanks, > >> Mark > >> > >> [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 > >> [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback > >> [3] https://opendev.org/openstack/openstack-ansible-os_octavia > >> > From kchamart at redhat.com Thu Jun 20 16:32:09 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Thu, 20 Jun 2019 18:32:09 +0200 Subject: [nova]Bug #1829696: qemu-kvm process takes 100% CPU usage when running redhat/centos 7.6 guest OS In-Reply-To: References: Message-ID: <20190620163209.GD19519@paraplu> On Thu, Jun 20, 2019 at 10:06:11PM +0800, 钱熙 wrote: > Hi there, Hi, > > I am struggling with qemu-kvm 100% CPU usage problem. 
> When running redhat or centos 7.6 guest os on vm, > the cpu usage is very low on vm(almost 100% idle when no tasks run), but on the host, > qemu-kvm reports 100% cpu busy usage. > > > https://bugs.launchpad.net/nova/+bug/1829696 > I opened the bug above, however it did not find some interest. > > > After searching some related bugs report, > I suspect that it is due to the clock settings in vm's domain xml. > My settings are as follows: > > > > > > And details about the bug, please see https://bugs.launchpad.net/nova/+bug/1829696 I don't think it is related to clock settings at all. > It shows that only the version 7.6 of redhat or centos affected by this bug behavior. > In my cluster, it is OK for versions from redhat or centos 6.8 to 7.5. > > > Any clue or suggestion? Please see DanPB's response in comment#2; I agree with it. (Note that I've changed the bug component from 'openstack-nova' --> 'qemu') https://bugs.launchpad.net/qemu/+bug/1829696/comments/2 -- /kashyap From mark at stackhpc.com Thu Jun 20 16:32:13 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 20 Jun 2019 17:32:13 +0100 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: On Thu, 20 Jun 2019 at 17:27, Michael Johnson wrote: > > Hi Mark, > > I wanted to highlight that I wrote a detailed certificate > configuration guide for Octavia here: > https://docs.openstack.org/octavia/latest/admin/guides/certificates.html > > This could be used to automate the certificate generation for Kolla deployments. Great, that looks useful. > > Let me know if you have any questions about the guide or steps, > Michael > > On Thu, Jun 20, 2019 at 7:31 AM Mark Goddard wrote: > > > > On Thu, 20 Jun 2019 at 15:08, Alex Schultz wrote: > > > > > > > > > > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard wrote: > > >> > > >> Hi, > > >> > > >> In the recent kolla meeting [1] we discussed the usability of octavia > > >> in kolla ansible. We had feedback at the Denver summit [2] that this > > >> service is difficult to deploy and requires a number of manual steps. > > >> Certificates are one of the main headaches. It was stated that OSA [3] > > >> may have some useful code we could look into. > > > > > > > > > I second that Octavia is very painful to install. There is a requirement of a bunch of openstack cloud configurations (flavors/images/etc) that must be handled prior to actually configuring the service which means it's complex to deploy. IMHO it would have been beneficial for some of these items to actually have been rolled into the service itself (ie dynamically querying the services for flavor information rather than expecting an ID put into a configuration file). That being said, we have managed to get it integrated into tripleo but it's rather complex. It does use ansible if you want to borrow some of the concepts for os_octavia. > > > > > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles > > > > > > Additionally we're still leveraging some of the puppet-octavia code to manage configs/nova flavors. > > > > Thanks for sharing Alex & Carlos - useful source material. 
> > > > > > > > Thanks, > > > -Alex > > > > > >> > > >> > > >> As a starting point to improving this support, I'd like to gather > > >> information from people who are using octavia in kolla ansible, and > > >> what they have had to do to make it work. Please respond to this > > >> email. > > >> > > >> I've also tagged openstack-ansible and Tripleo - if there is any > > >> useful information those teams have to share about this topic, it is > > >> most welcome. Alternatively if your support for octavia also falls > > >> short perhaps we could collaborate on improvements. > > >> > > >> Thanks, > > >> Mark > > >> > > >> [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 > > >> [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback > > >> [3] https://opendev.org/openstack/openstack-ansible-os_octavia > > >> > > From jim at jimrollenhagen.com Thu Jun 20 17:01:06 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Thu, 20 Jun 2019 13:01:06 -0400 Subject: [nova] TPM passthrough In-Reply-To: <9382a273-c393-5de1-48d2-43f775a0f1d2@fried.cc> References: <9382a273-c393-5de1-48d2-43f775a0f1d2@fried.cc> Message-ID: On Thu, Jun 20, 2019 at 11:20 AM Eric Fried wrote: > Jim- > > > So, I'd like to propose we > > implement TPM passthrough in Nova. My team is happy to do the work, but > I'd > > love some guidance as to the best way to implement this so we can get a > spec > > done (I assume it's "just another resource class"?). > > And by "just another resource class" you mean: > > - Add TPM to os-resource-classes (exact name subject to bikeshedding). > - Virt driver's update_provider_tree() looks at the guts of the host to > figure out how many TPM devices exist and, if nonzero, tacks an > inventory of that many TPM onto the root provider (max_unit 1 > presumably; all others default). > - Flavor desiring this thingy is authored with extra spec resources:TPM=1. > - Scheduler lands instance on host with TPM inventory, and allocates > one. (This is free, no additional code changes necessary.) > - Virt driver's spawn() looks at the allocation, sees TPM:1, and > augments the guest's domain XML to attach the thingy. > > Is it any more complicated than that? > That makes sense to me. I don't know these bits well enough to comment if there's anything else to do. Maybe choosing the correct /dev/tpmN may get weird? > I'm fine with this. > Cool, will attempt to get a spec going, unless violent opposition shows up in this thread in the meantime. Thanks! // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Thu Jun 20 17:32:40 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 20 Jun 2019 12:32:40 -0500 Subject: [nova] TPM passthrough In-Reply-To: References: <9382a273-c393-5de1-48d2-43f775a0f1d2@fried.cc> Message-ID: > That makes sense to me. I don't know these bits well enough > to comment if there's anything else to do. Maybe choosing > the correct /dev/tpmN may get weird? If a) they're all the same; and b) you can tell which ones are in use, then I don't see an issue. > Cool, will attempt to get a spec going, unless violent > opposition shows up in this thread in the meantime. I want to say you could probably get away with a specless blueprint, since we've been able to describe the problem and solution in like two dozen lines. Perhaps you want to start there and see if anyone complains. efried . 
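On the "which ones are in use" question, one possible answer is to treat the passthrough TPM elements already present in existing guests' domain XML as the source of truth. A rough sketch using the libvirt Python bindings (illustrative only, not existing Nova code; the TPM element is documented at https://libvirt.org/formatdomain.html#elementsTpm):

    # Sketch only: pick a /dev/tpmN on the host that no guest defined on
    # this hypervisor already uses via
    # <tpm><backend type='passthrough'><device path='/dev/tpmN'/></backend></tpm>
    import glob
    from xml.etree import ElementTree

    import libvirt

    def pick_free_tpm_device(uri='qemu:///system'):
        conn = libvirt.open(uri)
        try:
            in_use = set()
            for dom in conn.listAllDomains():
                root = ElementTree.fromstring(dom.XMLDesc(0))
                for dev in root.findall('./devices/tpm/backend/device'):
                    if dev.get('path'):
                        in_use.add(dev.get('path'))
            free = [p for p in sorted(glob.glob('/dev/tpm[0-9]*'))
                    if p not in in_use]
            return free[0] if free else None
        finally:
            conn.close()

Whether that bookkeeping lives in the virt driver or is derived from placement allocations is presumably one of the details a short spec would pin down.
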
From zbitter at redhat.com Thu Jun 20 17:37:19 2019 From: zbitter at redhat.com (Zane Bitter) Date: Thu, 20 Jun 2019 13:37:19 -0400 Subject: [devel] Fast flake8 testing Message-ID: <99fea4a5-6f7e-bff0-72ff-3c6f322e0fbc@redhat.com> Those of you who work on a fairly large project will have noticed that running flake8 over all of it takes some time, and that this slows down development. Nova (at least) has a solution to this, in the form of a "fast8" tox environment that runs flake8 only against the files that have changed in the latest patch + the working directory. This is *much* faster, but that approach has some limitations: the script is buggy, it only tests the top-most patch, it creates a second tox environment (which is slow) that can then get out of sync with your regular pep8 environment, and of course it requires the project to add it explicitly. If you're interested in a solution with none of those limitations, here is a script that I've been using: https://gist.github.com/zaneb/7a8c752bfd97dd8972756d296fc5e41f It tests all changes on the branch, using your existing pep8 tox environment, handles deleted files and changes to non-python files correctly, and should be usable for every OpenStack project. I hope this is helpful to someone. (Note that the pep8 environment on many projects includes other test commands in addition to flake8 - such as bandit - so you should still run the pep8 tox tests once before submitting a patch.) cheers, Zane. From grant at civo.com Thu Jun 20 11:30:52 2019 From: grant at civo.com (Grant Morley) Date: Thu, 20 Jun 2019 12:30:52 +0100 Subject: NetApp E-series infiniband support for Cinder Message-ID: Hi All, Just a quick one to see if anybody knows if there is currently any infiniband support for cinder using a NetApp E-Series SAN. I have had a look at: https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/netapp-volume-driver.html and: https://docs.openstack.org/cinder/rocky/reference/support-matrix.html They seem to suggest that only iSCSI and FC are supported. I just want to make sure before I start trying to do a POC with the E-series and infiniband. Any advice would be much appreciated. Kind Regards, -- Grant Morley Cloud Lead, Civo Ltd www.civo.com | Signup for an account! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 20 18:05:35 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 20 Jun 2019 13:05:35 -0500 Subject: [nova] TPM passthrough In-Reply-To: References: <9382a273-c393-5de1-48d2-43f775a0f1d2@fried.cc> Message-ID: <86e693ba-3478-5994-2a59-7990fa2aa561@gmail.com> On 6/20/2019 12:32 PM, Eric Fried wrote: > I want to say you could probably get away with a specless blueprint, > since we've been able to describe the problem and solution in like two > dozen lines. Perhaps you want to start there and see if anyone complains. As Resident Complainer I feel compelled to say there should at least be a small spec (as I said in IRC). The question in this thread about how to keep track of and know which devices are allocated is a good one that is going to require some thought. Also the thing not mentioned here (and usually not thought about early on) is move operations and how those will be handled - what will and will not be supported when you have a TPM passthrough device on a guest and will there be any issues, e.g. resizing to/from a flavor with one of these. 
If using a resource class for tracking inventory then scheduling should be straight-forward, i.e. migration shouldn't pick a host that can't support the flavor. -- Thanks, Matt From mriedemos at gmail.com Thu Jun 20 18:08:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 20 Jun 2019 13:08:52 -0500 Subject: [nova] TPM passthrough In-Reply-To: <86e693ba-3478-5994-2a59-7990fa2aa561@gmail.com> References: <9382a273-c393-5de1-48d2-43f775a0f1d2@fried.cc> <86e693ba-3478-5994-2a59-7990fa2aa561@gmail.com> Message-ID: <9c98c459-2a91-bbaf-154c-a64c0cd93a71@gmail.com> On 6/20/2019 1:05 PM, Matt Riedemann wrote: > As Resident Complainer I feel compelled to say there should at least be > a small spec (as I said in IRC). The question in this thread about how > to keep track of and know which devices are allocated is a good one that > is going to require some thought. It's probably also worth thinking about what happens when evacuating a sever from a host with one of these and/or deleting a server when the compute service on which it was running is down. Upon restart of the compute service we'll need to detect the guest is gone and cleanup the previously allocated devices somehow (I think this has been a problem for PCI device allocations in the past as well). -- Thanks, Matt From kchamart at redhat.com Thu Jun 20 19:14:46 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Thu, 20 Jun 2019 21:14:46 +0200 Subject: [nova] TPM passthrough In-Reply-To: References: Message-ID: <20190620191446.GE19519@paraplu> On Thu, Jun 20, 2019 at 10:34:24AM -0400, Jim Rollenhagen wrote: > Hey y'all, > > We have an internal use case which requires a VM with a TPM, to be > used to store a private key. Libvirt has two ways to present a TPM to > a VM: passthrough or emulated. Per kashyap and the #qemu IRC channel, > libvirt stores the TPM's state on disk, unencrypted. Our risk profile > includes "someone walks away with a disk", so this won't work for our > use case. > > The QEMU devs have asked for RFEs to implement vTPMs where the state > never touches the disk, so I have hopes that this will be done > eventually. I haven't gotten around to file the two things requested for this feature for TPM passthrough. (At least for libvirt, anyone with an upstream Bugzilla account can file the RFE in its tracker[1].) (*) File a libvirt RFE to make it not store encryption keys on disk (currently it stores the TPM state under: /var/lib/libvirt/swtpm/). The libvirt/QEMU folks are still debating the design. It is likely to be similar to how libvirt handles the keys for LUKS encryption. (*) Daniel Berrangé suggested that we might also need an RFE for "swtpm" project[2] so as to allow libvirt / QEMU pass keys around without them being on disk. Currently `swtpm` has: --key file=[,format=][,mode=aes-cbc|aes-256-cbc],[remove[=true|false]] E.g. to have a parameter that allows file descriptor passing: "--key fd=NN" - - - And for Nova, I agree, a "lightweight" spec would be useful to flesh out details. Based on the IRC chat on #qemu (OFTC), there's a subtle detail that Nova needs to pay attention to: as the key file would still pass through libvirt, the "mgmt app" (e.g. Nova) can control how long it is around for. Daniel suggested "mgmt app" can purge the keys from libvirt once the guest is running, and can tell libvirt to keep it only in memory. [1] https://libvirt.org/bugs.html [2] https://github.com/stefanberger/swtpm/wiki [...] 
-- /kashyap From fungi at yuggoth.org Thu Jun 20 19:58:49 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 20 Jun 2019 19:58:49 +0000 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs In-Reply-To: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> Message-ID: <20190620195848.quo7or6xqmcueb6p@yuggoth.org> On 2019-06-04 22:45:58 +0000 (+0000), Clark Boylan wrote: > As part of our transition to Zuulv3 a year and a half ago, we > carried over some compatibility tooling that we would now like to > clean up. Specifically, we add a zuul-cloner (which went away in > zuulv3) shim and set a global bindep fallback file value in all > jobs. Zuulv3 native jobs are expected to use the repos zuul has > precloned for you (no zuul-cloner required) as well as supply an > in repo bindep.txt (or specify a bindep.txt path or install > packages via some other method). > > This means that we should be able to remove both of these items > from the non legacy base job in OpenDev's zuul. The legacy base > job will continue to carry these for you so that you can write new > native jobs over time. We have two changes [0][1] ready to go for > this; however, due to the potential for disruption we would like > to give everyone some time to test and prepare for this change. > Fungi has a change to base-test [2] which will remove the > zuul-cloner shim. Once this is in you can push "Do Not Merge" > changes to your zuul config that reparent your tests from "base" > to "base-test" and that will run the jobs without the zuul-cloner > shim. > > Testing the bindep fallback removal is a bit more difficult as we > set that in zuul's server config globally. What you can do is > check your jobs' job-output.txt log files for usage of > "bindep-fallback.txt". > > Our current plan is to merge these changes on June 24, 2019. We > will be around to help debug any unexpected issues that come up. > Jobs can be updated to use the "legacy-base" base job instead of > the "base" base job if they need to be reverted to the old > behavior quickly. [...] > [0] https://review.opendev.org/656195 > [1] https://review.opendev.org/663151 > [2] https://review.opendev.org/663135 This is just a reminder that the above work to remove the zuul-cloner shim and bindep fallback package list from non-legacy jobs will be merged some time on Monday, June 24 (just the other side of this coming weekend). Per other messages in this thread you can test the zuul-cloner shim removal for particular jobs you're concerned about with "Depends-On: https://review.opendev.org/663996" or by creating a similar do-not-merge sort of change to parent your custom job definitions on the base-test job. Obvious uncaught breakage to look out for on Monday will be failures revolving around "command not found" errors for zuul-cloner or other commands which may have been provided by one of the packages in the list. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gagehugo at gmail.com Thu Jun 20 22:19:12 2019 From: gagehugo at gmail.com (Gage Hugo) Date: Thu, 20 Jun 2019 17:19:12 -0500 Subject: [Security SIG] Weekly Newsletter June 13th & June 20th Message-ID: So I wrote the newsletter for last week, but forgot to send out the actual email, so this week will have double the content! 
#Week of: 20 June 2019 - Security SIG Meeting Info: http://eavesdrop.openstack.org/#Security_SIG_meeting - Weekly on Thursday at 1500 UTC in #openstack-meeting - Agenda: https://etherpad.openstack.org/p/security-agenda - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG #Meeting Notes - Summary: http://eavesdrop.openstack.org/meetings/security/2019/security.2019-06-20-15.01.html - This week we discussed cleaning up the security.openstack.org page - Overall there are many outdated sections, we came up with a current rough plan that is outlined in the security-agenda notes for this week's meeting - Retiring Syntribos - On the topic of cleaning up the security.openstack.org page, one section is security tools, which currently lists bandit and syntribos. - Looking at the Syntribos repo, it seems like lately there's only been changes related to doc fixes and overall zuul updates, with a couple actual updates to the project. - If there's anyone still with interest in updating/using Syntribos, please reach out to us. ## News - [nova] TPM thread: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007258.html # VMT Reports - A full list of publicly marked security issues can be found here: https://bugs.launchpad.net/ossa/ - No new public security bugs this week ======================================================================================================== #Week of: 13 June 2019 - Security SIG Meeting Info: http://eavesdrop.openstack.org/#Security_SIG_meeting - Weekly on Thursday at 1500 UTC in #openstack-meeting - Agenda: https://etherpad.openstack.org/p/security-agenda - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG #Meeting Notes - Summary: http://eavesdrop.openstack.org/meetings/security/2019/security.2019-06-13-15.01.html - This week we finalized the details and settings for the [openstack-security] mailing list. The list will be used to provide purely automated notifications about security-related changes and bug reports within OpenStack. Anyone looking to reach out to the security SIG can either use the [openstack-discuss] mailing list or use the #openstack-security channel on freenode IRC. ## News - Storyboard: The security team autoassignment feature landed - If a new story is marked as "security" upon creation, it will automatically become private, however it can be edited to become public after it is created. # VMT Reports - A full list of publicly marked security issues can be found here: https://bugs.launchpad.net/ossa/ - No new public security bugs this week -------------- next part -------------- An HTML attachment was scrubbed... URL: From qianxi416 at foxmail.com Fri Jun 21 01:07:29 2019 From: qianxi416 at foxmail.com (=?gb18030?B?x67O9Q==?=) Date: Fri, 21 Jun 2019 09:07:29 +0800 Subject: =?gb18030?B?UmVwbHmjuiBbbm92YV1CdWcgIzE4Mjk2OTY6IHFl?= =?gb18030?B?bXUta3ZtIHByb2Nlc3MgdGFrZXMgMTAwJSBDUFUg?= =?gb18030?B?dXNhZ2Ugd2hlbnJ1bm5pbmcgcmVkaGF0L2NlbnRv?= =?gb18030?B?cyA3LjYgZ3Vlc3QgT1M=?= Message-ID: Thanks very much. I will upgrade the virt tools software on the host and check if it still happens. Later, I hope I could update you guys. Thanks again. 
Best regards QianXi ------------------ Original Message ------------------ From: "Kashyap Chamarthy" Sent: Friday, June 21, 2019, 00:32 To: "钱熙"; Cc: "openstack-discuss"; Subject: Re: [nova]Bug #1829696: qemu-kvm process takes 100% CPU usage when running redhat/centos 7.6 guest OS On Thu, Jun 20, 2019 at 10:06:11PM +0800, 钱熙 wrote: > Hi there, Hi, > > I am struggling with qemu-kvm 100% CPU usage problem. > When running redhat or centos 7.6 guest os on vm, > the cpu usage is very low on vm(almost 100% idle when no tasks run), but on the host, > qemu-kvm reports 100% cpu busy usage. > > > https://bugs.launchpad.net/nova/+bug/1829696 > I opened the bug above, however it did not find some interest. > > > After searching some related bugs report, > I suspect that it is due to the clock settings in vm's domain xml. > My settings are as follows: > > > > > > And details about the bug, please see https://bugs.launchpad.net/nova/+bug/1829696 I don't think it is related to clock settings at all. > It shows that only the version 7.6 of redhat or centos affected by this bug behavior. > In my cluster, it is OK for versions from redhat or centos 6.8 to 7.5. > > > Any clue or suggestion? Please see DanPB's response in comment#2; I agree with it. (Note that I've changed the bug component from 'openstack-nova' --> 'qemu') https://bugs.launchpad.net/qemu/+bug/1829696/comments/2 -- /kashyap -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Fri Jun 21 09:11:59 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Fri, 21 Jun 2019 11:11:59 +0200 Subject: NetApp E-series infiniband support for Cinder In-Reply-To: References: Message-ID: <20190621091159.isdnyah3tkms26xb@localhost> On 20/06, Grant Morley wrote: > Hi All, > > Just a quick one to see if anybody knows if there is currently any > infiniband support for cinder using a NetApp E-Series SAN.
I have had a look >> at: >> >> https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/netapp-volume-driver.html >> >> and: >> >> https://docs.openstack.org/cinder/rocky/reference/support-matrix.html >> >> They seem to suggest that only iSCSI and FC are supported. I just want to >> make sure before I start trying to do a POC with the E-series and >> infiniband. >> >> Any advice would be much appreciated. >> >> Kind Regards, >> >> -- >> >> Grant Morley >> Cloud Lead, Civo Ltd >> www.civo.com | Signup for an account! >> > Hi, > > I don't know about support for infiniband, but the driver support has > been dropped in Stein. > > The release notes [1] state: > > Support for NetApp E-Series has been removed. The NetApp Unified > driver can now only be used with NetApp Clustered Data ONTAP. > > Regards, > Gorka. > > [1]: https://docs.openstack.org/releasenotes/cinder/stein.html -- Grant Morley Cloud Lead, Civo Ltd www.civo.com | Signup for an account! -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Fri Jun 21 09:23:49 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Fri, 21 Jun 2019 11:23:49 +0200 Subject: Openstack cinder HA In-Reply-To: <003001d52758$007e2510$017a6f30$@brilliant.com.bd> References: <003001d52758$007e2510$017a6f30$@brilliant.com.bd> Message-ID: <20190621092349.2jpmggvcsutyeepx@localhost> On 20/06, Md. Farhad Hasan Khan wrote: > Hi, > > In my environment openstack volume service running on compute node. When the > openstack-cinder-volume service goes down on, the Block Storage volumes > which were created using the openstack-cinder-volume service cannot be > managed until the service comes up again. > > > > I need help how to configure openstack cinder volume HA. > > > > Thanks & B'Rgds, > > Rony > Hi Rony, You can configure Cinder API, Scheduler, and Backup in Active-Active and Cinder Volume in Active-Passive using Pacemaker. Please check the "Highly available Block Storage API" documentation [1] for a detailed guide. Cheers, Gorka. [1]: https://docs.openstack.org/ha-guide/storage-ha-block.html From geguileo at redhat.com Fri Jun 21 09:30:30 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Fri, 21 Jun 2019 11:30:30 +0200 Subject: NetApp E-series infiniband support for Cinder In-Reply-To: References: <20190621091159.isdnyah3tkms26xb@localhost> Message-ID: <20190621093030.otjr3ukiif635ork@localhost> On 21/06, Grant Morley wrote: > Hi Gorka, > > Thanks for that, I'll let the business know that we wont want to be using > that. We had planned to move to Stein later this year! > > Regards, Glad I could help avoid that awkward moment. ;-) > > On 21/06/2019 10:11, Gorka Eguileor wrote: > > On 20/06, Grant Morley wrote: > > > Hi All, > > > > > > Just a quick one to see if anybody knows if there is currently any > > > infiniband support for cinder using a NetApp E-Series SAN. I have had a look > > > at: > > > > > > https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/netapp-volume-driver.html > > > > > > and: > > > > > > https://docs.openstack.org/cinder/rocky/reference/support-matrix.html > > > > > > They seem to suggest that only iSCSI and FC are supported. I just want to > > > make sure before I start trying to do a POC with the E-series and > > > infiniband. > > > > > > Any advice would be much appreciated. > > > > > > Kind Regards, > > > > > > -- > > > > > > Grant Morley > > > Cloud Lead, Civo Ltd > > > www.civo.com | Signup for an account! 
> > > > > Hi, > > > > I don't know about support for infiniband, but the driver support has > > been dropped in Stein. > > > > The release notes [1] state: > > > > Support for NetApp E-Series has been removed. The NetApp Unified > > driver can now only be used with NetApp Clustered Data ONTAP. > > > > Regards, > > Gorka. > > > > [1]: https://docs.openstack.org/releasenotes/cinder/stein.html > -- > > Grant Morley > Cloud Lead, Civo Ltd > www.civo.com | Signup for an account! > From cdent+os at anticdent.org Fri Jun 21 11:22:22 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 21 Jun 2019 12:22:22 +0100 (BST) Subject: [placement] update 19-24 Message-ID: HTML: https://anticdent.org/placement-update-19-24.html Here's a 19-24 pupdate. Last week I said there wouldn't be one this week. I was wrong. There won't be one next week. I'm taking the week off to reset (I hope). I've tried to make sure that there's nothing floating about in Placement land that is blocking on me. # Most Important The [spec for nested magic](https://review.opendev.org/662191) is close. What it needs now is review from people outside the regulars to make sure it doesn't suffer from tunnel vision. The features discussed form the foundation for Placement being able to service queries for real world uses of nested providers. Placement can already model nested providers but asking for the right one has needed some work. # Editorial I read an article this morning which touches on the importance of considering [cognitive load in software design](https://techbeacon.com/app-dev-testing/forget-monoliths-vs-microservices-cognitive-load-what-matters). It's fully of glittering generalities, but it reminds me of some of the reasons why it was important to keep a solid boundary between Nova and Placement and why, now that Placement is extracted, the complexity of this nested magic is something we need to measure against the cognitive load it will induce in the people who come after us as developers and users. # What's Changed * Support for mappings in allocation candidates has merged as [microversion 1.34](https://docs.openstack.org/placement/latest/placement-api-microversion-history.html#request-group-mappings-in-allocation-candidates). * Gibi made it so [OSProfiler](https://docs.openstack.org/osprofiler/latest/) works with placement again. # Specs/Features * Support Consumer Types. This has some open questions that need to be addressed, but we're still go on the general idea. * Spec for nested magic 1. See "Most Important" above. Some non-placement specs are listed in the Other section below. # Stories/Bugs (Numbers in () are the change since the last pupdate.) There are 23 (3) stories in [the placement group](https://storyboard.openstack.org/#!/project_group/placement). 0 (0) are [untagged](https://storyboard.openstack.org/#!/worklist/580). 4 (1) are [bugs](https://storyboard.openstack.org/#!/worklist/574). 5 (-1) are [cleanups](https://storyboard.openstack.org/#!/worklist/575). 11 (0) are [rfes](https://storyboard.openstack.org/#!/worklist/594). 3 (1) are [docs](https://storyboard.openstack.org/#!/worklist/637). If you're interested in helping out with placement, those stories are good places to look. * Placement related nova [bugs not yet in progress](https://goo.gl/TgiPXb) on launchpad: 16 (0). * Placement related nova [in progress bugs](https://goo.gl/vzGGDQ) on launchpad: 5 (-1). 
[1832814: Placement API appears to have issues when compute host replaced](https://bugs.launchpad.net/nova/+bug/1832814) is an interesting bug. In a switch from RDO to OSA, resource providers are being duplicated because of a change in node name. # osc-placement osc-placement is currently behind by 11 microversions. * Add support for multiple member_of. # Main Themes ## Nested Magic The overview of the features encapsulated by the term "nested magic" are in a [story](https://storyboard.openstack.org/#!/story/2005575) and [spec](https://review.opendev.org/662191). Code related to this: * PoC: resourceless request, including some code from WIP: Allow RequestGroups without resources * Support for the new root_required query parameter. Introduces the useful concepts of required wide params and a request wide search context. ## Consumer Types Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting. A [spec](https://review.opendev.org/654799) has started. There are some questions about request and response details that need to be resolved, but the overall concept is sound. ## Cleanup We continue to do cleanup work to lay in reasonable foundations for the nested work above. As a nice bonus, we keep eking out additional performance gains too. * Add a nested-perfload job, using gabbi to create the trees. Unsurprisingly, nested topologies are slower than non; having this job will help us track that. # Other Placement Miscellaneous changes can be found in [the usual place](https://review.opendev.org/#/q/project:openstack/placement+status:open). There are five [os-traits changes](https://review.opendev.org/#/q/project:openstack/os-traits+status:open) being discussed. And one [os-resource-classes change](https://review.opendev.org/#/q/project:openstack/os-resource-classes+status:open). # Other Service Users New discoveries are added to the end. Merged stuff is removed. Anything that has had no activity in 4 weeks has been removed. * Nova: spec: support virtual persistent memory * Nova: nova-manage: heal port allocations * nova-spec: Allow compute nodes to use DISK_GB from shared storage RP * Cyborg: Placement report * rpm-packaging: placement service * helm: WIP: add placement chart * kolla-ansible: Add a explanatory note for "placement_api_port" * Nova: Use OpenStack SDK for placement * Nova: Spec: Provider config YAML file * Nova: single pass instance info fetch in host manager * libvirt: report pmem namespaces resources by provider tree * Nova: Remove PlacementAPIConnectFailure handling from AggregateAPI * Nova: support move ops with qos ports * TripleO: Enable Request Filter for Image Types * Nova: get_ksa_adapter: nix by-service-type confgrp hack * OSA: Add nova placement to placement migration * Nova: Defaults missing group_policy to 'none' * Blazar: Create placement client for each request * OSA: Fix aio_distro_metal jobs for openSUSE # End Go outside. Reflect a bit. Then do. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From arnaud.morin at gmail.com Fri Jun 21 12:56:59 2019 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Fri, 21 Jun 2019 12:56:59 +0000 Subject: [openstack-operators][neutron][olso.db] connection pool configuration Message-ID: <20190621125659.GC13783@sync> Hey all, I am wondering what configuration the operators in this mailing list are using in [oslo.db] section. For example, the max_pool_size default value is 5, which seems very low to me (our deployment is having more than 1000 neutron agents). 
So, is there any recommendation somewhere for those settings? Thanks for any tip. -- Arnaud Morin From virendra-sharma.sharma at hpe.com Fri Jun 21 08:33:24 2019 From: virendra-sharma.sharma at hpe.com (Sharma, Virendra Sharma) Date: Fri, 21 Jun 2019 08:33:24 +0000 Subject: [ThirdParty][CI] : CI-watch is not updating with latest patchset Message-ID: Hi Team, >From last few days unable to see latest patch set entry in CI-WATCH (http://ciwatch.mmedvede.net/project?project=cinder&time=7+days). However third party CI running on latest patch set and reporting its vote to Gerrit accordingly. Currently it is visible for last 7 days but soon it will also go off, if it won't reflect updated one. Please help me to understand if any update regarding this or else provide way to get it resolved from my end. Regards, Virendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From Rajini.Karthik at Dell.com Fri Jun 21 16:34:21 2019 From: Rajini.Karthik at Dell.com (Rajini.Karthik at Dell.com) Date: Fri, 21 Jun 2019 16:34:21 +0000 Subject: [ThirdParty][CI] : CI-watch is not updating with latest patchset In-Reply-To: References: Message-ID: <2787481e2782433fbd6363582e986282@AUSX13MPS304.AMER.DELL.COM> Does anyone else use the ci-watch for third party Ci monitoring? We use it for the ironic thirdparty drivers. That's not an infra run thing, so probably not going to get much help there. Probably just have to try to get ahold of mmedvede. Sean McGinnis shared this one for cinder. It's not as pretty as the old Tintri one, http://cinderstats.ivehearditbothways.com/cireport.txt I would like to have a solution for this as well Regards Rajini From: Sharma, Virendra Sharma Sent: Friday, June 21, 2019 3:33 AM To: openstack-dev at lists.openstack.org Cc: Shashikant, Sonawane Subject: [ThirdParty][CI] : CI-watch is not updating with latest patchset [EXTERNAL EMAIL] Hi Team, >From last few days unable to see latest patch set entry in CI-WATCH (http://ciwatch.mmedvede.net/project?project=cinder&time=7+days). However third party CI running on latest patch set and reporting its vote to Gerrit accordingly. Currently it is visible for last 7 days but soon it will also go off, if it won't reflect updated one. Please help me to understand if any update regarding this or else provide way to get it resolved from my end. Regards, Virendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfinucan at redhat.com Fri Jun 21 16:47:05 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Fri, 21 Jun 2019 17:47:05 +0100 Subject: [devel] Fast flake8 testing In-Reply-To: <99fea4a5-6f7e-bff0-72ff-3c6f322e0fbc@redhat.com> References: <99fea4a5-6f7e-bff0-72ff-3c6f322e0fbc@redhat.com> Message-ID: <77112531d1c3e11bd6b2f9c178391ad073174586.camel@redhat.com> On Thu, 2019-06-20 at 13:37 -0400, Zane Bitter wrote: > Those of you who work on a fairly large project will have noticed that > running flake8 over all of it takes some time, and that this slows down > development. > > Nova (at least) has a solution to this, in the form of a "fast8" tox > environment that runs flake8 only against the files that have changed in > the latest patch + the working directory. This is *much* faster, but > that approach has some limitations: the script is buggy, it only tests > the top-most patch, it creates a second tox environment (which is slow) > that can then get out of sync with your regular pep8 environment, and of > course it requires the project to add it explicitly. 
> > If you're interested in a solution with none of those limitations, here > is a script that I've been using: > > https://gist.github.com/zaneb/7a8c752bfd97dd8972756d296fc5e41f Neat :) There's also the opportunity of integrating flake8 (and other things) as a pre-commit hook, which is something I'm trying to adopt within nova and the maybe oslo and further over time. http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007151.html That requires some project-level work though (including backports, if you want it on stable branches) whereas your script can be used anywhere. Both useful. Stephen > It tests all changes on the branch, using your existing pep8 tox > environment, handles deleted files and changes to non-python files > correctly, and should be usable for every OpenStack project. > > I hope this is helpful to someone. > > (Note that the pep8 environment on many projects includes other test > commands in addition to flake8 - such as bandit - so you should still > run the pep8 tox tests once before submitting a patch.) > > cheers, > Zane. > From corey.bryant at canonical.com Fri Jun 21 21:31:24 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Fri, 21 Jun 2019 17:31:24 -0400 Subject: [goal][python3] Train unit tests weekly update (goal-12) Message-ID: This is the goal-12 weekly update for the "Update Python 3 test runtimes for Train" goal [1]. There are 12 weeks remaining for completion of Train community goals [2]. == What's the Goal? == To ensure (in the Train cycle) that all official OpenStack repositories with Python 3 unit tests are exclusively using the 'openstack-python3-train-jobs' Zuul template or one of its variants (e.g. 'openstack-python3-train-jobs-neutron') to run unit tests, and that tests are passing. This will ensure that all official projects are running py36 and py37 unit tests in Train. For complete details please see [1]. == Ongoing Work == I've submitted patch automation scripts for review: https://review.opendev.org/#/c/666934 And I've started submitting patches that were generated using the above scripts. Open patches needing reviews: https://review.openstack.org/#/q/topic:python3-train+is:open Failing patches: https://review.openstack.org/#/q/topic:python3-train+status:open+(+label:Verified-1+OR+label:Verified-2+) Some notes on 2 issues I came across this week: 1) Some projects that have a valid reason to keep old tox.ini py3 environments or setup.cfg classifiers. For example, the OpenStack Charms project is a deployment project that still supports deployments on xenial, and keeping the py35 environment in tox.ini makes sense. In scenarios like this I plan to continue with the Zuul switch to openstack-python3-train-jobs without removing py35 tox.ini or Python 3.5 classifier from setup.cfg. 2) Some projects are missing Zuul config. Not to pick on any projects in particular but as an example there are projects that I'd expect to have .zuul.yaml or equivalent, such as python-adjutant, however it doesn't. In scenarios like this I plan to skip the project entirely. == Completed Work == Merged patches: https://review.openstack.org/#/q/topic:python3-train+is:merged == How can you help? == Please take a look at the failing patches and help fix any failing unit tests for your project(s). Python 3.7 unit tests will be self-testing in Zuul. If you're interested in helping submit patches, please let me know. 
== Reference Material == [1] Goal description: https://governance.openstack.org/tc/goals /train/python3-updates.html [2] Train release schedule: https://releases.openstack.org/train/schedule.html (see R-5 for "Train Community Goals Completed") Storyboard: https://storyboard.openstack.org/#!/story/2005924 Porting to Python 3.7: https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7 Python Update Process: https://opendev.org/openstack/governance/src/branch/master/resolutions/20181024-python-update-process.rst Train runtimes: https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/train.rst Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Fri Jun 21 22:18:00 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Fri, 21 Jun 2019 15:18:00 -0700 Subject: [keystone] Keystone Team Update - Week of 17 June 2019 Message-ID: # Keystone Team Update - Week of 17 June 2019 ## News ### Next steps for oslo.limit In order to make progress on having a usable interface for oslo.limit, we discussed[1][2] simplifying the current implementation. Lance has some patches[3] that remove the ideas of claims and context managers, which sacrifices the risk of race conditions and performance penalties in favor of simplicity for the time being. [1] http://eavesdrop.openstack.org/meetings/keystone/2019/keystone.2019-06-18-16.00.log.html#l-10 [2] http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2019-06-20.log.html#t2019-06-20T13:44:48 [3] https://review.opendev.org/665708 ## Open Specs Train specs: https://bit.ly/2uZ2tRl Ongoing specs: https://bit.ly/2OyDLTh Train specs need to be merged by July 26. Please help review these specs and be mindful of feedback. ## Recently Merged Changes Search query: https://bit.ly/2pquOwT We merged 20 changes this week. ## Changes that need Attention Search query: https://bit.ly/2tymTje There are 44 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. ## Bugs This week we opened 6 new bugs and closed 5. Bugs opened (6) Bug #1833085 (keystone:High) opened by Sebastian Riese https://bugs.launchpad.net/keystone/+bug/1833085 Bug #1833739 (keystone:High) opened by David Orman https://bugs.launchpad.net/keystone/+bug/1833739 Bug #1833311 (keystone:Undecided) opened by Michael Carpenter https://bugs.launchpad.net/keystone/+bug/1833311 Bug #1833340 (keystone:Undecided) opened by cnaik https://bugs.launchpad.net/keystone/+bug/1833340 Bug #1833554 (keystone:Undecided) opened by Chason Chan https://bugs.launchpad.net/keystone/+bug/1833554 Bug #1833207 (keystoneauth:Undecided) opened by Jagatjot Singh https://bugs.launchpad.net/keystoneauth/+bug/1833207 Bugs closed (2) Bug #1832005 (keystone:Undecided) https://bugs.launchpad.net/keystone/+bug/1832005 Bug #1833340 (keystone:Undecided) https://bugs.launchpad.net/keystone/+bug/1833340 Bugs fixed (3) Bug #1750676 (keystone:High) fixed by Lance Bragstad https://bugs.launchpad.net/keystone/+bug/1750676 Bug #1818844 (keystone:Medium) fixed by Lance Bragstad https://bugs.launchpad.net/keystone/+bug/1818844 Bug #1649735 (keystonemiddleware:Medium) fixed by Colleen Murphy https://bugs.launchpad.net/keystonemiddleware/+bug/1649735 ## Milestone Outlook https://releases.openstack.org/train/schedule.html Milestone 2 is in 5 weeks. Train specs need to be merged by then. Feature proposal freeze is in 8 weeks. 
## Shout-outs Shout out to Nathan Oyler, whose first patch to OpenStack[4] fixed a major omission in our audit notifications. Thanks, Nathan! [4] https://review.opendev.org/664618 ## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter From rony.khan at brilliant.com.bd Sat Jun 22 05:15:39 2019 From: rony.khan at brilliant.com.bd (Md. Farhad Hasan Khan) Date: Sat, 22 Jun 2019 11:15:39 +0600 Subject: Openstack cinder HA In-Reply-To: <20190621092349.2jpmggvcsutyeepx@localhost> References: <003001d52758$007e2510$017a6f30$@brilliant.com.bd> <20190621092349.2jpmggvcsutyeepx@localhost> Message-ID: <029301d528b9$8860be40$99223ac0$@brilliant.com.bd> Hi Gorka, According to your link I configure. My backend storage is lvm. But created volume not sync to all compute node. And when active cinder-volume node change in pacemaker, unable to do volume operation like: extend. Here is the error log. Could you please help me to solve this. [root at controller1 ~]# openstack volume service list +------------------+------------------------------+------+---------+-------+ ----------------------------+ | Binary | Host | Zone | Status | State | Updated At | +------------------+------------------------------+------+---------+-------+ ----------------------------+ | cinder-volume | cinder-cluster-hostgroup at lvm | nova | enabled | up | 2019-06-22T05:07:15.000000 | | cinder-scheduler | cinder-cluster-hostgroup | nova | enabled | up | 2019-06-22T05:07:18.000000 | +------------------+------------------------------+------+---------+-------+ ----------------------------+ [root at compute1 ~]# pcs status Cluster name: cindervolumecluster Stack: corosync Current DC: compute1 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum Last updated: Sat Jun 22 11:08:35 2019 Last change: Fri Jun 21 16:00:08 2019 by root via cibadmin on compute1 3 nodes configured 1 resource configured Online: [ compute1 compute2 compute3 ] Full list of resources: openstack-cinder-volume (systemd:openstack-cinder-volume): Started compute2 Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~ 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager [req-341fd5ce-dcdb-482e-9309-4c5bfe272137 b0ff2eb16d9b4af58e812d47e0bc753b fc78335beea842038579b36c5a3eef7d - default default] Extend volume failed.: ProcessExecutionError: Unexpected error while running command. Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay --noheading -C -o Attr cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be Exit code: 5 Stdout: u'' Stderr: u'File descriptor 20 (/dev/urandom) leaked on lvdisplay invocation. 
Parent PID 18064: /usr/bin/python2\n Failed to find logical volume "cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be"\n' 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Traceback (most recent call last): 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 2622, in extend_volume 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager self.driver.extend_volume(volume, new_size) 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 576, in extend_volume 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager self._sizestr(new_size)) 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 818, in extend_volume 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager has_snapshot = self.lv_has_snapshot(lv_name) 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 767, in lv_has_snapshot 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager run_as_root=True) 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in _execute 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager result = self.__execute(*args, **kwargs) 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 128, in execute 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager return processutils.execute(*cmd, **kwargs) 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 424, in execute 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager cmd=sanitized_cmd) 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command. 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay --noheading -C -o Attr cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Exit code: 5 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Stdout: u'' 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Stderr: u'File descriptor 20 (/dev/urandom) leaked on lvdisplay invocation. Parent PID 18064: /usr/bin/python2\n Failed to find logical volume "cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be"\n' 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~ Thanks & B'Rgds, Rony -----Original Message----- From: Gorka Eguileor [mailto:geguileo at redhat.com] Sent: Friday, June 21, 2019 3:24 PM To: Md. Farhad Hasan Khan Cc: openstack-discuss at lists.openstack.org Subject: Re: Openstack cinder HA On 20/06, Md. Farhad Hasan Khan wrote: > Hi, > > In my environment openstack volume service running on compute node. > When the openstack-cinder-volume service goes down on, the Block > Storage volumes which were created using the openstack-cinder-volume > service cannot be managed until the service comes up again. > > > > I need help how to configure openstack cinder volume HA. 
> > > > Thanks & B'Rgds, > > Rony > Hi Rony, You can configure Cinder API, Scheduler, and Backup in Active-Active and Cinder Volume in Active-Passive using Pacemaker. Please check the "Highly available Block Storage API" documentation [1] for a detailed guide. Cheers, Gorka. [1]: https://docs.openstack.org/ha-guide/storage-ha-block.html From geguileo at redhat.com Sat Jun 22 08:04:13 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Sat, 22 Jun 2019 10:04:13 +0200 Subject: Openstack cinder HA In-Reply-To: <029301d528b9$8860be40$99223ac0$@brilliant.com.bd> References: <003001d52758$007e2510$017a6f30$@brilliant.com.bd> <20190621092349.2jpmggvcsutyeepx@localhost> <029301d528b9$8860be40$99223ac0$@brilliant.com.bd> Message-ID: <20190622080413.kqmwmtpayunlsnyu@localhost> On 22/06, Md. Farhad Hasan Khan wrote: > Hi Gorka, > According to your link I configure. My backend storage is lvm. But created > volume not sync to all compute node. And when active cinder-volume node > change in pacemaker, unable to do volume operation like: extend. Here is the > error log. Could you please help me to solve this. > Hi, Unfortunately cinder-volume configured with LVM will not support any kind of HA deployment. The reason is that the actual volumes are local to a single node, so other nodes don't have access to the storage and cinder-volume cannot manage something it doesn't have access to. That is why LVM is not recommended for production environments, because when a node goes down you lose both the data and control planes. Regards, Gorka. > [root at controller1 ~]# openstack volume service list > +------------------+------------------------------+------+---------+-------+ > ----------------------------+ > | Binary | Host | Zone | Status | State | > Updated At | > +------------------+------------------------------+------+---------+-------+ > ----------------------------+ > | cinder-volume | cinder-cluster-hostgroup at lvm | nova | enabled | up | > 2019-06-22T05:07:15.000000 | > | cinder-scheduler | cinder-cluster-hostgroup | nova | enabled | up | > 2019-06-22T05:07:18.000000 | > +------------------+------------------------------+------+---------+-------+ > ----------------------------+ > > [root at compute1 ~]# pcs status > Cluster name: cindervolumecluster > Stack: corosync > Current DC: compute1 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with > quorum > Last updated: Sat Jun 22 11:08:35 2019 > Last change: Fri Jun 21 16:00:08 2019 by root via cibadmin on compute1 > > 3 nodes configured > 1 resource configured > > Online: [ compute1 compute2 compute3 ] > > Full list of resources: > > openstack-cinder-volume (systemd:openstack-cinder-volume): > Started compute2 > > Daemon Status: > corosync: active/enabled > pacemaker: active/enabled > pcsd: active/enabled > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~~~~~~~~ > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > [req-341fd5ce-dcdb-482e-9309-4c5bfe272137 b0ff2eb16d9b4af58e812d47e0bc753b > fc78335beea842038579b36c5a3eef7d - default default] Extend volume failed.: > ProcessExecutionError: Unexpected error while running command. > Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C > lvdisplay --noheading -C -o Attr > cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be > Exit code: 5 > Stdout: u'' > Stderr: u'File descriptor 20 (/dev/urandom) leaked on lvdisplay invocation. 
> Parent PID 18064: /usr/bin/python2\n Failed to find logical volume > "cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be"\n' > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Traceback (most > recent call last): > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 2622, in > extend_volume > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > self.driver.extend_volume(volume, new_size) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 576, > in extend_volume > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > self._sizestr(new_size)) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 818, > in extend_volume > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager has_snapshot = > self.lv_has_snapshot(lv_name) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 767, > in lv_has_snapshot > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > run_as_root=True) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in > _execute > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager result = > self.__execute(*args, **kwargs) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/utils.py", line 128, in execute > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager return > processutils.execute(*cmd, **kwargs) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line > 424, in execute > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > cmd=sanitized_cmd) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > ProcessExecutionError: Unexpected error while running command. > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Command: sudo > cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay --noheading > -C -o Attr cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Exit code: 5 > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Stdout: u'' > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Stderr: u'File > descriptor 20 (/dev/urandom) leaked on lvdisplay invocation. Parent PID > 18064: /usr/bin/python2\n Failed to find logical volume > "cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be"\n' > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~~~~~~~~ > Thanks & B'Rgds, > Rony > > > > > > -----Original Message----- > From: Gorka Eguileor [mailto:geguileo at redhat.com] > Sent: Friday, June 21, 2019 3:24 PM > To: Md. Farhad Hasan Khan > Cc: openstack-discuss at lists.openstack.org > Subject: Re: Openstack cinder HA > > On 20/06, Md. Farhad Hasan Khan wrote: > > Hi, > > > > In my environment openstack volume service running on compute node. > > When the openstack-cinder-volume service goes down on, the Block > > Storage volumes which were created using the openstack-cinder-volume > > service cannot be managed until the service comes up again. 
> > > > > > > > I need help how to configure openstack cinder volume HA. > > > > > > > > Thanks & B'Rgds, > > > > Rony > > > > Hi Rony, > > You can configure Cinder API, Scheduler, and Backup in Active-Active and > Cinder Volume in Active-Passive using Pacemaker. > > Please check the "Highly available Block Storage API" documentation [1] for > a detailed guide. > > Cheers, > Gorka. > > [1]: https://docs.openstack.org/ha-guide/storage-ha-block.html > From ionut at fleio.com Sat Jun 22 12:11:15 2019 From: ionut at fleio.com (Ionut Biru) Date: Sat, 22 Jun 2019 15:11:15 +0300 Subject: [designate] DKIM TXT record problem Message-ID: Hello guys, I'm running Rocky and as backend for designate I have powerdns. Whenever I try to add a TXT record for DKIM, the API returns that the specified record is not a TXT record. https://paste.xinu.at/OOz7/ It seems that is due to the length of the record It has a maxim a 255 limit. How should I proceed in this case? -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Sat Jun 22 13:17:27 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 22 Jun 2019 13:17:27 +0000 Subject: [designate] DKIM TXT record problem In-Reply-To: References: Message-ID: <20190622131727.g44q3qx7f6fga4gr@yuggoth.org> On 2019-06-22 15:11:15 +0300 (+0300), Ionut Biru wrote: > I'm running Rocky and as backend for designate I have powerdns. > > Whenever I try to add a TXT record for DKIM, the API returns that the > specified record is not a TXT record. > > https://paste.xinu.at/OOz7/ > > It seems that is due to the length of the record It has a maxim a 255 limit. > > How should I proceed in this case? A single TXT value string can not exceed 255 bytes in length. This is fundamental to the IETF's specification for the domain name system and has little to do with either Designate or PowerDNS. DKIM however takes into account that you may have keys whose representation exceeds the limits of a single string, and allows for splitting the key into additional parts: https://tools.ietf.org/html/rfc6376#section-3.6.2.2 I would try adding a space somewhere in the middle of the "p" field so that it is broken up into two shorter strings each no longer than 255 characters. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From yamamoto at midokura.com Sat Jun 22 13:42:48 2019 From: yamamoto at midokura.com (Takashi Yamamoto) Date: Sat, 22 Jun 2019 22:42:48 +0900 Subject: [neutron] bug deputy report for the week of June 17th Message-ID: hi, this week i have looked at the following bugs. a few of them still need triage. i will have a travel tomorrow and i'm not sure if i will have time to triage them. i don't think i can attend the team meeting next week. (sorry!) 
Still need triage:
https://bugs.launchpad.net/neutron/+bug/1833175
  pptp vpn doesn't work with openvswitch firewall
https://bugs.launchpad.net/neutron/+bug/1833156
  neutron fwaas v2 log function does not work
https://bugs.launchpad.net/neutron/+bug/1832636
  Error creating IPv6 subnet on routed network segment

High:
https://bugs.launchpad.net/neutron/+bug/1833257
  By default Horizon sets Firewall group admin state to False when user trying to set it true Firewall always remains in DOWN state

Medium:
https://bugs.launchpad.net/neutron/+bug/1833279
  TestNeutronServer: start function not called (or not logged in the temp file)
https://bugs.launchpad.net/neutron/+bug/1833653
  We should cleanup ipv4 address if keepalived is dead
https://bugs.launchpad.net/neutron/+bug/1833589
  neutron-dynamic-routing broken by introduction of agent_timestamp to _log_heartbeat()
https://bugs.launchpad.net/neutron/+bug/1833721
  ip_lib synchronized decorator should wrap the privileged one

Low:
https://bugs.launchpad.net/neutron/+bug/1833125
  Remaining neutron-lbaas relevant code and documentation

RFE:
https://bugs.launchpad.net/neutron/+bug/1832758
  [RFE] Allow/deny custom ethertypes in security groups
https://bugs.launchpad.net/neutron/+bug/1833674
  [RFE] Improve profiling of port binding and vif plugging

From emccormick at cirrusseven.com Sat Jun 22 15:30:04 2019
From: emccormick at cirrusseven.com (Erik McCormick)
Date: Sat, 22 Jun 2019 11:30:04 -0400
Subject: [heat] Resource replacement terminates at DELETE_COMPLETE
Message-ID:

Hi everyone! I have a situation with a heat stack where it has an Octavia Load Balancer resource which it thinks it's already replaced and so will not recreate it:

Resource api_lb with id 3978 already replaced by 3999; not checking check /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/check_resource.py:310 :

It goes to a DELETE_COMPLETE state and just sits there. The stack stays UPDATE_IN_PROGRESS and nothing else moves. It doesn't even time out after 4 hours. Doing a stack check puts everything as CHECK_COMPLETE, even the non-existent load balancers. I can mark the LB and its components unhealthy and start another update, but this just repeats the cycle.

This all started with some Octavia shenanigans which ended with all the load balancers being deleted manually. I have 2 similar stacks which recreated fine, but this one went through the cycle several other times as we were trying to fix the LB problem.

This is a super edge case, but hopefully someone has another idea how to get out of it.

Thanks!
Erik
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rony.khan at brilliant.com.bd Sun Jun 23 04:27:31 2019
From: rony.khan at brilliant.com.bd (Md. Farhad Hasan Khan)
Date: Sun, 23 Jun 2019 10:27:31 +0600
Subject: Openstack cinder HA
In-Reply-To: <20190622080413.kqmwmtpayunlsnyu@localhost>
References: <003001d52758$007e2510$017a6f30$@brilliant.com.bd> <20190621092349.2jpmggvcsutyeepx@localhost> <029301d528b9$8860be40$99223ac0$@brilliant.com.bd> <20190622080413.kqmwmtpayunlsnyu@localhost>
Message-ID: <0e8f01d5297b$f9768cf0$ec63a6d0$@brilliant.com.bd>

Hi Gorka,
I checked with NFS instead of LVM. Now it's working. Thanks a lot for your quick suggestion.

Thanks & B'Rgds,
Rony

-----Original Message-----
From: Gorka Eguileor [mailto:geguileo at redhat.com]
Sent: Saturday, June 22, 2019 2:04 PM
To: Md. Farhad Hasan Khan
Cc: openstack-discuss at lists.openstack.org
Subject: Re: Openstack cinder HA

On 22/06, Md.
Farhad Hasan Khan wrote: > Hi Gorka, > According to your link I configure. My backend storage is lvm. But > created volume not sync to all compute node. And when active > cinder-volume node change in pacemaker, unable to do volume operation > like: extend. Here is the error log. Could you please help me to solve this. > Hi, Unfortunately cinder-volume configured with LVM will not support any kind of HA deployment. The reason is that the actual volumes are local to a single node, so other nodes don't have access to the storage and cinder-volume cannot manage something it doesn't have access to. That is why LVM is not recommended for production environments, because when a node goes down you lose both the data and control planes. Regards, Gorka. > [root at controller1 ~]# openstack volume service list > +------------------+------------------------------+------+---------+-------+ > ----------------------------+ > | Binary | Host | Zone | Status | State | > Updated At | > +------------------+------------------------------+------+---------+-------+ > ----------------------------+ > | cinder-volume | cinder-cluster-hostgroup at lvm | nova | enabled | up | > 2019-06-22T05:07:15.000000 | > | cinder-scheduler | cinder-cluster-hostgroup | nova | enabled | up | > 2019-06-22T05:07:18.000000 | > +------------------+------------------------------+------+---------+-------+ > ----------------------------+ > > [root at compute1 ~]# pcs status > Cluster name: cindervolumecluster > Stack: corosync > Current DC: compute1 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition > with quorum Last updated: Sat Jun 22 11:08:35 2019 Last change: Fri > Jun 21 16:00:08 2019 by root via cibadmin on compute1 > > 3 nodes configured > 1 resource configured > > Online: [ compute1 compute2 compute3 ] > > Full list of resources: > > openstack-cinder-volume (systemd:openstack-cinder-volume): > Started compute2 > > Daemon Status: > corosync: active/enabled > pacemaker: active/enabled > pcsd: active/enabled > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~~~~~~ > ~~~~~~~~ > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > [req-341fd5ce-dcdb-482e-9309-4c5bfe272137 > b0ff2eb16d9b4af58e812d47e0bc753b fc78335beea842038579b36c5a3eef7d - default default] Extend volume failed.: > ProcessExecutionError: Unexpected error while running command. > Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C > lvdisplay --noheading -C -o Attr > cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be > Exit code: 5 > Stdout: u'' > Stderr: u'File descriptor 20 (/dev/urandom) leaked on lvdisplay invocation. 
> Parent PID 18064: /usr/bin/python2\n Failed to find logical volume > "cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be"\n' > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Traceback > (most recent call last): > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line > 2622, in extend_volume > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > self.driver.extend_volume(volume, new_size) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line > 576, in extend_volume > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > self._sizestr(new_size)) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line > 818, in extend_volume > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager has_snapshot = > self.lv_has_snapshot(lv_name) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line > 767, in lv_has_snapshot > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > run_as_root=True) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in > _execute > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager result = > self.__execute(*args, **kwargs) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/cinder/utils.py", line 128, in execute > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager return > processutils.execute(*cmd, **kwargs) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager File > "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", > line 424, in execute > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > cmd=sanitized_cmd) > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > ProcessExecutionError: Unexpected error while running command. > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Command: > sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay > --noheading -C -o Attr > cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Exit code: 5 > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Stdout: u'' > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager Stderr: > u'File descriptor 20 (/dev/urandom) leaked on lvdisplay invocation. > Parent PID > 18064: /usr/bin/python2\n Failed to find logical volume > "cinder-lvm-volumes/volume-5369a96c-0369-4bb2-9ea1-759359a418be"\n' > 2019-06-21 15:47:56.791 17754 ERROR cinder.volume.manager > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~~~~~~ > ~~~~~~~~ > Thanks & B'Rgds, > Rony > > > > > > -----Original Message----- > From: Gorka Eguileor [mailto:geguileo at redhat.com] > Sent: Friday, June 21, 2019 3:24 PM > To: Md. Farhad Hasan Khan > Cc: openstack-discuss at lists.openstack.org > Subject: Re: Openstack cinder HA > > On 20/06, Md. Farhad Hasan Khan wrote: > > Hi, > > > > In my environment openstack volume service running on compute node. > > When the openstack-cinder-volume service goes down on, the Block > > Storage volumes which were created using the openstack-cinder-volume > > service cannot be managed until the service comes up again. 
> > > > > > > > I need help how to configure openstack cinder volume HA. > > > > > > > > Thanks & B'Rgds, > > > > Rony > > > > Hi Rony, > > You can configure Cinder API, Scheduler, and Backup in Active-Active > and Cinder Volume in Active-Passive using Pacemaker. > > Please check the "Highly available Block Storage API" documentation > [1] for a detailed guide. > > Cheers, > Gorka. > > [1]: https://docs.openstack.org/ha-guide/storage-ha-block.html > From emccormick at cirrusseven.com Sun Jun 23 04:51:55 2019 From: emccormick at cirrusseven.com (Erik McCormick) Date: Sun, 23 Jun 2019 00:51:55 -0400 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: Some goodies inline below. On Thu, Jun 20, 2019 at 12:34 PM Mark Goddard wrote: > On Thu, 20 Jun 2019 at 17:27, Michael Johnson wrote: > > > > Hi Mark, > > > > I wanted to highlight that I wrote a detailed certificate > > configuration guide for Octavia here: > > https://docs.openstack.org/octavia/latest/admin/guides/certificates.html > > > > This could be used to automate the certificate generation for Kolla > deployments. > > Great, that looks useful. > Michael was super awesome creating this after Tobias and I (and a few other folks) ran into road blocks with this. Many many thanks for that. > > > Let me know if you have any questions about the guide or steps, > > Michael > > > > On Thu, Jun 20, 2019 at 7:31 AM Mark Goddard wrote: > > > > > > On Thu, 20 Jun 2019 at 15:08, Alex Schultz > wrote: > > > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard > wrote: > > > >> > > > >> Hi, > > > >> > > > >> In the recent kolla meeting [1] we discussed the usability of > octavia > > > >> in kolla ansible. We had feedback at the Denver summit [2] that this > > > >> service is difficult to deploy and requires a number of manual > steps. > > > >> Certificates are one of the main headaches. It was stated that OSA > [3] > > > >> may have some useful code we could look into. > > > > > > > > > > > > I second that Octavia is very painful to install. There is a > requirement of a bunch of openstack cloud configurations > (flavors/images/etc) that must be handled prior to actually configuring the > service which means it's complex to deploy. IMHO it would have been > beneficial for some of these items to actually have been rolled into the > service itself (ie dynamically querying the services for flavor information > rather than expecting an ID put into a configuration file). That being > said, we have managed to get it integrated into tripleo but it's rather > complex. It does use ansible if you want to borrow some of the concepts for > os_octavia. > > > > > > > > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia > > > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml > > > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles > > > > > > > > Additionally we're still leveraging some of the puppet-octavia code > to manage configs/nova flavors. > > > > > > Thanks for sharing Alex & Carlos - useful source material. > > > > Most of that bitching in Denver was me. Thanks to John for recording it for posterity. With Octavia being the defacto LBaaS standard in Openstack we need to do a much better job deploying it. This proved to be a bit of a headache with kolla-ansible, but I don't think it's terribly far off. 
The main issue, as mentioned in all the reference materials you provided, is certificate generation. Kolla-ansible presently only supports a single CA and does not help operators create that in any way. Also, you really need to have two CAs. Beyond that, Octavia is extremely fussy about its certificates and getting it right can be a royal pain.

During my first attempt at doing this deployment, I went in search of some project that had the certificate generation as a component and found that OSA had the functionality and documented it well. I extracted the needed bits, ran it, and came out the other side with a working set of certificates. Then, like most operators, I saw a squirrel and forgot to do anything more with it.

So with the introduction out of the way, here's what I used. It was entirely derived from OSA with much love.

https://github.com/emccormickva/octavia-cert-generate

I also had to make a few hacks to kolla-ansible to get everything going.

1) In the octavia config template (octavia.conf.j2) I updated the config options to use the new certificates as follows:

[certificates]
ca_private_key = /etc/octavia/certs/private/cakey.pem
ca_certificate = /etc/octavia/certs/ca_server_01.pem

[haproxy_amphora]
server_ca = /etc/octavia/certs/ca_server_01.pem
client_cert = /etc/octavia/certs/client.pem

[controller_worker]
client_ca = /etc/octavia/certs/ca_01.pem

2) Update kolla's config-yml to copy over all the certs for each container

  with_items:
    - cakey.pem
    - ca_01.pem
    - ca_server_01.pem
    - client.pem

I think I had to make a few other hacks in Queens, but all of those seem to have been addressed already (just doing a diff of master vs. my current configs). If we can incorporate certificate generation, get 2 CAs, and copy / configure them properly, I think everything will be great. Maybe others have additional asks, but this would do it for me. If I can scrap together some time, I'll see if I can get some commits together to make it happen, but that's always a dodgy proposition. I'm also always willing to review whatever or answer questions from someone else who wants to take it on.

Cheers,
Erik

> > > > > > Thanks,
> > > > -Alex
> > > >
> > > >>
> > > >>
> > > >> As a starting point to improving this support, I'd like to gather
> > > >> information from people who are using octavia in kolla ansible, and
> > > >> what they have had to do to make it work. Please respond to this
> > > >> email.
> > > >>
> > > >> I've also tagged openstack-ansible and Tripleo - if there is any
> > > >> useful information those teams have to share about this topic, it is
> > > >> most welcome. Alternatively if your support for octavia also falls
> > > >> short perhaps we could collaborate on improvements.
> > > >>
> > > >> Thanks,
> > > >> Mark
> > > >>
> > > >> [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86
> > > >> [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback
> > > >> [3] https://opendev.org/openstack/openstack-ansible-os_octavia
> > > >>
> > > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From manuel.sb at garvan.org.au Mon Jun 24 01:31:38 2019
From: manuel.sb at garvan.org.au (Manuel Sopena Ballesteros)
Date: Mon, 24 Jun 2019 01:31:38 +0000
Subject: How to reduce image size?
Message-ID: <9D8A2486E35F0941A60430473E29F15B017EABB4E3@mxdb2.ad.garvan.unsw.edu.au> Dear Openstack community, I would like to reduce the size of an image, I wrote this command: # openstack image set --property size=700000 centos7.6-kudu-image Unable to set 'size' to '700000'. Reason: '700000' is not of type u'null', u'integer' Failed validating u'type' in schema[u'properties'][u'size']: {u'description': u'Size of image file in bytes', u'readOnly': True, u'type': [u'null', u'integer']} On instance[u'size']: '700000' I guess the real issue is u'readOnly': True How can I reduce the image size? Thank you very much Manuel NOTICE Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From li.canwei2 at zte.com.cn Mon Jun 24 02:11:51 2019 From: li.canwei2 at zte.com.cn (li.canwei2 at zte.com.cn) Date: Mon, 24 Jun 2019 10:11:51 +0800 (CST) Subject: =?UTF-8?B?W1dhdGNoZXJdIGFib3V0IGluYWN0aXZlIGNvcmUgbWVtYmVycw==?= Message-ID: <201906241011514278364@zte.com.cn> Hi Watcher team, Last week in the Watcher IRC meeting we discussed to remove inactive core members. It's welcome if you can take some time to review, and I wish you can reply before the end of this week if you want to get back. Best Regards, Canwei Li -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jun 24 07:29:29 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 24 Jun 2019 09:29:29 +0200 Subject: [neutron] CI issue Message-ID: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> Hi neutrinos, Since about 2 days we have some problem with iptables_hybrid jobs in Neutron CI. See [1] for details. I’m now trying to find out what can be the problem there but if I will not find it anything soon I will propose temporary patch to skip those 3 failing tests for now. Please don’t recheck Your patches if it failed on those 2 jobs - this will fail again :/ [1] https://bugs.launchpad.net/neutron/+bug/1833902 — Slawek Kaplonski Senior software engineer Red Hat From ionut at fleio.com Mon Jun 24 07:33:26 2019 From: ionut at fleio.com (Ionut Biru) Date: Mon, 24 Jun 2019 10:33:26 +0300 Subject: [designate] DKIM TXT record problem In-Reply-To: <20190622131727.g44q3qx7f6fga4gr@yuggoth.org> References: <20190622131727.g44q3qx7f6fga4gr@yuggoth.org> Message-ID: Hello, Thanks. It worked. On Sat, Jun 22, 2019 at 4:21 PM Jeremy Stanley wrote: > On 2019-06-22 15:11:15 +0300 (+0300), Ionut Biru wrote: > > I'm running Rocky and as backend for designate I have powerdns. > > > > Whenever I try to add a TXT record for DKIM, the API returns that the > > specified record is not a TXT record. > > > > https://paste.xinu.at/OOz7/ > > > > It seems that is due to the length of the record It has a maxim a 255 > limit. > > > > How should I proceed in this case? > > A single TXT value string can not exceed 255 bytes in length. 
This is > fundamental to the IETF's specification for the domain name system > and has little to do with either Designate or PowerDNS. DKIM however > takes into account that you may have keys whose representation > exceeds the limits of a single string, and allows for splitting the > key into additional parts: > > https://tools.ietf.org/html/rfc6376#section-3.6.2.2 > > I would try adding a space somewhere in the middle of the "p" field > so that it is broken up into two shorter strings each no longer than > 255 characters. > -- > Jeremy Stanley > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.se Mon Jun 24 07:49:55 2019 From: tobias.urdin at binero.se (Tobias Urdin) Date: Mon, 24 Jun 2019 09:49:55 +0200 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: Just a heads up to anybody; we had issues with certificates being broken using QEMU virtualization without full KVM. We changed to using full virt supported hardware in our test environment and it worked. Best regards On 06/23/2019 06:55 AM, Erik McCormick wrote: > Some goodies inline below. > > On Thu, Jun 20, 2019 at 12:34 PM Mark Goddard > wrote: > > On Thu, 20 Jun 2019 at 17:27, Michael Johnson > wrote: > > > > Hi Mark, > > > > I wanted to highlight that I wrote a detailed certificate > > configuration guide for Octavia here: > > > https://docs.openstack.org/octavia/latest/admin/guides/certificates.html > > > > This could be used to automate the certificate generation for > Kolla deployments. > > Great, that looks useful. > > Michael was super awesome creating this after Tobias and I (and a few > other folks) ran into road blocks with this. Many many thanks for that. > > > > > Let me know if you have any questions about the guide or steps, > > Michael > > > > On Thu, Jun 20, 2019 at 7:31 AM Mark Goddard > wrote: > > > > > > On Thu, 20 Jun 2019 at 15:08, Alex Schultz > > wrote: > > > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard > > wrote: > > > >> > > > >> Hi, > > > >> > > > >> In the recent kolla meeting [1] we discussed the usability > of octavia > > > >> in kolla ansible. We had feedback at the Denver summit [2] > that this > > > >> service is difficult to deploy and requires a number of > manual steps. > > > >> Certificates are one of the main headaches. It was stated > that OSA [3] > > > >> may have some useful code we could look into. > > > > > > > > > > > > I second that Octavia is very painful to install. There is a > requirement of a bunch of openstack cloud configurations > (flavors/images/etc) that must be handled prior to actually > configuring the service which means it's complex to deploy. IMHO > it would have been beneficial for some of these items to actually > have been rolled into the service itself (ie dynamically querying > the services for flavor information rather than expecting an ID > put into a configuration file).  That being said, we have managed > to get it integrated into tripleo but it's rather complex. It does > use ansible if you want to borrow some of the concepts for os_octavia. 
> > > > > > > > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia > > > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml > > > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles > > > > > > > > Additionally we're still leveraging some of the > puppet-octavia code to manage configs/nova flavors. > > > > > > Thanks for sharing Alex & Carlos - useful source material. > > > > > > Most of that bitching in Denver was me. Thanks to John for recording > it for posterity. With Octavia being the defacto LBaaS standard in > Openstack we need to do a much better job deploying it. This proved to > be a bit of a headache with kolla-ansible, but I don't think it's > terribly far off. > > The main issue, as mentioned in all the reference materials you > provided, is certificate generation. Kolla-ansible presently only > supports a single CA and does not help operators create that in any > way. Also, you really need to have two CAs. Beyond that, Octavia is > extremely fussy about its certificates and getting it right can be a > royal pain. > > During my first attempt at doing this deployment, I went in search of > some project that had the certificate generation as a component and > found that OSA had the functionality and documented it well. I > extracted the needed bits, ran it, and came out the other side with a > working set of certificates. Then, like most operators, I saw a > squirrel and forgot to do anything more with it. > > So with the introduction out of the way, here's what I used. It was > entirely derived from OSA with much love. > > https://github.com/emccormickva/octavia-cert-generate > > I also had to make a few hacks to kolla-ansible to get everything going. > > 1) In the octavia config template (octavia.conf.j2) I updated the > config options to use the new certificates as follows: > > [certificates] > ca_private_key = /etc/octavia/certs/private/cakey.pem > ca_certificate = /etc/octavia/certs/ca_server_01.pem > > [haproxy_amphora] > server_ca = /etc/octavia/certs/ca_server_01.pem > client_cert = /etc/octavia/certs/client.pem > > [controller_worker] > client_ca = /etc/octavia/certs/ca_01.pem > > 2) Update kolla's config-yml to copy over all the certs for each container > >   with_items: >     - cakey.pem >     - ca_01.pem >     - ca_server_01.pem >     - client.pem > > I think I had to make a few other hacks in Queens, but all of those > seem to have been addressed already (just doing a diff of master vs. > my current configs). If we can incorporate certificate generation, get > 2 CAs, and copy / configure them properly, I think everything will be > great. Maybe others have additional asks, but this would do it for me. > If I can scrap together some time, I'll see if I can get some commits > together to make it happen, but that's always a dodgy proposition. I'm > also always willing to review whatever or answer questions from > someone else who wants to take it on. > > Cheers, > Erik > > > > > > > > > > Thanks, > > > > -Alex > > > > > > > >> > > > >> > > > >> As a starting point to improving this support, I'd like to > gather > > > >> information from people who are using octavia in kolla > ansible, and > > > >> what they have had to do to make it work. Please respond to > this > > > >> email. 
> > > >> > > > >> I've also tagged openstack-ansible and Tripleo - if there > is any > > > >> useful information those teams have to share about this > topic, it is > > > >> most welcome. Alternatively if your support for octavia > also falls > > > >> short perhaps we could collaborate on improvements. > > > >> > > > >> Thanks, > > > >> Mark > > > >> > > > >> [1] > http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 > > > >> [2] > https://etherpad.openstack.org/p/DEN-train-kolla-feedback > > > > >> [3] > https://opendev.org/openstack/openstack-ansible-os_octavia > > > > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jun 24 08:01:32 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 24 Jun 2019 10:01:32 +0200 Subject: [neutron] CI issue In-Reply-To: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> References: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> Message-ID: <8690DD8D-556F-489D-B71F-828286CDE156@redhat.com> Hi, It looks for me that it is caused by change in Nova [1] I proposed revert of this patch [2] and DNM patch in neutron to check if that will really help: [3] [1] https://review.opendev.org/#/c/644881/ [2] https://review.opendev.org/#/c/667035/ [3] https://review.opendev.org/#/c/667036/ > On 24 Jun 2019, at 09:29, Slawomir Kaplonski wrote: > > Hi neutrinos, > > Since about 2 days we have some problem with iptables_hybrid jobs in Neutron CI. See [1] for details. > I’m now trying to find out what can be the problem there but if I will not find it anything soon I will propose temporary patch to skip those 3 failing tests for now. > Please don’t recheck Your patches if it failed on those 2 jobs - this will fail again :/ > > [1] https://bugs.launchpad.net/neutron/+bug/1833902 > — > Slawek Kaplonski > Senior software engineer > Red Hat > — Slawek Kaplonski Senior software engineer Red Hat From flux.adam at gmail.com Mon Jun 24 08:48:17 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Mon, 24 Jun 2019 17:48:17 +0900 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: That's really interesting Tobias -- in not sure what that would have to do with the certificates, except maybe if it is possible you were experiencing severe clock drift with QEMU? That could cause the requests to be rejected, but it has to be pretty extreme (in the scheme of things). That said, we are definitely aware that certificates (and especially somewhat ambiguously or confusingly worded configuration options for them) are difficult right now in Octavia. We do have the guide Michael mentioned, but I personally agree that it shouldn't be nearly as difficult as it is presently. I do have some thoughts about fixing it, but I'm afraid that changing up most of the configuration options around this would be similarly frustrating to the group of people who actually do have it figured out. Possibly it'd be ok with a long deprecation period. We can definitely also help with getting kolla-ansible doing the correct thing from the outset, it's just a matter of available resources. As much as I'd love to drive that effort, I'm already actively overcommitted right now, so unfortunately the only help I can give in the VERY near future would be guidance in our IRC channel to anyone else willing to spend some time on this. 
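In the meantime, for anyone who wants to experiment, a very rough sketch of the two-CA layout being discussed (one CA that signs the certificates the amphorae present, and a separate CA that signs the client certificate the control plane uses) might look something like the following. This is only an illustration with made-up file names chosen to line up with Erik's example config, not the official procedure -- the certificate guide linked earlier in the thread is the authoritative reference:

    # Server CA: signs the certificates presented by the amphorae
    openssl genrsa -out ca_server_01.key 4096
    openssl req -x509 -new -nodes -key ca_server_01.key -days 3650 \
        -subj "/CN=octavia-server-ca" -out ca_server_01.pem

    # Client CA: signs the certificate the controllers use to talk to the amphorae
    openssl genrsa -out ca_01.key 4096
    openssl req -x509 -new -nodes -key ca_01.key -days 3650 \
        -subj "/CN=octavia-client-ca" -out ca_01.pem

    # Controller client certificate, signed by the client CA (the guide
    # concatenates cert and key into the single file referenced by client_cert)
    openssl genrsa -out client.key 2048
    openssl req -new -key client.key -subj "/CN=octavia-controller" -out client.csr
    openssl x509 -req -in client.csr -CA ca_01.pem -CAkey ca_01.key \
        -CAcreateserial -days 365 -out client-cert.pem
    cat client-cert.pem client.key > client.pem

The guide also covers things this sketch skips, such as passphrase-protected CA keys and the openssl ca database/config, so treat the above as a starting point rather than a drop-in replacement for it.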
--Adam On Mon, Jun 24, 2019, 16:54 Tobias Urdin wrote: > Just a heads up to anybody; we had issues with certificates being broken > using QEMU virtualization without full KVM. > We changed to using full virt supported hardware in our test environment > and it worked. > > Best regards > > > On 06/23/2019 06:55 AM, Erik McCormick wrote: > > Some goodies inline below. > > On Thu, Jun 20, 2019 at 12:34 PM Mark Goddard wrote: > >> On Thu, 20 Jun 2019 at 17:27, Michael Johnson >> wrote: >> > >> > Hi Mark, >> > >> > I wanted to highlight that I wrote a detailed certificate >> > configuration guide for Octavia here: >> > >> https://docs.openstack.org/octavia/latest/admin/guides/certificates.html >> > >> > This could be used to automate the certificate generation for Kolla >> deployments. >> >> Great, that looks useful. >> > > Michael was super awesome creating this after Tobias and I (and a few > other folks) ran into road blocks with this. Many many thanks for that. > > > >> > Let me know if you have any questions about the guide or steps, >> > Michael >> > >> > On Thu, Jun 20, 2019 at 7:31 AM Mark Goddard wrote: >> > > >> > > On Thu, 20 Jun 2019 at 15:08, Alex Schultz >> wrote: >> > > > >> > > > >> > > > >> > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard >> wrote: >> > > >> >> > > >> Hi, >> > > >> >> > > >> In the recent kolla meeting [1] we discussed the usability of >> octavia >> > > >> in kolla ansible. We had feedback at the Denver summit [2] that >> this >> > > >> service is difficult to deploy and requires a number of manual >> steps. >> > > >> Certificates are one of the main headaches. It was stated that OSA >> [3] >> > > >> may have some useful code we could look into. >> > > > >> > > > >> > > > I second that Octavia is very painful to install. There is a >> requirement of a bunch of openstack cloud configurations >> (flavors/images/etc) that must be handled prior to actually configuring the >> service which means it's complex to deploy. IMHO it would have been >> beneficial for some of these items to actually have been rolled into the >> service itself (ie dynamically querying the services for flavor information >> rather than expecting an ID put into a configuration file). That being >> said, we have managed to get it integrated into tripleo but it's rather >> complex. It does use ansible if you want to borrow some of the concepts for >> os_octavia. >> > > > >> > > > >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia >> > > > >> https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml >> > > > >> https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles >> > > > >> > > > Additionally we're still leveraging some of the puppet-octavia code >> to manage configs/nova flavors. >> > > >> > > Thanks for sharing Alex & Carlos - useful source material. >> > > >> > > Most of that bitching in Denver was me. Thanks to John for recording it > for posterity. With Octavia being the defacto LBaaS standard in Openstack > we need to do a much better job deploying it. This proved to be a bit of a > headache with kolla-ansible, but I don't think it's terribly far off. > > The main issue, as mentioned in all the reference materials you provided, > is certificate generation. Kolla-ansible presently only supports a single > CA and does not help operators create that in any way. Also, you really > need to have two CAs. 
Beyond that, Octavia is extremely fussy about its > certificates and getting it right can be a royal pain. > > During my first attempt at doing this deployment, I went in search of some > project that had the certificate generation as a component and found that > OSA had the functionality and documented it well. I extracted the needed > bits, ran it, and came out the other side with a working set of > certificates. Then, like most operators, I saw a squirrel and forgot to do > anything more with it. > > So with the introduction out of the way, here's what I used. It was > entirely derived from OSA with much love. > > https://github.com/emccormickva/octavia-cert-generate > > I also had to make a few hacks to kolla-ansible to get everything going. > > 1) In the octavia config template (octavia.conf.j2) I updated the config > options to use the new certificates as follows: > > [certificates] > ca_private_key = /etc/octavia/certs/private/cakey.pem > ca_certificate = /etc/octavia/certs/ca_server_01.pem > > [haproxy_amphora] > server_ca = /etc/octavia/certs/ca_server_01.pem > client_cert = /etc/octavia/certs/client.pem > > [controller_worker] > client_ca = /etc/octavia/certs/ca_01.pem > > 2) Update kolla's config-yml to copy over all the certs for each container > > with_items: > - cakey.pem > - ca_01.pem > - ca_server_01.pem > - client.pem > > I think I had to make a few other hacks in Queens, but all of those seem > to have been addressed already (just doing a diff of master vs. my current > configs). If we can incorporate certificate generation, get 2 CAs, and copy > / configure them properly, I think everything will be great. Maybe others > have additional asks, but this would do it for me. If I can scrap together > some time, I'll see if I can get some commits together to make it happen, > but that's always a dodgy proposition. I'm also always willing to review > whatever or answer questions from someone else who wants to take it on. > > Cheers, > Erik > > > > > > >> > > > Thanks, >> > > > -Alex >> > > > >> > > >> >> > > >> >> > > >> As a starting point to improving this support, I'd like to gather >> > > >> information from people who are using octavia in kolla ansible, and >> > > >> what they have had to do to make it work. Please respond to this >> > > >> email. >> > > >> >> > > >> I've also tagged openstack-ansible and Tripleo - if there is any >> > > >> useful information those teams have to share about this topic, it >> is >> > > >> most welcome. Alternatively if your support for octavia also falls >> > > >> short perhaps we could collaborate on improvements. >> > > >> >> > > >> Thanks, >> > > >> Mark >> > > >> >> > > >> [1] >> http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 >> > > >> [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback >> > > >> [3] https://opendev.org/openstack/openstack-ansible-os_octavia >> > > >> >> > > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.settle at outlook.com Mon Jun 24 10:29:51 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Mon, 24 Jun 2019 10:29:51 +0000 Subject: [all] [ptls] [tc] [nova] [neutron] Volunteers that know TeX for PDF community goal Message-ID: Hi all, The work for the Train community goal - PDF support for project docs - is well underway. [1] Now, we're looking for volunteers to help test the implementation. 
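If you want to try a build locally before signing up, one rough way to do it (assuming the project keeps its Sphinx sources in doc/source and you have a LaTeX toolchain with latexmk installed; the exact requirements file name varies per project) is something like:

    pip install sphinx -r doc/requirements.txt
    sphinx-build -b latex doc/source doc/build/latex
    make -C doc/build/latex

On newer Sphinx, sphinx-build -M latexpdf doc/source doc/build should do the same thing in one step.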
We'll need someone to help build the docs into PDFs and determine which things we can fix through tweaks to our docs and which are bugs in Sphinx. AKA: We need a troubleshooting artist.

If you can volunteer, please add yourself to the wiki table here [2]. I've added neutron and nova specifically here as we need someone who is familiar with the project and its dependencies to help us get that set up.

Any questions? Reach out.

Cheers,

Alex

[1] https://review.opendev.org/#/q/topic:build-pdf-docs+(status:open+OR+status:merged)
[2] https://wiki.openstack.org/wiki/Documentation#PDF_for_Project_Docs_-_Community_Goal

From smooney at redhat.com Mon Jun 24 11:22:38 2019
From: smooney at redhat.com (Sean Mooney)
Date: Mon, 24 Jun 2019 12:22:38 +0100
Subject: [neutron] [nova] CI issue
In-Reply-To: <8690DD8D-556F-489D-B71F-828286CDE156@redhat.com>
References: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> <8690DD8D-556F-489D-B71F-828286CDE156@redhat.com>
Message-ID:

On Mon, 2019-06-24 at 10:01 +0200, Slawomir Kaplonski wrote:
> Hi,
>
> It looks for me that it is caused by change in Nova [1]
>
> I proposed revert of this patch [2] and DNM patch in neutron to check if that will really help: [3]
>
> [1] https://review.opendev.org/#/c/644881/
> [2] https://review.opendev.org/#/c/667035/
> [3] https://review.opendev.org/#/c/667036/

I am pretty sure I know the reason this is failing. Patch [1] does not have special-case handling for the same-host resize, where we do not receive events from neutron at all. That patch was previously tested with only a multinode setup, where without that patch we race and normally fail to start waiting for the vif-plugged events from neutron before we receive them, which would cause the revert to fail. We could do a revert, but I would prefer to instead add a follow-up patch to address the same-host case.

> > On 24 Jun 2019, at 09:29, Slawomir Kaplonski wrote:
> >
> > Hi neutrinos,
> >
> > Since about 2 days we have some problem with iptables_hybrid jobs in Neutron CI. See [1] for details.
> > I’m now trying to find out what can be the problem there but if I will not find it anything soon I will propose
> > temporary patch to skip those 3 failing tests for now.
> > Please don’t recheck Your patches if it failed on those 2 jobs - this will fail again :/
> >
> > [1] https://bugs.launchpad.net/neutron/+bug/1833902
> > —
> > Slawek Kaplonski
> > Senior software engineer
> > Red Hat
> >
>
> —
> Slawek Kaplonski
> Senior software engineer
> Red Hat
>
>

From madhuri.kumari at intel.com Mon Jun 24 12:04:07 2019
From: madhuri.kumari at intel.com (Kumari, Madhuri)
Date: Mon, 24 Jun 2019 12:04:07 +0000
Subject: [ironic] To not have meetings?
In-Reply-To:
References:
Message-ID: <0512CBBECA36994BAA14C7FEDE986CA614A4D2B0@BGSMSX101.gar.corp.intel.com>

+1 to the idea. It will help achieve flexibility.

Regards,
Madhuri

From: Jacob Anders [mailto:jacob.anders.au at gmail.com]
Sent: Thursday, June 20, 2019 3:47 PM
To: Julia Kreger
Cc: openstack-discuss
Subject: Re: [ironic] To not have meetings?

I think this is a great idea (or should I say - set of ideas) which goes beyond making the weekly meeting work for us APAC peeps. I think with this approach we will likely achieve better responsiveness and flexibility overall. I look forward to trying this out. Thank you Julia.

On Tue, Jun 11, 2019 at 12:06 AM Julia Kreger > wrote:
Last week the discussion came up of splitting the ironic meeting to alternate time zones as we have increasing numbers of contributors in the Asia/Pacific areas of the world[0].
With that discussion, an additional interesting question came up posing the question of shifting to the mailing list instead of our present IRC meeting[1]? It is definitely an interesting idea, one that I'm personally keen on because of time zones and daylight savings time. I think before we do this, we should collect thoughts and also try to determine how we would pull this off so we don't forget the weekly checkpoint that the meeting serves. I think we need to do something, so I guess now is a good time to provide input into what everyone thinks would be best for the project and facilitating the weekly check-in. What I think might work: By EOD UTC Monday: * Listed primary effort participants will be expected to update the whiteboard[2] weekly before EOD Monday UTC * Contributors propose patches to the whiteboard that they believe would be important for reviewers to examine this coming week. * PTL or designee sends weekly email to the mailing list to start an update thread shortly after EOD Monday UTC or early Tuesday UTC. ** Additional updates, questions, and topical discussion (new features, RFEs) would ideally be wrapped up by EOD UTC Tuesday. With that, I think we would also need to go ahead and begin having "office hours" as during the week we generally know some ironic contributors will be in IRC and able to respond to questions. I think this would initially consist of our meeting time and perhaps the other time that seems to be most friendly to the contributors int he Asia/Pacific area[3]. Thoughts/ideas/suggestions welcome! -Julia [0]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:31:33 [1]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:43:16 [2]: https://etherpad.openstack.org/p/IronicWhiteBoard [3]: https://doodle.com/poll/bv9a4qyqy44wiq92 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jun 24 12:32:47 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 24 Jun 2019 14:32:47 +0200 Subject: [neutron] [nova] CI issue In-Reply-To: References: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> <8690DD8D-556F-489D-B71F-828286CDE156@redhat.com> Message-ID: <0A8DD96C-5F06-4702-B10D-31BC2830204D@redhat.com> Hi, > On 24 Jun 2019, at 13:22, Sean Mooney wrote: > > On Mon, 2019-06-24 at 10:01 +0200, Slawomir Kaplonski wrote: >> Hi, >> >> It looks for me that it is caused by change in Nova [1] >> >> I proposed revert of this patch [2] and DNM patch in neutron to check if that will really help: [3] >> >> [1] https://review.opendev.org/#/c/644881/ >> [2] https://review.opendev.org/#/c/667035/ >> [3] https://review.opendev.org/#/c/667036/ > i am pretty sure i know the reason that this is failing. > path 1 does not have special case handeling for same host resize where > we do not receive events form neutron at all. that patch was previously tested with only > a multinode setup where without that patch we race and normally fail to start waiting for vif-plugged > events form neutorn before we recive them cause the revert to fail. > we could do a revert but i would prefer to instead add a follow up patch to adress the same host case. If You can propose proper fix soon, this would be the best solution :) If it will take some time for You we should either merge revert for now or send patch to skip those 3 tests in neutron jobs to unblock our gates. 
Please tell me which solution I should go with then. >> >>> On 24 Jun 2019, at 09:29, Slawomir Kaplonski wrote: >>> >>> Hi neutrinos, >>> >>> Since about 2 days we have some problem with iptables_hybrid jobs in Neutron CI. See [1] for details. >>> I’m now trying to find out what can be the problem there but if I will not find it anything soon I will propose >>> temporary patch to skip those 3 failing tests for now. >>> Please don’t recheck Your patches if it failed on those 2 jobs - this will fail again :/ >>> >>> [1] https://bugs.launchpad.net/neutron/+bug/1833902 >>> — >>> Slawek Kaplonski >>> Senior software engineer >>> Red Hat >>> >> >> — >> Slawek Kaplonski >> Senior software engineer >> Red Hat >> >> > — Slawek Kaplonski Senior software engineer Red Hat From smooney at redhat.com Mon Jun 24 14:37:18 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 24 Jun 2019 15:37:18 +0100 Subject: [neutron] [nova] CI issue In-Reply-To: <0A8DD96C-5F06-4702-B10D-31BC2830204D@redhat.com> References: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> <8690DD8D-556F-489D-B71F-828286CDE156@redhat.com> <0A8DD96C-5F06-4702-B10D-31BC2830204D@redhat.com> Message-ID: On Mon, 2019-06-24 at 14:32 +0200, Slawomir Kaplonski wrote: > Hi, > > > On 24 Jun 2019, at 13:22, Sean Mooney wrote: > > > > On Mon, 2019-06-24 at 10:01 +0200, Slawomir Kaplonski wrote: > > > Hi, > > > > > > It looks for me that it is caused by change in Nova [1] > > > > > > I proposed revert of this patch [2] and DNM patch in neutron to check if that will really help: [3] > > > > > > [1] https://review.opendev.org/#/c/644881/ > > > [2] https://review.opendev.org/#/c/667035/ > > > [3] https://review.opendev.org/#/c/667036/ > > > > i am pretty sure i know the reason that this is failing. > > path 1 does not have special case handeling for same host resize where > > we do not receive events form neutron at all. that patch was previously tested with only > > a multinode setup where without that patch we race and normally fail to start waiting for vif-plugged > > events form neutorn before we recive them cause the revert to fail. > > we could do a revert but i would prefer to instead add a follow up patch to adress the same host case. > > If You can propose proper fix soon, this would be the best solution :) > If it will take some time for You we should either merge revert for now or send patch to skip those 3 tests in neutron > jobs to unblock our gates. > Please tell me which solution I should go with then. just a quick update. we have decided to merge the revert. it will take a while for the revert to merge so please hold of on recheck neutron patches until it has merged. we will update the list when that happens once we have updated and tested the patch for the same host resize revert edgecase we will resubmit it. ill pull in the neturon job with a DNM patch to validate it. sorry for the breakage. > > > > > > > > On 24 Jun 2019, at 09:29, Slawomir Kaplonski wrote: > > > > > > > > Hi neutrinos, > > > > > > > > Since about 2 days we have some problem with iptables_hybrid jobs in Neutron CI. See [1] for details. > > > > I’m now trying to find out what can be the problem there but if I will not find it anything soon I will propose > > > > temporary patch to skip those 3 failing tests for now. 
> > > > Please don’t recheck Your patches if it failed on those 2 jobs - this will fail again :/ > > > > > > > > [1] https://bugs.launchpad.net/neutron/+bug/1833902 > > > > — > > > > Slawek Kaplonski > > > > Senior software engineer > > > > Red Hat > > > > > > > > > > — > > > Slawek Kaplonski > > > Senior software engineer > > > Red Hat > > > > > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > From mark at stackhpc.com Mon Jun 24 14:58:20 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 24 Jun 2019 15:58:20 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> Message-ID: On Fri, 14 Jun 2019 at 11:18, Kumari, Madhuri wrote: > > Hi Eric, > > Thank you for following up and the notes. > > The spec[4] is related but a complex one too with all the migration implementation. So I will try to put a new spec with a limited implementation of resize. I was talking with Madhuri in #openstack-ironic about this today [1]. While talking it through I raised some concerns about the nova resize-based design, which I'll try to outline here. When we deploy a node using deploy templates, we have the following sequence. * user picks a flavor and image, which may specify required traits * selected traits are pushed to ironic via instance_info.traits * ironic finds all deploy templates with name matching one of the selected traits * deploy steps from the matching templates are used when provisioning the node The deploy steps could include RAID config, BIOS config, or something else. If we now resize the instance to a different flavor which has a different set of traits, we would end up with a new set of traits, which map a new set of deploy templates, with a new set of steps. How do we apply this change? Should we execute all matching deploy steps, which could (e.g. RAID) result in losing data? Or should we attempt to execute only those deploy steps that have changed? Would that always work? I don't think we keep a record of the steps used to provision a node, so if templates have changed in the intervening time then we might not get a correct diff. The original RFE [2] just called for specifying a list of deploy steps via ironic API, however this doesn't really work for the nova model. [1] http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/latest.log.html#t2019-06-24T12:33:31 [2] https://storyboard.openstack.org/#!/story/2005129 > > > Regards, > Madhuri > > >>-----Original Message----- > >>From: Eric Fried [mailto:openstack at fried.cc] > >>Sent: Thursday, June 13, 2019 11:15 PM > >>To: openstack-discuss at lists.openstack.org > >>Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post > >>Provisioning > >> > >>We discussed this today in the nova meeting [1] with a little bit of followup > >>in the main channel after the meeting closed [2]. > >> > >>There seems to be general support (or at least not objection) for > >>implementing "resize" for ironic, limited to: > >> > >>- same host [3] > >>- just this feature (i.e. 
"hyperthreading") or possibly "anything deploy > >>template" > >> > >>And the consensus was that it's time to put this into a spec. > >> > >>There was a rocky spec [4] that has some overlap and could be repurposed; > >>or a new one could be introduced. > >> > >>efried > >> > >>[1] > >>http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13- > >>14.00.log.html#l-309 > >>[2] > >>http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack- > >>nova.2019-06-13.log.html#t2019-06-13T15:02:10 > >>(interleaved) > >>[3] an acknowledged wrinkle here was that we need to be able to detect at > >>the API level that we're dealing with an Ironic instance, and ignore the > >>allow_resize_to_same_host option (because always forcing same host) [4] > >>https://review.opendev.org/#/c/449155/ > From jaypipes at gmail.com Mon Jun 24 15:14:10 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Mon, 24 Jun 2019 11:14:10 -0400 Subject: [placement] db query analysis In-Reply-To: References: Message-ID: <07c6e111-d6a9-d175-7014-aff832c6e9c7@gmail.com> On 6/19/19 8:39 AM, Chris Dent wrote: > One of the queries that has come up recently with placement > performance is whether there may be opportunities to gain some > improvement by making fewer queries to the db. That is: > > * are there redundant queries > * are there places where data is gathered in multiple queries that >   could be in one (or at least fewer) > * are there queries that are not doing what we think > > to that end I've done some analysis of logs produced when > [placement_database]/connection_debug is set to 50 (which dumps SQL > queries to the INFO log). > > The collection of queries made during a single request to GET > /allocation_candidates is at http://paste.openstack.org/show/753183/ > > The data set is a single resource resource provider of the same form > as used in the placement-perfload job (where 1000 providers are > used). Only 1 is used in this case as several queries use 'IN' > statements that list all the resource provider ids currently in > play, and that gets dumped to the log making it inscrutable. I've > noted in the paste where this happens. > > Each block of SQL is associated with the method that calls it. The > queries are in the order they happen. One query that happens three > times (once for each resource class requested) is listed once. > > Observations: > > * The way we use IN could be improved using a bindparam: > > https://docs.sqlalchemy.org/en/13/core/sqlelement.html?highlight=expanding%20bindparam#sqlalchemy.sql.operators.ColumnOperators.in_ > > > * That we use IN in that fashion at all, where we are carrying lists >   of rp ids around and making multiple queries, instead of one giant >   one, might be an area worth exploring. > > * There are a couple of places where get get a trait id (via name) in >   a separate query from using the trait id. > > * What can you see? > > Please have a look to see if anything looks odd, wrong, etc. > Basically what we're after is trying to find things that violate our > expectations. > > Note that this is just one of several paths through the database. > When there are sharing or nested providers things change. I didn't > bother to do a more complex set of queries at this time as it seemed > starting simple would help us tease out how best to communicate > these sorts of things. 
> > Related to that, I've started working on a nested-perfload at > https://review.opendev.org/665695 Please note that there used to be fewer queries performed in the allocation candidate and get resource provider functions. We replaced the giant SQL statements with multiple smaller SQL statements to assist in debuggability and tracing. Best, -jay From bdobreli at redhat.com Mon Jun 24 15:26:08 2019 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Mon, 24 Jun 2019 17:26:08 +0200 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: References: Message-ID: On 24.06.2019 12:29, Alexandra Settle wrote: > Hi all, > > The work for the Train community goal - PDF support for project docs - > is well underway. [1] Now, we're looking for volunteers to help test the > implementation. > > We'll need someone to help build the docs into PDFs and determine things > we can fix through tweaks to our docs, or if they're bugs in Sphinx. > AKA: We need a troubleshoot artist. There seems to be an issue [0] for any projects using the badges [1] or other SVGs in their docs. Also the default levels of nesting of the {\begin ... \end} stanzas might require additional tunings, like [2]. I'll keep posting here on the further issues discovered for PDF doc builds for TripleO. Stay tuned :) [0] https://github.com/sphinx-doc/sphinx/issues/4720#issuecomment-372046571 [1] https://governance.openstack.org/tc/badges/ [2] https://review.opendev.org/667114 > > If you can volunteer, please add yourself to the wiki table here [2]. > I've added neutron and nova specifically here as we need someone who is > familiar with the project and it's dependencies to help us get that setup. > > Any questions? Reach out. > > Cheers, > > Alex > > [1] > https://review.opendev.org/#/q/topic:build-pdf-docs+(status:open+OR+status:merged) > > [2] > https://wiki.openstack.org/wiki/Documentation#PDF_for_Project_Docs_-_Community_Goal > -- Best regards, Bogdan Dobrelya, Irc #bogdando From ltoscano at redhat.com Mon Jun 24 15:54:48 2019 From: ltoscano at redhat.com (Luigi Toscano) Date: Mon, 24 Jun 2019 17:54:48 +0200 Subject: [all] [ptls] [tc] [nova] [neutron] Volunteers that know TeX for PDF community goal In-Reply-To: References: Message-ID: <3528349.xQ6EllL4aP@whitebase.usersys.redhat.com> On Monday, 24 June 2019 12:29:51 CEST Alexandra Settle wrote: > Hi all, > > The work for the Train community goal - PDF support for project docs - > is well underway. [1] Now, we're looking for volunteers to help test the > implementation. > > We'll need someone to help build the docs into PDFs and determine things > we can fix through tweaks to our docs, or if they're bugs in Sphinx. > AKA: We need a troubleshoot artist. > > If you can volunteer, please add yourself to the wiki table here [2]. > I've added neutron and nova specifically here as we need someone who is > familiar with the project and it's dependencies to help us get that setup. I added myself for Sahara. > Any questions? Reach out. Looking at the first test (openstacksdk, https://review.opendev.org/#/c/ 601659/), it looks like we will need to copy a long list of items into bindep.txt. Would it make sense to think about a way to quickly share those values? 
Ciao -- Luigi From mtreinish at kortar.org Mon Jun 24 15:56:29 2019 From: mtreinish at kortar.org (Matthew Treinish) Date: Mon, 24 Jun 2019 11:56:29 -0400 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: References: Message-ID: <20190624155629.GA26343@sinanju.localdomain> On Mon, Jun 24, 2019 at 05:26:08PM +0200, Bogdan Dobrelya wrote: > On 24.06.2019 12:29, Alexandra Settle wrote: > > Hi all, > > > > The work for the Train community goal - PDF support for project docs - > > is well underway. [1] Now, we're looking for volunteers to help test the > > implementation. > > > > We'll need someone to help build the docs into PDFs and determine things > > we can fix through tweaks to our docs, or if they're bugs in Sphinx. > > AKA: We need a troubleshoot artist. > > There seems to be an issue [0] for any projects using the badges [1] or > other SVGs in their docs. Also the default levels of nesting of the {\begin > ... \end} stanzas might require additional tunings, like [2]. I'll keep > posting here on the further issues discovered for PDF doc builds for > TripleO. Stay tuned :) The svg in pdf thing was a known issue. When I first looked at building the nova docs with latex/pdf output a few years ago [1] you had to manually convert the images before building the latex. Since then sphinx has added an extension to do this for you: https://www.sphinx-doc.org/en/master/usage/extensions/imgconverter.html You should be able to just add that to the extension list in conf.py and it will convert the svgs at sphinx build time. -Matt Treinish [1] https://opendev.org/openstack/nova/commit/62575dd40e5b7698d9ba54641558246489f0614e > > [0] https://github.com/sphinx-doc/sphinx/issues/4720#issuecomment-372046571 > [1] https://governance.openstack.org/tc/badges/ > [2] https://review.opendev.org/667114 > > > > > If you can volunteer, please add yourself to the wiki table here [2]. > > I've added neutron and nova specifically here as we need someone who is > > familiar with the project and it's dependencies to help us get that setup. > > > > Any questions? Reach out. > > > > Cheers, > > > > Alex > > > > [1] > > https://review.opendev.org/#/q/topic:build-pdf-docs+(status:open+OR+status:merged) > > > > [2] > > https://wiki.openstack.org/wiki/Documentation#PDF_for_Project_Docs_-_Community_Goal > > > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From openstack at fried.cc Mon Jun 24 16:12:17 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 24 Jun 2019 11:12:17 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> Message-ID: <138cfc19-1850-a60a-b14a-d5a2ab8f0c85@fried.cc> > If we now resize the instance to a different flavor which has a > different set of traits, we would end up with a new set of traits, > which map a new set of deploy templates, with a new set of steps. > > How do we apply this change? 
Should we execute all matching deploy > steps, which could (e.g. RAID) result in losing data? Or should we > attempt to execute only those deploy steps that have changed? Would > that always work? I don't think we keep a record of the steps used to > provision a node, so if templates have changed in the intervening time > then we might not get a correct diff. Not being intimately familiar with the workings, the approach I've been advocating is to only support the changes you support, and fail on anything else. In other words, compare the old flavor to the new flavor. If the diff contains anything other than this "hyperthreading" gizmo, fail. Ironic resize is a special snowflake, and only supports a very limited set of changes done in a very limited way. At first, it's just one thing. You can add other pieces as demand arises, but by default you're rejecting big complicated things like your RAID example. efried . From mriedemos at gmail.com Mon Jun 24 17:12:47 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 24 Jun 2019 12:12:47 -0500 Subject: [Watcher] team meeting and agenda In-Reply-To: <201906191156028743737@zte.com.cn> References: <201906191156028743737@zte.com.cn> Message-ID: <82135dc0-074b-cde1-889c-aaed60c36c5f@gmail.com> On 6/18/2019 10:56 PM, li.canwei2 at zte.com.cn wrote: > The agenda is available on > https://wiki.openstack.org/wiki/Watcher_Meeting_Agenda > > > feel free to add any additional items. > Regarding this item: > whether we need to receive and process the Nova unversioned notifications [1] I have split the grenade part out of [1] and that change should be ready to go for the non-grenade jobs to get the watcher nova versioned notification handler code tested again. I made [2] to iterate on fixing the watcher grenade job to configure nova early for both versioned and unversioned notifications (remember we need nova emitting unversioned notifications for ceilometer in the CI jobs). [1] https://review.opendev.org/#/c/663332/ [2] https://review.opendev.org/#/c/667161/ -- Thanks, Matt From fungi at yuggoth.org Mon Jun 24 17:20:43 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 24 Jun 2019 17:20:43 +0000 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs In-Reply-To: <20190620195848.quo7or6xqmcueb6p@yuggoth.org> References: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> <20190620195848.quo7or6xqmcueb6p@yuggoth.org> Message-ID: <20190624172043.6562nrsjuukzf2b6@yuggoth.org> On 2019-06-20 19:58:49 +0000 (+0000), Jeremy Stanley wrote: > On 2019-06-04 22:45:58 +0000 (+0000), Clark Boylan wrote: > > As part of our transition to Zuulv3 a year and a half ago, we > > carried over some compatibility tooling that we would now like > > to clean up. Specifically, we add a zuul-cloner (which went away > > in zuulv3) shim and set a global bindep fallback file value in > > all jobs. Zuulv3 native jobs are expected to use the repos zuul > > has precloned for you (no zuul-cloner required) as well as > > supply an in repo bindep.txt (or specify a bindep.txt path or > > install packages via some other method). > > > > This means that we should be able to remove both of these items > > from the non legacy base job in OpenDev's zuul. The legacy base > > job will continue to carry these for you so that you can write > > new native jobs over time. We have two changes [0][1] ready to > > go for this [...] > > Our current plan is to merge these changes on June 24, 2019. We > > will be around to help debug any unexpected issues that come up. 
> > Jobs can be updated to use the "legacy-base" base job instead of > > the "base" base job if they need to be reverted to the old > > behavior quickly. > [...] > > [0] https://review.opendev.org/656195 > > [1] https://review.opendev.org/663151 [...] > Obvious uncaught breakage to look out for on Monday will be > failures revolving around "command not found" errors for > zuul-cloner or other commands which may have been provided by one > of the packages in the > > list. As announced, these two changes have now merged. Please don't hesitate to let us know if you require any assistance working through suspected fallout from either or both. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mriedemos at gmail.com Mon Jun 24 18:44:35 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 24 Jun 2019 13:44:35 -0500 Subject: [cinder] ceph multiattach details? In-Reply-To: <20190528133322.GA23003@sm-workstation> References: <2569f62a-1d30-2468-b019-06d99a819f82@gmail.com> <8c70bb02-a959-96fd-382e-26f8816aad5d@gmail.com> <20190528133322.GA23003@sm-workstation> Message-ID: <38f169c6-c8c6-a968-160c-6cfff65b7183@gmail.com> On 5/28/2019 8:33 AM, Sean McGinnis wrote: > Multiattach support has indeed been enabled for RBD as of the Stein release. > Though there are the known caveats that you point out. > > I thought there was a pending patch to add some details on this to to the RBD > driver configuration reference, but I am not finding anything at the moment. > > I don't have all the details on that myself, but hopefully one of the RBD > driver maintainers can chime in here with better details. Note that the devstack-plugin-ceph repo also enabled multiattach testing after it was enabled for the rbd volume driver in cinder [1]. So it should be getting tested in the ceph job as well upstream. [1] https://github.com/openstack/devstack-plugin-ceph/commit/b69c941d5ccafb7024027bfef4792014c9799c53#diff-7415f5ff7beee2cdf9ffe31e12e4c086 -- Thanks, Matt From gouthampravi at gmail.com Mon Jun 24 19:21:39 2019 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 24 Jun 2019 12:21:39 -0700 Subject: https://review.opendev.org/#/c/663089 In-Reply-To: <6031C821D2144A4CB722005A21B34BD53AED59FD@MX202CL02.corp.emc.com> References: <6031C821D2144A4CB722005A21B34BD53AED59FD@MX202CL02.corp.emc.com> Message-ID: Hi Helen, On Mon, Jun 24, 2019 at 3:18 AM Walsh, Helen wrote: > Hi Ravi, > > A request for information if I may. I will need some direction on how to > generate manila-powermax.inc > > I am not finding anything online > > https://review.opendev.org/#/c/663089 > Thank you for checking. I'm assuming you want to update the file rendered to: https://docs.openstack.org/manila/latest/configuration/shared-file-systems/drivers/dell-emc-vmax-driver.html#driver-options Manila currently doesn't have a way to automatically generate these configuration files. Please see associated bug: https://bugs.launchpad.net/manila/+bug/1713062. So for now, you can manually modify those files; we'll look to prioritizing https://bugs.launchpad.net/manila/+bug/1713062. Contributions to add the config option automation would be welcome! Thanks, Goutham > > > Thank you, > > Helen > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sombrafam at gmail.com Mon Jun 24 20:18:28 2019 From: sombrafam at gmail.com (Erlon Cruz) Date: Mon, 24 Jun 2019 17:18:28 -0300 Subject: NetApp E-series infiniband support for Cinder In-Reply-To: <20190621093030.otjr3ukiif635ork@localhost> References: <20190621091159.isdnyah3tkms26xb@localhost> <20190621093030.otjr3ukiif635ork@localhost> Message-ID: Hi Grant, As Gorka said, support for eseries was dropped in Stein, but even if not it never supported infiniband. Erlon Em sex, 21 de jun de 2019 às 06:36, Gorka Eguileor escreveu: > On 21/06, Grant Morley wrote: > > Hi Gorka, > > > > Thanks for that, I'll let the business know that we wont want to be using > > that. We had planned to move to Stein later this year! > > > > Regards, > > Glad I could help avoid that awkward moment. ;-) > > > > > > > On 21/06/2019 10:11, Gorka Eguileor wrote: > > > On 20/06, Grant Morley wrote: > > > > Hi All, > > > > > > > > Just a quick one to see if anybody knows if there is currently any > > > > infiniband support for cinder using a NetApp E-Series SAN. I have > had a look > > > > at: > > > > > > > > > https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/netapp-volume-driver.html > > > > > > > > and: > > > > > > > > > https://docs.openstack.org/cinder/rocky/reference/support-matrix.html > > > > > > > > They seem to suggest that only iSCSI and FC are supported. I just > want to > > > > make sure before I start trying to do a POC with the E-series and > > > > infiniband. > > > > > > > > Any advice would be much appreciated. > > > > > > > > Kind Regards, > > > > > > > > -- > > > > > > > > Grant Morley > > > > Cloud Lead, Civo Ltd > > > > www.civo.com | Signup for an account! > > > > > > > Hi, > > > > > > I don't know about support for infiniband, but the driver support has > > > been dropped in Stein. > > > > > > The release notes [1] state: > > > > > > Support for NetApp E-Series has been removed. The NetApp Unified > > > driver can now only be used with NetApp Clustered Data ONTAP. > > > > > > Regards, > > > Gorka. > > > > > > [1]: https://docs.openstack.org/releasenotes/cinder/stein.html > > -- > > > > Grant Morley > > Cloud Lead, Civo Ltd > > www.civo.com | Signup for an account! > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jun 24 20:22:39 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 24 Jun 2019 22:22:39 +0200 Subject: [neutron] [nova] CI issue In-Reply-To: References: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> <8690DD8D-556F-489D-B71F-828286CDE156@redhat.com> <0A8DD96C-5F06-4702-B10D-31BC2830204D@redhat.com> Message-ID: <1505DD52-2A8F-4F85-992F-55A254E354AF@redhat.com> Hi, Revert in nova is merged so You can recheck Your Neutron patches now :) > On 24 Jun 2019, at 16:37, Sean Mooney wrote: > > On Mon, 2019-06-24 at 14:32 +0200, Slawomir Kaplonski wrote: >> Hi, >> >>> On 24 Jun 2019, at 13:22, Sean Mooney wrote: >>> >>> On Mon, 2019-06-24 at 10:01 +0200, Slawomir Kaplonski wrote: >>>> Hi, >>>> >>>> It looks for me that it is caused by change in Nova [1] >>>> >>>> I proposed revert of this patch [2] and DNM patch in neutron to check if that will really help: [3] >>>> >>>> [1] https://review.opendev.org/#/c/644881/ >>>> [2] https://review.opendev.org/#/c/667035/ >>>> [3] https://review.opendev.org/#/c/667036/ >>> >>> i am pretty sure i know the reason that this is failing. 
>>> path 1 does not have special case handeling for same host resize where >>> we do not receive events form neutron at all. that patch was previously tested with only >>> a multinode setup where without that patch we race and normally fail to start waiting for vif-plugged >>> events form neutorn before we recive them cause the revert to fail. >>> we could do a revert but i would prefer to instead add a follow up patch to adress the same host case. >> >> If You can propose proper fix soon, this would be the best solution :) >> If it will take some time for You we should either merge revert for now or send patch to skip those 3 tests in neutron >> jobs to unblock our gates. >> Please tell me which solution I should go with then. > just a quick update. > we have decided to merge the revert. it will take a while for the revert to merge > so please hold of on recheck neutron patches until it has merged. > we will update the list when that happens > > once we have updated and tested the patch for the same host resize revert edgecase > we will resubmit it. ill pull in the neturon job with a DNM patch to validate it. > sorry for the breakage. >> >>>> >>>>> On 24 Jun 2019, at 09:29, Slawomir Kaplonski wrote: >>>>> >>>>> Hi neutrinos, >>>>> >>>>> Since about 2 days we have some problem with iptables_hybrid jobs in Neutron CI. See [1] for details. >>>>> I’m now trying to find out what can be the problem there but if I will not find it anything soon I will propose >>>>> temporary patch to skip those 3 failing tests for now. >>>>> Please don’t recheck Your patches if it failed on those 2 jobs - this will fail again :/ >>>>> >>>>> [1] https://bugs.launchpad.net/neutron/+bug/1833902 >>>>> — >>>>> Slawek Kaplonski >>>>> Senior software engineer >>>>> Red Hat >>>>> >>>> >>>> — >>>> Slawek Kaplonski >>>> Senior software engineer >>>> Red Hat >>>> >>>> >> >> — >> Slawek Kaplonski >> Senior software engineer >> Red Hat — Slawek Kaplonski Senior software engineer Red Hat From smooney at redhat.com Mon Jun 24 20:23:56 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 24 Jun 2019 21:23:56 +0100 Subject: [neutron] [nova] CI issue In-Reply-To: References: <11787965-5E3F-4DF6-86E0-A709AD05CD46@redhat.com> <8690DD8D-556F-489D-B71F-828286CDE156@redhat.com> <0A8DD96C-5F06-4702-B10D-31BC2830204D@redhat.com> Message-ID: Top posting to let everyone know https://review.opendev.org/#/c/667035/ has merged and it is now safe to recheck your neutron changes. On Mon, 2019-06-24 at 15:37 +0100, Sean Mooney wrote: > On Mon, 2019-06-24 at 14:32 +0200, Slawomir Kaplonski wrote: > > Hi, > > > > > On 24 Jun 2019, at 13:22, Sean Mooney wrote: > > > > > > On Mon, 2019-06-24 at 10:01 +0200, Slawomir Kaplonski wrote: > > > > Hi, > > > > > > > > It looks for me that it is caused by change in Nova [1] > > > > > > > > I proposed revert of this patch [2] and DNM patch in neutron to check if that will really help: [3] > > > > > > > > [1] https://review.opendev.org/#/c/644881/ > > > > [2] https://review.opendev.org/#/c/667035/ > > > > [3] https://review.opendev.org/#/c/667036/ > > > > > > i am pretty sure i know the reason that this is failing. > > > path 1 does not have special case handeling for same host resize where > > > we do not receive events form neutron at all. that patch was previously tested with only > > > a multinode setup where without that patch we race and normally fail to start waiting for vif-plugged > > > events form neutorn before we recive them cause the revert to fail. 
> > > we could do a revert but i would prefer to instead add a follow up patch to adress the same host case. > > > > If You can propose proper fix soon, this would be the best solution :) > > If it will take some time for You we should either merge revert for now or send patch to skip those 3 tests in > > neutron > > jobs to unblock our gates. > > Please tell me which solution I should go with then. > > just a quick update. > we have decided to merge the revert. it will take a while for the revert to merge > so please hold of on recheck neutron patches until it has merged. > we will update the list when that happens > > once we have updated and tested the patch for the same host resize revert edgecase > we will resubmit it. ill pull in the neturon job with a DNM patch to validate it. > sorry for the breakage. > > > > > > > > > > > On 24 Jun 2019, at 09:29, Slawomir Kaplonski wrote: > > > > > > > > > > Hi neutrinos, > > > > > > > > > > Since about 2 days we have some problem with iptables_hybrid jobs in Neutron CI. See [1] for details. > > > > > I’m now trying to find out what can be the problem there but if I will not find it anything soon I will > > > > > propose > > > > > temporary patch to skip those 3 failing tests for now. > > > > > Please don’t recheck Your patches if it failed on those 2 jobs - this will fail again :/ > > > > > > > > > > [1] https://bugs.launchpad.net/neutron/+bug/1833902 > > > > > — > > > > > Slawek Kaplonski > > > > > Senior software engineer > > > > > Red Hat > > > > > > > > > > > > > — > > > > Slawek Kaplonski > > > > Senior software engineer > > > > Red Hat > > > > > > > > > > > > — > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > From jungleboyj at gmail.com Mon Jun 24 21:50:44 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Mon, 24 Jun 2019 16:50:44 -0500 Subject: NetApp E-series infiniband support for Cinder In-Reply-To: References: <20190621091159.isdnyah3tkms26xb@localhost> <20190621093030.otjr3ukiif635ork@localhost> Message-ID: Erlon, Good point.  Hadn't thought about the fact that it never did support IB. Jay On 6/24/2019 3:18 PM, Erlon Cruz wrote: > Hi Grant, > > As Gorka said, support for eseries was dropped in Stein, but even if > not it never supported infiniband. > > Erlon > > Em sex, 21 de jun de 2019 às 06:36, Gorka Eguileor > > escreveu: > > On 21/06, Grant Morley wrote: > > Hi Gorka, > > > > Thanks for that, I'll let the business know that we wont want to > be using > > that. We had planned to move to Stein later this year! > > > > Regards, > > Glad I could help avoid that awkward moment.  ;-) > > > > > > > On 21/06/2019 10:11, Gorka Eguileor wrote: > > > On 20/06, Grant Morley wrote: > > > > Hi All, > > > > > > > > Just a quick one to see if anybody knows if there is > currently any > > > > infiniband support for cinder using a NetApp E-Series SAN. I > have had a look > > > > at: > > > > > > > > > https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/netapp-volume-driver.html > > > > > > > > and: > > > > > > > > > https://docs.openstack.org/cinder/rocky/reference/support-matrix.html > > > > > > > > They seem to suggest that only iSCSI and FC are supported. I > just want to > > > > make sure before I start trying to do a POC with the > E-series and > > > > infiniband. > > > > > > > > Any advice would be much appreciated. > > > > > > > > Kind Regards, > > > > > > > > -- > > > > > > > > Grant Morley > > > > Cloud Lead, Civo Ltd > > > > www.civo.com | > Signup for an account! 
> > > > > > > Hi, > > > > > > I don't know about support for infiniband, but the driver > support has > > > been dropped in Stein. > > > > > > The release notes [1] state: > > > > > >    Support for NetApp E-Series has been removed. The NetApp > Unified > > >    driver can now only be used with NetApp Clustered Data ONTAP. > > > > > > Regards, > > > Gorka. > > > > > > [1]: https://docs.openstack.org/releasenotes/cinder/stein.html > > -- > > > > Grant Morley > > Cloud Lead, Civo Ltd > > www.civo.com | > Signup for an account! > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Mon Jun 24 23:36:25 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Mon, 24 Jun 2019 16:36:25 -0700 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: <20190624155629.GA26343@sinanju.localdomain> References: <20190624155629.GA26343@sinanju.localdomain> Message-ID: I gave this a quick run on Octavia. I did get it to output a PDF with our svg included. I see a few issues right away (beyond the screens and screens of warnings). How do we want to collect this feedback? Storyboard stories with a tag? 1. I needed to add the following bindeps: librsvg2-bin [doc platform:dpkg] fonts-freefont-otf [doc platform:dpkg] 2. Relative links come through to the PDF but are broken. 3, Oddly, the "configuration" section of our docs didn't render, it's just a blank section. Even if the generated configuration guide didn't work, I would have expected the RST policies document to come through. Even more strange, the configuration guide is linked from another section, and it rendered there. This must be one of the billion warnings that output. 4. We should document how to ignore or re-order the docs. We have an internal API reference that comes through as the first section, but is of little use to anyone outside the developers. It is also confusing as the actual Octavia API-REF link doesn't render. 5. The feature matrix tables rendered ok, except the red "X" does not (unicode 2716). (https://opendev.org/openstack/sphinx-feature-classification) Michael On Mon, Jun 24, 2019 at 8:59 AM Matthew Treinish wrote: > > On Mon, Jun 24, 2019 at 05:26:08PM +0200, Bogdan Dobrelya wrote: > > On 24.06.2019 12:29, Alexandra Settle wrote: > > > Hi all, > > > > > > The work for the Train community goal - PDF support for project docs - > > > is well underway. [1] Now, we're looking for volunteers to help test the > > > implementation. > > > > > > We'll need someone to help build the docs into PDFs and determine things > > > we can fix through tweaks to our docs, or if they're bugs in Sphinx. > > > AKA: We need a troubleshoot artist. > > > > There seems to be an issue [0] for any projects using the badges [1] or > > other SVGs in their docs. Also the default levels of nesting of the {\begin > > ... \end} stanzas might require additional tunings, like [2]. I'll keep > > posting here on the further issues discovered for PDF doc builds for > > TripleO. Stay tuned :) > > The svg in pdf thing was a known issue. When I first looked at building the > nova docs with latex/pdf output a few years ago [1] you had to manually > convert the images before building the latex. Since then sphinx has added > an extension to do this for you: > > https://www.sphinx-doc.org/en/master/usage/extensions/imgconverter.html > > You should be able to just add that to the extension list in conf.py and it > will convert the svgs at sphinx build time. 
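For reference, a minimal sketch of that conf.py tweak (the first entry in the extension list is just an illustrative placeholder for whatever a project already loads):

# conf.py
# sphinx.ext.imgconverter ships with Sphinx >= 1.6 and shells out to
# ImageMagick's "convert" by default to turn SVGs into something the
# LaTeX builder can consume, which is why librsvg/ImageMagick style
# packages show up as doc-time bindep additions.
extensions = [
    'openstackdocstheme',        # illustrative placeholder
    'sphinx.ext.imgconverter',
]

# The conversion command can be overridden through the image_converter
# and image_converter_args settings if ImageMagick is not an option.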
> > -Matt Treinish > > [1] https://opendev.org/openstack/nova/commit/62575dd40e5b7698d9ba54641558246489f0614e > > > > > [0] https://github.com/sphinx-doc/sphinx/issues/4720#issuecomment-372046571 > > [1] https://governance.openstack.org/tc/badges/ > > [2] https://review.opendev.org/667114 > > > > > > > > If you can volunteer, please add yourself to the wiki table here [2]. > > > I've added neutron and nova specifically here as we need someone who is > > > familiar with the project and it's dependencies to help us get that setup. > > > > > > Any questions? Reach out. > > > > > > Cheers, > > > > > > Alex > > > > > > [1] > > > https://review.opendev.org/#/q/topic:build-pdf-docs+(status:open+OR+status:merged) > > > > > > [2] > > > https://wiki.openstack.org/wiki/Documentation#PDF_for_Project_Docs_-_Community_Goal > > > > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > From jm at artfiles.de Tue Jun 25 07:41:00 2019 From: jm at artfiles.de (Jan Marquardt) Date: Tue, 25 Jun 2019 09:41:00 +0200 Subject: [neutron] Provider/External Networking Message-ID: <0ED68C26-90C4-40B3-BA42-9272A39A97EC@artfiles.de> Hallo, we have/are planning an Openstack cluster which is supposed to look like this: http://paste.openstack.org/show/753337/ We are currently struggling with the external/provider networks. Is there any way/best practice to terminate the provider networks directly on the network nodes and announce the prefixes through BGP to the upstream routers? Any experience report or hint would really be appreciated. Best regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From skaplons at redhat.com Tue Jun 25 08:06:24 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Tue, 25 Jun 2019 10:06:24 +0200 Subject: [neutron] Provider/External Networking In-Reply-To: <0ED68C26-90C4-40B3-BA42-9272A39A97EC@artfiles.de> References: <0ED68C26-90C4-40B3-BA42-9272A39A97EC@artfiles.de> Message-ID: <90267D67-10D0-4C82-8B9E-8AEF4C218795@redhat.com> Hi, For BGP You should check neutron-dynamic-routing project: https://docs.openstack.org/neutron-dynamic-routing/latest/ > On 25 Jun 2019, at 09:41, Jan Marquardt wrote: > > Hallo, > > we have/are planning an Openstack cluster which is supposed to > look like this: > > http://paste.openstack.org/show/753337/ > > We are currently struggling with the external/provider networks. > Is there any way/best practice to terminate the provider networks > directly on the network nodes and announce the prefixes through > BGP to the upstream routers? > > Any experience report or hint would really be appreciated. 
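To give a flavour of what neutron-dynamic-routing provides once deployed: a BGP speaker is associated with the external network and peered with the upstream routers, roughly along these lines (names, addresses and AS numbers below are purely illustrative):

openstack bgp speaker create --ip-version 4 --local-as 64512 provider-bgp
openstack bgp speaker add network provider-bgp provider-net
openstack bgp peer create --peer-ip 192.0.2.1 --remote-as 64513 upstream-rtr
openstack bgp speaker add peer provider-bgp upstream-rtr

The speaker then advertises the relevant prefixes to its peers, so the upstream routers learn them dynamically rather than being configured statically.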
> > Best regards > > Jan > > -- > Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg > Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 > E-Mail: support at artfiles.de | Web: http://www.artfiles.de > Geschäftsführer: Harald Oltmanns | Tim Evers > Eingetragen im Handelsregister Hamburg - HRB 81478 > — Slawek Kaplonski Senior software engineer Red Hat From mark at stackhpc.com Tue Jun 25 08:17:40 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 25 Jun 2019 09:17:40 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <138cfc19-1850-a60a-b14a-d5a2ab8f0c85@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> <138cfc19-1850-a60a-b14a-d5a2ab8f0c85@fried.cc> Message-ID: On Mon, 24 Jun 2019 at 17:13, Eric Fried wrote: > > > If we now resize the instance to a different flavor which has a > > different set of traits, we would end up with a new set of traits, > > which map a new set of deploy templates, with a new set of steps. > > > > How do we apply this change? Should we execute all matching deploy > > steps, which could (e.g. RAID) result in losing data? Or should we > > attempt to execute only those deploy steps that have changed? Would > > that always work? I don't think we keep a record of the steps used to > > provision a node, so if templates have changed in the intervening time > > then we might not get a correct diff. > > Not being intimately familiar with the workings, the approach I've been > advocating is to only support the changes you support, and fail on > anything else. > > In other words, compare the old flavor to the new flavor. If the diff > contains anything other than this "hyperthreading" gizmo, fail. > Hmm, I hadn't realised it would be quite this restricted. Although this could make it work, it does seem to be baking more ironic specifics into nova. There is an issue of standardisation here. Currently we do not have standard traits to describe these things, instead we use custom traits. The reason for this has been discussed earlier in this thread, essentially that we need to encode configuration key and value into the trait, and use the lack of a trait as 'don't care'. We did briefly discuss an alternative approach, but we're a fair way off having that. > Ironic resize is a special snowflake, and only supports a very limited > set of changes done in a very limited way. At first, it's just one > thing. You can add other pieces as demand arises, but by default you're > rejecting big complicated things like your RAID example. > > efried > . > From mark at stackhpc.com Tue Jun 25 08:27:06 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 25 Jun 2019 09:27:06 +0100 Subject: [ironic] To not have meetings? In-Reply-To: References: Message-ID: On Mon, 10 Jun 2019 at 15:02, Julia Kreger wrote: > > Last week the discussion came up of splitting the ironic meeting to > alternate time zones as we have increasing numbers of contributors in > the Asia/Pacific areas of the world[0]. With that discussion, an > additional interesting question came up posing the question of > shifting to the mailing list instead of our present IRC meeting[1]? 
> > It is definitely an interesting idea, one that I'm personally keen on > because of time zones and daylight savings time. > > I think before we do this, we should collect thoughts and also try to > determine how we would pull this off so we don't forget the weekly > checkpoint that the meeting serves. I think we need to do something, > so I guess now is a good time to provide input into what everyone > thinks would be best for the project and facilitating the weekly > check-in. > > What I think might work: > > By EOD UTC Monday: > > * Listed primary effort participants will be expected to update the > whiteboard[2] weekly before EOD Monday UTC > * Contributors propose patches to the whiteboard that they believe > would be important for reviewers to examine this coming week. > * PTL or designee sends weekly email to the mailing list to start an > update thread shortly after EOD Monday UTC or early Tuesday UTC. > ** Additional updates, questions, and topical discussion (new > features, RFEs) would ideally be wrapped up by EOD UTC Tuesday. > > With that, I think we would also need to go ahead and begin having > "office hours" as during the week we generally know some ironic > contributors will be in IRC and able to respond to questions. I think > this would initially consist of our meeting time and perhaps the other > time that seems to be most friendly to the contributors int he > Asia/Pacific area[3]. > > Thoughts/ideas/suggestions welcome! I'm interested to see how it would work. I do sometimes have other commitments that mean I can't attend a meeting, and having a way not just to catch up but get involved would be nice. The main risks that I can see are that it could stifle discussion through latency of email, and it could be a little less accessible than IRC. > > -Julia > > [0]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:31:33 > [1]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:43:16 > [2]: https://etherpad.openstack.org/p/IronicWhiteBoard > [3]: https://doodle.com/poll/bv9a4qyqy44wiq92 > From mark at stackhpc.com Tue Jun 25 08:44:41 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 25 Jun 2019 09:44:41 +0100 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: On Sun, 23 Jun 2019 at 05:52, Erik McCormick wrote: > > Some goodies inline below. > > On Thu, Jun 20, 2019 at 12:34 PM Mark Goddard wrote: >> >> On Thu, 20 Jun 2019 at 17:27, Michael Johnson wrote: >> > >> > Hi Mark, >> > >> > I wanted to highlight that I wrote a detailed certificate >> > configuration guide for Octavia here: >> > https://docs.openstack.org/octavia/latest/admin/guides/certificates.html >> > >> > This could be used to automate the certificate generation for Kolla deployments. >> >> Great, that looks useful. > > > Michael was super awesome creating this after Tobias and I (and a few other folks) ran into road blocks with this. Many many thanks for that. > >> > >> > Let me know if you have any questions about the guide or steps, >> > Michael >> > >> > On Thu, Jun 20, 2019 at 7:31 AM Mark Goddard wrote: >> > > >> > > On Thu, 20 Jun 2019 at 15:08, Alex Schultz wrote: >> > > > >> > > > >> > > > >> > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard wrote: >> > > >> >> > > >> Hi, >> > > >> >> > > >> In the recent kolla meeting [1] we discussed the usability of octavia >> > > >> in kolla ansible. 
We had feedback at the Denver summit [2] that this >> > > >> service is difficult to deploy and requires a number of manual steps. >> > > >> Certificates are one of the main headaches. It was stated that OSA [3] >> > > >> may have some useful code we could look into. >> > > > >> > > > >> > > > I second that Octavia is very painful to install. There is a requirement of a bunch of openstack cloud configurations (flavors/images/etc) that must be handled prior to actually configuring the service which means it's complex to deploy. IMHO it would have been beneficial for some of these items to actually have been rolled into the service itself (ie dynamically querying the services for flavor information rather than expecting an ID put into a configuration file). That being said, we have managed to get it integrated into tripleo but it's rather complex. It does use ansible if you want to borrow some of the concepts for os_octavia. >> > > > >> > > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia >> > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml >> > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles >> > > > >> > > > Additionally we're still leveraging some of the puppet-octavia code to manage configs/nova flavors. >> > > >> > > Thanks for sharing Alex & Carlos - useful source material. >> > > > > > Most of that bitching in Denver was me. Thanks to John for recording it for posterity. With Octavia being the defacto LBaaS standard in Openstack we need to do a much better job deploying it. This proved to be a bit of a headache with kolla-ansible, but I don't think it's terribly far off. > > The main issue, as mentioned in all the reference materials you provided, is certificate generation. Kolla-ansible presently only supports a single CA and does not help operators create that in any way. Also, you really need to have two CAs. Beyond that, Octavia is extremely fussy about its certificates and getting it right can be a royal pain. > Kolla Ansible (IMO) quite nicely separates certificate generation for the API from configuration of which cert to use. So you can generate your own cert and drop it in or use the kolla-ansible certificates command. I'd like to stick to this pattern > During my first attempt at doing this deployment, I went in search of some project that had the certificate generation as a component and found that OSA had the functionality and documented it well. I extracted the needed bits, ran it, and came out the other side with a working set of certificates. Then, like most operators, I saw a squirrel and forgot to do anything more with it. > > So with the introduction out of the way, here's what I used. It was entirely derived from OSA with much love. > > https://github.com/emccormickva/octavia-cert-generate > > I also had to make a few hacks to kolla-ansible to get everything going. 
> > 1) In the octavia config template (octavia.conf.j2) I updated the config options to use the new certificates as follows: > > [certificates] > ca_private_key = /etc/octavia/certs/private/cakey.pem > ca_certificate = /etc/octavia/certs/ca_server_01.pem > > [haproxy_amphora] > server_ca = /etc/octavia/certs/ca_server_01.pem > client_cert = /etc/octavia/certs/client.pem > > [controller_worker] > client_ca = /etc/octavia/certs/ca_01.pem > > 2) Update kolla's config-yml to copy over all the certs for each container > > with_items: > - cakey.pem > - ca_01.pem > - ca_server_01.pem > - client.pem > > I think I had to make a few other hacks in Queens, but all of those seem to have been addressed already (just doing a diff of master vs. my current configs). If we can incorporate certificate generation, get 2 CAs, and copy / configure them properly, I think everything will be great. Maybe others have additional asks, but this would do it for me. If I can scrap together some time, I'll see if I can get some commits together to make it happen, but that's always a dodgy proposition. I'm also always willing to review whatever or answer questions from someone else who wants to take it on. > > Cheers, > Erik > > >> > > > >> > > > Thanks, >> > > > -Alex >> > > > >> > > >> >> > > >> >> > > >> As a starting point to improving this support, I'd like to gather >> > > >> information from people who are using octavia in kolla ansible, and >> > > >> what they have had to do to make it work. Please respond to this >> > > >> email. >> > > >> >> > > >> I've also tagged openstack-ansible and Tripleo - if there is any >> > > >> useful information those teams have to share about this topic, it is >> > > >> most welcome. Alternatively if your support for octavia also falls >> > > >> short perhaps we could collaborate on improvements. >> > > >> >> > > >> Thanks, >> > > >> Mark >> > > >> >> > > >> [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 >> > > >> [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback >> > > >> [3] https://opendev.org/openstack/openstack-ansible-os_octavia >> > > >> >> > > >> From mark at stackhpc.com Tue Jun 25 08:47:51 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 25 Jun 2019 09:47:51 +0100 Subject: [kolla][octavia][openstack-ansible][tripleo] octavia in kolla ansible In-Reply-To: References: Message-ID: On Tue, 25 Jun 2019 at 09:44, Mark Goddard wrote: > > On Sun, 23 Jun 2019 at 05:52, Erik McCormick wrote: > > > > Some goodies inline below. > > > > On Thu, Jun 20, 2019 at 12:34 PM Mark Goddard wrote: > >> > >> On Thu, 20 Jun 2019 at 17:27, Michael Johnson wrote: > >> > > >> > Hi Mark, > >> > > >> > I wanted to highlight that I wrote a detailed certificate > >> > configuration guide for Octavia here: > >> > https://docs.openstack.org/octavia/latest/admin/guides/certificates.html > >> > > >> > This could be used to automate the certificate generation for Kolla deployments. > >> > >> Great, that looks useful. > > > > > > Michael was super awesome creating this after Tobias and I (and a few other folks) ran into road blocks with this. Many many thanks for that. 
> > > >> > > >> > Let me know if you have any questions about the guide or steps, > >> > Michael > >> > > >> > On Thu, Jun 20, 2019 at 7:31 AM Mark Goddard wrote: > >> > > > >> > > On Thu, 20 Jun 2019 at 15:08, Alex Schultz wrote: > >> > > > > >> > > > > >> > > > > >> > > > On Thu, Jun 20, 2019 at 8:00 AM Mark Goddard wrote: > >> > > >> > >> > > >> Hi, > >> > > >> > >> > > >> In the recent kolla meeting [1] we discussed the usability of octavia > >> > > >> in kolla ansible. We had feedback at the Denver summit [2] that this > >> > > >> service is difficult to deploy and requires a number of manual steps. > >> > > >> Certificates are one of the main headaches. It was stated that OSA [3] > >> > > >> may have some useful code we could look into. > >> > > > > >> > > > > >> > > > I second that Octavia is very painful to install. There is a requirement of a bunch of openstack cloud configurations (flavors/images/etc) that must be handled prior to actually configuring the service which means it's complex to deploy. IMHO it would have been beneficial for some of these items to actually have been rolled into the service itself (ie dynamically querying the services for flavor information rather than expecting an ID put into a configuration file). That being said, we have managed to get it integrated into tripleo but it's rather complex. It does use ansible if you want to borrow some of the concepts for os_octavia. > >> > > > > >> > > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia > >> > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/octavia-files.yaml > >> > > > https://opendev.org/openstack/tripleo-common/src/branch/master/playbooks/roles > >> > > > > >> > > > Additionally we're still leveraging some of the puppet-octavia code to manage configs/nova flavors. > >> > > > >> > > Thanks for sharing Alex & Carlos - useful source material. > >> > > > > > > > > Most of that bitching in Denver was me. Thanks to John for recording it for posterity. With Octavia being the defacto LBaaS standard in Openstack we need to do a much better job deploying it. This proved to be a bit of a headache with kolla-ansible, but I don't think it's terribly far off. > > > > The main issue, as mentioned in all the reference materials you provided, is certificate generation. Kolla-ansible presently only supports a single CA and does not help operators create that in any way. Also, you really need to have two CAs. Beyond that, Octavia is extremely fussy about its certificates and getting it right can be a royal pain. > > > > Kolla Ansible (IMO) quite nicely separates certificate generation for > the API from configuration of which cert to use. So you can generate > your own cert and drop it in or use the kolla-ansible certificates > command. I'd like to stick to this pattern Pressed send too early :( I'd like to first make it possible to drop in certs for octavia, which should just be the code you've included below. If you have time to tidy that up and push it that would be great. A second step would be to incorporate certificate generation, and your role and OSA might be a good starting point for that. > > > During my first attempt at doing this deployment, I went in search of some project that had the certificate generation as a component and found that OSA had the functionality and documented it well. I extracted the needed bits, ran it, and came out the other side with a working set of certificates. 
Then, like most operators, I saw a squirrel and forgot to do anything more with it. > > > > So with the introduction out of the way, here's what I used. It was entirely derived from OSA with much love. > > > > https://github.com/emccormickva/octavia-cert-generate > > > > I also had to make a few hacks to kolla-ansible to get everything going. > > > > 1) In the octavia config template (octavia.conf.j2) I updated the config options to use the new certificates as follows: > > > > [certificates] > > ca_private_key = /etc/octavia/certs/private/cakey.pem > > ca_certificate = /etc/octavia/certs/ca_server_01.pem > > > > [haproxy_amphora] > > server_ca = /etc/octavia/certs/ca_server_01.pem > > client_cert = /etc/octavia/certs/client.pem > > > > [controller_worker] > > client_ca = /etc/octavia/certs/ca_01.pem > > > > 2) Update kolla's config-yml to copy over all the certs for each container > > > > with_items: > > - cakey.pem > > - ca_01.pem > > - ca_server_01.pem > > - client.pem > > > > I think I had to make a few other hacks in Queens, but all of those seem to have been addressed already (just doing a diff of master vs. my current configs). If we can incorporate certificate generation, get 2 CAs, and copy / configure them properly, I think everything will be great. Maybe others have additional asks, but this would do it for me. If I can scrap together some time, I'll see if I can get some commits together to make it happen, but that's always a dodgy proposition. I'm also always willing to review whatever or answer questions from someone else who wants to take it on. > > > > Cheers, > > Erik > > > > > >> > > > > >> > > > Thanks, > >> > > > -Alex > >> > > > > >> > > >> > >> > > >> > >> > > >> As a starting point to improving this support, I'd like to gather > >> > > >> information from people who are using octavia in kolla ansible, and > >> > > >> what they have had to do to make it work. Please respond to this > >> > > >> email. > >> > > >> > >> > > >> I've also tagged openstack-ansible and Tripleo - if there is any > >> > > >> useful information those teams have to share about this topic, it is > >> > > >> most welcome. Alternatively if your support for octavia also falls > >> > > >> short perhaps we could collaborate on improvements. > >> > > >> > >> > > >> Thanks, > >> > > >> Mark > >> > > >> > >> > > >> [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-86 > >> > > >> [2] https://etherpad.openstack.org/p/DEN-train-kolla-feedback > >> > > >> [3] https://opendev.org/openstack/openstack-ansible-os_octavia > >> > > >> > >> > > > >> From mnasiadka at gmail.com Tue Jun 25 09:09:14 2019 From: mnasiadka at gmail.com (=?UTF-8?Q?Micha=C5=82_Nasiadka?=) Date: Tue, 25 Jun 2019 11:09:14 +0200 Subject: [kolla][kayobe] vote: kayobe as a kolla deliverable In-Reply-To: References: Message-ID: Hi Mark, Before I will vote - I would like to understand the future of Kayobe. Is the plan for it to stay as it is currently (Kolla on Bifrost)? Or rather to evolve and fill the place of kolla-cli? If we include it as a kolla deliverable - will it not have impact on it's current developers and cores? We already had a kolla-cli project, and when the creators left, nobody was there to pick it up... Thanks, Michal czw., 20 cze 2019 o 15:51 Mark Goddard napisał(a): > > Hi, > > In the most recent kolla meeting [1] we discussed the possibility of > kayobe becoming a deliverable of the kolla project. This follows on > from discussion at the PTG and then on here [3]. 
> > The two options discussed are: > > 1. become a deliverable of the Kolla project > 2. become an official top level OpenStack project > > There has been some positive feedback about option 1 and no negative > feedback that I am aware of. I would therefore like to ask the kolla > community to vote on whether to include kayobe as a deliverable of the > kolla project. The electorate is the kolla-core and kolla-ansible core > teams, excluding me. The opinion of others in the community is also > welcome. > > If you have questions or feedback, please respond to this email. > > Once you have made a decision, please respond with your answer to the > following question: > > "Should kayobe become a deliverable of the kolla project?" (yes/no) > > Thanks, > Mark > > [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-120 > [2] https://etherpad.openstack.org/p/kolla-train-ptg > [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006901.html > -- Michał Nasiadka mnasiadka at gmail.com From geguileo at redhat.com Tue Jun 25 09:54:56 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Tue, 25 Jun 2019 11:54:56 +0200 Subject: How to reduce image size? In-Reply-To: <9D8A2486E35F0941A60430473E29F15B017EABB4E3@mxdb2.ad.garvan.unsw.edu.au> References: <9D8A2486E35F0941A60430473E29F15B017EABB4E3@mxdb2.ad.garvan.unsw.edu.au> Message-ID: <20190625095456.3dfpghrxapsjzlup@localhost> On 24/06, Manuel Sopena Ballesteros wrote: > Dear Openstack community, > > I would like to reduce the size of an image, I wrote this command: > > # openstack image set --property size=700000 centos7.6-kudu-image > Unable to set 'size' to '700000'. Reason: '700000' is not of type u'null', u'integer' > > Failed validating u'type' in schema[u'properties'][u'size']: > {u'description': u'Size of image file in bytes', > u'readOnly': True, > u'type': [u'null', u'integer']} > > On instance[u'size']: > '700000' > Hi, This looks like a bug to me. It looks like it is reading 700000 as a string and failing when checking if it's an integer or null. Cheers, Gorka. > I guess the real issue is u'readOnly': True > > How can I reduce the image size? > > Thank you very much > > Manuel > NOTICE > Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed. From bharat at stackhpc.com Tue Jun 25 10:54:36 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Tue, 25 Jun 2019 11:54:36 +0100 Subject: [magnum] nodegroups Message-ID: Hi Theodoros, Just replying to your message on IRC here as you appear offline. 
> if you don't care for what you have in the db i would do this: > - stop both api and conductor > - drop magnum db > - checkout master > - run the migrations (from master) After getting this far, I tried running `magnum-db-manage upgrade` but I hit this: sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1049, u"Unknown database 'magnum'") (Background on this error at: http://sqlalche.me/e/2j85) > - start the services I can’t start the services either but that is probably due to lack of a database… > - create a cluster > - checkout magnum_nodegroups > - stop services > - run migrations > - start the services > I know it's not great… Would you mind letting me know where I’m going wrong? Thanks Bharat From mark at stackhpc.com Tue Jun 25 11:18:50 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 25 Jun 2019 12:18:50 +0100 Subject: [kolla][kayobe] vote: kayobe as a kolla deliverable In-Reply-To: References: Message-ID: On Tue, 25 Jun 2019 at 10:09, Michał Nasiadka wrote: > > Hi Mark, > > Before I will vote - I would like to understand the future of Kayobe. > > Is the plan for it to stay as it is currently (Kolla on Bifrost)? Or > rather to evolve and fill the place of kolla-cli? I think it has potential to fill a kolla-cli like role. There are deployments of kayobe that do not use the bare metal deployment features. In that case you can still use the host configuration features (networking, local storage, etc). There are also some configuration patterns encouraged by kayobe such as version controlled configuration (e.g. https://github.com/SKA-ScienceDataProcessor/alaska-kayobe-config), and we intend to add support for multiple environments (dev/staging/prod) or regions within a single configuration repo. This does add some complexity on top of kolla-ansible, so there is a trade off here. > > If we include it as a kolla deliverable - will it not have impact on > it's current developers and cores? > We already had a kolla-cli project, and when the creators left, nobody > was there to pick it up... I think the main impact on existing kolla developers will be increased noise, due to covering kayobe topics in meetings and IRC. I wouldn't expect kolla cores to start working on kayobe unless they wish to. We already have contributors with interest only in kolla or kolla-ansible. I do understand your concern given the recent experience with kolla-cli. In that case the contributors left the community shortly after the project became official and we did not really integrate it well. It also did not see much adoption, for whatever reason. Of course I can't predict the future, but I would say that kayobe is quite different. It has been open source from the beginning, and now has a significant user base. The kayobe team isn't going anywhere. We hope that including the project as a kolla deliverable would allow it to continue to grow. > > Thanks, > Michal > > czw., 20 cze 2019 o 15:51 Mark Goddard napisał(a): > > > > Hi, > > > > In the most recent kolla meeting [1] we discussed the possibility of > > kayobe becoming a deliverable of the kolla project. This follows on > > from discussion at the PTG and then on here [3]. > > > > The two options discussed are: > > > > 1. become a deliverable of the Kolla project > > 2. become an official top level OpenStack project > > > > There has been some positive feedback about option 1 and no negative > > feedback that I am aware of. 
I would therefore like to ask the kolla > > community to vote on whether to include kayobe as a deliverable of the > > kolla project. The electorate is the kolla-core and kolla-ansible core > > teams, excluding me. The opinion of others in the community is also > > welcome. > > > > If you have questions or feedback, please respond to this email. > > > > Once you have made a decision, please respond with your answer to the > > following question: > > > > "Should kayobe become a deliverable of the kolla project?" (yes/no) > > > > Thanks, > > Mark > > > > [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-120 > > [2] https://etherpad.openstack.org/p/kolla-train-ptg > > [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006901.html > > > > > -- > Michał Nasiadka > mnasiadka at gmail.com From a.settle at outlook.com Tue Jun 25 11:23:22 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Tue, 25 Jun 2019 11:23:22 +0000 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: References: <20190624155629.GA26343@sinanju.localdomain> Message-ID: On 25/06/2019 00:36, Michael Johnson wrote: > I gave this a quick run on Octavia. I did get it to output a PDF with > our svg included. Hooray! > > I see a few issues right away (beyond the screens and screens of warnings). Thanks for sharing :) > > How do we want to collect this feedback? Storyboard stories with a tag? I think so. Doug pointed me in the direction of: https://github.com/openstack/goal-tools (which I was unaware of previously) I've messaged Stephen to see what his plans are as he's starting the entire thing off, and might be best if he generates the story as I'll be away as of Thursday. > > 1. I needed to add the following bindeps: > librsvg2-bin [doc platform:dpkg] > fonts-freefont-otf [doc platform:dpkg] > > 2. Relative links come through to the PDF but are broken. > 3, Oddly, the "configuration" section of our docs didn't render, it's > just a blank section. Even if the generated configuration guide didn't > work, I would have expected the RST policies document to come through. > Even more strange, the configuration guide is linked from another > section, and it rendered there. This must be one of the billion > warnings that output. > 4. We should document how to ignore or re-order the docs. We have an > internal API reference that comes through as the first section, but is > of little use to anyone outside the developers. It is also confusing > as the actual Octavia API-REF link doesn't render. > 5. The feature matrix tables rendered ok, except the red "X" does not > (unicode 2716). > (https://opendev.org/openstack/sphinx-feature-classification) > > Michael > > > > > > > > > On Mon, Jun 24, 2019 at 8:59 AM Matthew Treinish wrote: >> On Mon, Jun 24, 2019 at 05:26:08PM +0200, Bogdan Dobrelya wrote: >>> On 24.06.2019 12:29, Alexandra Settle wrote: >>>> Hi all, >>>> >>>> The work for the Train community goal - PDF support for project docs - >>>> is well underway. [1] Now, we're looking for volunteers to help test the >>>> implementation. >>>> >>>> We'll need someone to help build the docs into PDFs and determine things >>>> we can fix through tweaks to our docs, or if they're bugs in Sphinx. >>>> AKA: We need a troubleshoot artist. >>> There seems to be an issue [0] for any projects using the badges [1] or >>> other SVGs in their docs. Also the default levels of nesting of the {\begin >>> ... 
\end} stanzas might require additional tunings, like [2]. I'll keep >>> posting here on the further issues discovered for PDF doc builds for >>> TripleO. Stay tuned :) >> The svg in pdf thing was a known issue. When I first looked at building the >> nova docs with latex/pdf output a few years ago [1] you had to manually >> convert the images before building the latex. Since then sphinx has added >> an extension to do this for you: >> >> https://www.sphinx-doc.org/en/master/usage/extensions/imgconverter.html >> >> You should be able to just add that to the extension list in conf.py and it >> will convert the svgs at sphinx build time. >> >> -Matt Treinish >> >> [1] https://opendev.org/openstack/nova/commit/62575dd40e5b7698d9ba54641558246489f0614e >> >>> [0] https://github.com/sphinx-doc/sphinx/issues/4720#issuecomment-372046571 >>> [1] https://governance.openstack.org/tc/badges/ >>> [2] https://review.opendev.org/667114 >>> >>>> If you can volunteer, please add yourself to the wiki table here [2]. >>>> I've added neutron and nova specifically here as we need someone who is >>>> familiar with the project and it's dependencies to help us get that setup. >>>> >>>> Any questions? Reach out. >>>> >>>> Cheers, >>>> >>>> Alex >>>> >>>> [1] >>>> https://review.opendev.org/#/q/topic:build-pdf-docs+(status:open+OR+status:merged) >>>> >>>> [2] >>>> https://wiki.openstack.org/wiki/Documentation#PDF_for_Project_Docs_-_Community_Goal >>>> >>> >>> -- >>> Best regards, >>> Bogdan Dobrelya, >>> Irc #bogdando >>> From mnasiadka at gmail.com Tue Jun 25 11:53:45 2019 From: mnasiadka at gmail.com (=?UTF-8?Q?Micha=C5=82_Nasiadka?=) Date: Tue, 25 Jun 2019 13:53:45 +0200 Subject: [kolla][kayobe] vote: kayobe as a kolla deliverable In-Reply-To: References: Message-ID: Thanks for clarifications Mark Let's try - +1 for including it as Kolla deliverable. Michal wt., 25 cze 2019 o 13:19 Mark Goddard napisał(a): > > On Tue, 25 Jun 2019 at 10:09, Michał Nasiadka wrote: > > > > Hi Mark, > > > > Before I will vote - I would like to understand the future of Kayobe. > > > > Is the plan for it to stay as it is currently (Kolla on Bifrost)? Or > > rather to evolve and fill the place of kolla-cli? > > I think it has potential to fill a kolla-cli like role. There are > deployments of kayobe that do not use the bare metal deployment > features. In that case you can still use the host configuration > features (networking, local storage, etc). There are also some > configuration patterns encouraged by kayobe such as version controlled > configuration (e.g. > https://github.com/SKA-ScienceDataProcessor/alaska-kayobe-config), and > we intend to add support for multiple environments (dev/staging/prod) > or regions within a single configuration repo. This does add some > complexity on top of kolla-ansible, so there is a trade off here. > > > > > If we include it as a kolla deliverable - will it not have impact on > > it's current developers and cores? > > We already had a kolla-cli project, and when the creators left, nobody > > was there to pick it up... > > I think the main impact on existing kolla developers will be increased > noise, due to covering kayobe topics in meetings and IRC. I wouldn't > expect kolla cores to start working on kayobe unless they wish to. We > already have contributors with interest only in kolla or > kolla-ansible. > > I do understand your concern given the recent experience with > kolla-cli. 
In that case the contributors left the community shortly > after the project became official and we did not really integrate it > well. It also did not see much adoption, for whatever reason. Of > course I can't predict the future, but I would say that kayobe is > quite different. It has been open source from the beginning, and now > has a significant user base. The kayobe team isn't going anywhere. We > hope that including the project as a kolla deliverable would allow it > to continue to grow. > > > > > Thanks, > > Michal > > > > czw., 20 cze 2019 o 15:51 Mark Goddard napisał(a): > > > > > > Hi, > > > > > > In the most recent kolla meeting [1] we discussed the possibility of > > > kayobe becoming a deliverable of the kolla project. This follows on > > > from discussion at the PTG and then on here [3]. > > > > > > The two options discussed are: > > > > > > 1. become a deliverable of the Kolla project > > > 2. become an official top level OpenStack project > > > > > > There has been some positive feedback about option 1 and no negative > > > feedback that I am aware of. I would therefore like to ask the kolla > > > community to vote on whether to include kayobe as a deliverable of the > > > kolla project. The electorate is the kolla-core and kolla-ansible core > > > teams, excluding me. The opinion of others in the community is also > > > welcome. > > > > > > If you have questions or feedback, please respond to this email. > > > > > > Once you have made a decision, please respond with your answer to the > > > following question: > > > > > > "Should kayobe become a deliverable of the kolla project?" (yes/no) > > > > > > Thanks, > > > Mark > > > > > > [1] http://eavesdrop.openstack.org/meetings/kolla/2019/kolla.2019-06-19-15.00.log.html#l-120 > > > [2] https://etherpad.openstack.org/p/kolla-train-ptg > > > [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006901.html > > > > > > > > > -- > > Michał Nasiadka > > mnasiadka at gmail.com -- Michał Nasiadka mnasiadka at gmail.com From doug at doughellmann.com Tue Jun 25 13:17:50 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 25 Jun 2019 09:17:50 -0400 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: References: <20190624155629.GA26343@sinanju.localdomain> Message-ID: Michael Johnson writes: > I gave this a quick run on Octavia. I did get it to output a PDF with > our svg included. > > I see a few issues right away (beyond the screens and screens of warnings). > > How do we want to collect this feedback? Storyboard stories with a tag? > > 1. I needed to add the following bindeps: > librsvg2-bin [doc platform:dpkg] > fonts-freefont-otf [doc platform:dpkg] We should add those globally in the job definition, and optionally in the project bindep file. > 2. Relative links come through to the PDF but are broken. Can you give an example of a link that didn't work? I suspect this has to do with using HTML links instead of :ref: or :doc: links that Sphinx knows how to resolve. > 3, Oddly, the "configuration" section of our docs didn't render, it's > just a blank section. Even if the generated configuration guide didn't > work, I would have expected the RST policies document to come through. > Even more strange, the configuration guide is linked from another > section, and it rendered there. This must be one of the billion > warnings that output. > 4. We should document how to ignore or re-order the docs. 
We have an > internal API reference that comes through as the first section, but is > of little use to anyone outside the developers. It is also confusing > as the actual Octavia API-REF link doesn't render. I'll defer to our information architects on the exact ordering, but I agree that we should be emphasizing user-facing content over contributor content. That likely means reordering the toctree directive in the root of several projects. It is *possible* to specify a different top-level document to use for controlling the order of content in the PDF build, but I don't recommend doing that because that will give you 2 separate toctrees to keep up to date as new content is added, and we're trying to achieve the first iteration of this goal without adding a ton of new maintenance burden for teams. > 5. The feature matrix tables rendered ok, except the red "X" does not > (unicode 2716). > (https://opendev.org/openstack/sphinx-feature-classification) > > Michael > > > > > > > > > On Mon, Jun 24, 2019 at 8:59 AM Matthew Treinish wrote: >> >> On Mon, Jun 24, 2019 at 05:26:08PM +0200, Bogdan Dobrelya wrote: >> > On 24.06.2019 12:29, Alexandra Settle wrote: >> > > Hi all, >> > > >> > > The work for the Train community goal - PDF support for project docs - >> > > is well underway. [1] Now, we're looking for volunteers to help test the >> > > implementation. >> > > >> > > We'll need someone to help build the docs into PDFs and determine things >> > > we can fix through tweaks to our docs, or if they're bugs in Sphinx. >> > > AKA: We need a troubleshoot artist. >> > >> > There seems to be an issue [0] for any projects using the badges [1] or >> > other SVGs in their docs. Also the default levels of nesting of the {\begin >> > ... \end} stanzas might require additional tunings, like [2]. I'll keep >> > posting here on the further issues discovered for PDF doc builds for >> > TripleO. Stay tuned :) >> >> The svg in pdf thing was a known issue. When I first looked at building the >> nova docs with latex/pdf output a few years ago [1] you had to manually >> convert the images before building the latex. Since then sphinx has added >> an extension to do this for you: >> >> https://www.sphinx-doc.org/en/master/usage/extensions/imgconverter.html >> >> You should be able to just add that to the extension list in conf.py and it >> will convert the svgs at sphinx build time. >> >> -Matt Treinish >> >> [1] https://opendev.org/openstack/nova/commit/62575dd40e5b7698d9ba54641558246489f0614e >> >> > >> > [0] https://github.com/sphinx-doc/sphinx/issues/4720#issuecomment-372046571 >> > [1] https://governance.openstack.org/tc/badges/ >> > [2] https://review.opendev.org/667114 >> > >> > > >> > > If you can volunteer, please add yourself to the wiki table here [2]. >> > > I've added neutron and nova specifically here as we need someone who is >> > > familiar with the project and it's dependencies to help us get that setup. >> > > >> > > Any questions? Reach out. 
>> > > >> > > Cheers, >> > > >> > > Alex >> > > >> > > [1] >> > > https://review.opendev.org/#/q/topic:build-pdf-docs+(status:open+OR+status:merged) >> > > >> > > [2] >> > > https://wiki.openstack.org/wiki/Documentation#PDF_for_Project_Docs_-_Community_Goal >> > > >> > >> > >> > -- >> > Best regards, >> > Bogdan Dobrelya, >> > Irc #bogdando >> > > -- Doug From ssbarnea at redhat.com Tue Jun 25 13:37:51 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Tue, 25 Jun 2019 14:37:51 +0100 Subject: [devel] Fast flake8 testing In-Reply-To: <77112531d1c3e11bd6b2f9c178391ad073174586.camel@redhat.com> References: <99fea4a5-6f7e-bff0-72ff-3c6f322e0fbc@redhat.com> <77112531d1c3e11bd6b2f9c178391ad073174586.camel@redhat.com> Message-ID: Yeah, use pre-commit, the tool, not the git hook. At this moment it just rocks and was adopted by more and more openstack projects. See http://codesearch.openstack.org/?q=pre-commit&i=nope&files=.pre-commit-config.yaml&repos= for examples of projects using it (tox -e lint) Mainly pre-commit does something similar but with almost any linter, including flake8. Also has the ability to run only on changed files, runs nice under tox and has a much lower footprint (disk&cpu) than tox. And a bit of shameless advertising about my 1year old article explaining it: https://medium.com/@sbarnea/embracing-pre-commit-hooks-4ef1f4e72914 > On 21 Jun 2019, at 17:47, Stephen Finucane wrote: > > On Thu, 2019-06-20 at 13:37 -0400, Zane Bitter wrote: >> Those of you who work on a fairly large project will have noticed that >> running flake8 over all of it takes some time, and that this slows down >> development. >> >> Nova (at least) has a solution to this, in the form of a "fast8" tox >> environment that runs flake8 only against the files that have changed in >> the latest patch + the working directory. This is *much* faster, but >> that approach has some limitations: the script is buggy, it only tests >> the top-most patch, it creates a second tox environment (which is slow) >> that can then get out of sync with your regular pep8 environment, and of >> course it requires the project to add it explicitly. >> >> If you're interested in a solution with none of those limitations, here >> is a script that I've been using: >> >> https://gist.github.com/zaneb/7a8c752bfd97dd8972756d296fc5e41f > > Neat :) There's also the opportunity of integrating flake8 (and other > things) as a pre-commit hook, which is something I'm trying to adopt > within nova and the maybe oslo and further over time. > > http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007151.html > > That requires some project-level work though (including backports, if > you want it on stable branches) whereas your script can be used > anywhere. Both useful. > > Stephen > >> It tests all changes on the branch, using your existing pep8 tox >> environment, handles deleted files and changes to non-python files >> correctly, and should be usable for every OpenStack project. >> >> I hope this is helpful to someone. >> >> (Note that the pep8 environment on many projects includes other test >> commands in addition to flake8 - such as bandit - so you should still >> run the pep8 tox tests once before submitting a patch.) >> >> cheers, >> Zane. -------------- next part -------------- An HTML attachment was scrubbed... 
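For a repo that does not carry a config yet, a minimal way to try pre-commit out is roughly the following (the project path is obviously a placeholder, and the generated sample config only contains the basic hooks - a flake8 entry still has to be added to it by hand):

  pip install pre-commit
  cd your-project/
  # seed a starting .pre-commit-config.yaml, then extend it with a flake8 hook
  pre-commit sample-config > .pre-commit-config.yaml
  # install the git hook and do one full run across the tree
  pre-commit install
  pre-commit run --all-files

After that the hook only checks the files touched by each commit, which is where the speed-up over a full flake8 run comes from.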
URL: From ltoscano at redhat.com Tue Jun 25 14:10:45 2019 From: ltoscano at redhat.com (Luigi Toscano) Date: Tue, 25 Jun 2019 16:10:45 +0200 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: References: Message-ID: <27887188.8GcRtQqCbR@whitebase.usersys.redhat.com> On Tuesday, 25 June 2019 15:17:50 CEST Doug Hellmann wrote: > Michael Johnson writes: > > I gave this a quick run on Octavia. I did get it to output a PDF with > > our svg included. > > > > I see a few issues right away (beyond the screens and screens of > > warnings). > > > > How do we want to collect this feedback? Storyboard stories with a tag? > > > > 1. I needed to add the following bindeps: > > librsvg2-bin [doc platform:dpkg] > > fonts-freefont-otf [doc platform:dpkg] > > We should add those globally in the job definition, and optionally in > the project bindep file. > > > 2. Relative links come through to the PDF but are broken. > > Can you give an example of a link that didn't work? I suspect this has > to do with using HTML links instead of :ref: or :doc: links that Sphinx > knows how to resolve. I think I've seen this issue while testing the generation locally for Sahara: LaTeX Warning: Hyper reference `user/hadoop-swift::doc' on page 6 undefined on input line 368. (and many others similar warnings) The example above comes from https://opendev.org/openstack/sahara/src/branch/ master/doc/source/intro/overview.rst To get more information on how to enable swift support see :doc:`../user/hadoop-swift`. The generate document does not contain any hyperlink, but the replaced text is correct. -- Luigi From openstack at fried.cc Tue Jun 25 14:17:30 2019 From: openstack at fried.cc (Eric Fried) Date: Tue, 25 Jun 2019 09:17:30 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> <138cfc19-1850-a60a-b14a-d5a2ab8f0c85@fried.cc> Message-ID: <3f372dc5-c9c0-f66b-471a-5a2430241f66@fried.cc> > Hmm, I hadn't realised it would be quite this restricted. Although > this could make it work, it does seem to be baking more ironic > specifics into nova. Well, that's what virt drivers are for. In the simplest implementation, you have the Ironic virt driver's migrate_disk_and_power_off do the restrictive checking (all the information you need should be available to that method) and fail if necessary. That sucks a little bit because the failure is late (at compute vs. conductor or API). But that seems acceptable for something this limited, and is really no different than e.g. the libvirt driver failing if you try to resize the ephemeral disk down [1]. > There is an issue of standardisation here. Currently we do not have > standard traits to describe these things, instead we use custom > traits. The reason for this has been discussed earlier in this thread, > essentially that we need to encode configuration key and value into > the trait, and use the lack of a trait as 'don't care'. We did briefly > discuss an alternative approach, but we're a fair way off having that. I'm not sure that should really matter. 
If the logic lives in the virt driver as suggested above, you can do whatever fancy parsing and interpretation you like. efried P.S. I'll continue to repeat this disclaimer: I'm just spitballing here, no idea if this approach would have the support of Nova maintainers at large, or if there are major architectural blockers I'm not thinking of. [1] https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L8908-L8911 From johnsomor at gmail.com Tue Jun 25 14:38:11 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 25 Jun 2019 07:38:11 -0700 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: <27887188.8GcRtQqCbR@whitebase.usersys.redhat.com> References: <27887188.8GcRtQqCbR@whitebase.usersys.redhat.com> Message-ID: >> 2. Relative links come through to the PDF but are broken. > Can you give an example of a link that didn't work? I suspect this has > to do with using HTML links instead of :ref: or :doc: links that Sphinx > knows how to resolve. One of the cases I saw (sorry, don't have the upstream job functional yet) was an RST using the "image::" directive that had a ":target:" param. Our flow diagrams generate links to the actual SVG so you can zoom with your browser. Code is here: https://opendev.org/openstack/octavia/src/branch/master/tools/create_flow_docs.py#L108 That said, the others might be native links of some sort, I will keep that in mind as I try this stuff out. Michael On Tue, Jun 25, 2019 at 7:10 AM Luigi Toscano wrote: > > On Tuesday, 25 June 2019 15:17:50 CEST Doug Hellmann wrote: > > Michael Johnson writes: > > > I gave this a quick run on Octavia. I did get it to output a PDF with > > > our svg included. > > > > > > I see a few issues right away (beyond the screens and screens of > > > warnings). > > > > > > How do we want to collect this feedback? Storyboard stories with a tag? > > > > > > 1. I needed to add the following bindeps: > > > librsvg2-bin [doc platform:dpkg] > > > fonts-freefont-otf [doc platform:dpkg] > > > > We should add those globally in the job definition, and optionally in > > the project bindep file. > > > > > 2. Relative links come through to the PDF but are broken. > > > > Can you give an example of a link that didn't work? I suspect this has > > to do with using HTML links instead of :ref: or :doc: links that Sphinx > > knows how to resolve. > > I think I've seen this issue while testing the generation locally for Sahara: > > > LaTeX Warning: Hyper reference `user/hadoop-swift::doc' on page 6 undefined on > input line 368. > (and many others similar warnings) > > The example above comes from https://opendev.org/openstack/sahara/src/branch/ > master/doc/source/intro/overview.rst > > To get more information on how to enable swift support see > :doc:`../user/hadoop-swift`. > > The generate document does not contain any hyperlink, but the replaced text is > correct. > > -- > Luigi > > From alfredo.deluca at gmail.com Tue Jun 25 16:01:54 2019 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Tue, 25 Jun 2019 18:01:54 +0200 Subject: Backup solutions Message-ID: Hi all. Just a quick one. Other than freezer openstack project as backup solution, are there any other opensource/commercial project/software/solutions for that? Cheers -- *Alfredo* -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From grant at civo.com Tue Jun 25 17:32:46 2019 From: grant at civo.com (Grant Morley) Date: Tue, 25 Jun 2019 18:32:46 +0100 Subject: OSA with cumulus linux Message-ID: <83e49aea-3f44-a5fc-478a-fd1e4fe7442f@civo.com> Hi All, Just wondered if anyone was using OSA with cumulus linux at all? - We are starting a new POC and have cumulus switches for a new environment. We currently use OSA and wondered if there was any support / playbooks for it using cumulus linux? Or if anyone is running cumulus, have you had to write your own playbooks with it alongside OSA? Happy to do the latter, but didn't know if anyone has already done this or can point me to any documentation that would be useful. I couldn't find anything when I looked here: https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/ Regards, -- Grant Morley Cloud Lead, Civo Ltd www.civo.com | Signup for an account! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Tue Jun 25 17:35:05 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 25 Jun 2019 19:35:05 +0200 Subject: Backup solutions In-Reply-To: References: Message-ID: We are using triliovault. It is a commercial solution. Ignazio Il Mar 25 Giu 2019 18:11 Alfredo De Luca ha scritto: > Hi all. Just a quick one. > Other than freezer openstack project as backup solution, are there any > other opensource/commercial project/software/solutions for that? > > Cheers > > -- > *Alfredo* > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jm at artfiles.de Tue Jun 25 18:55:58 2019 From: jm at artfiles.de (Jan Marquardt) Date: Tue, 25 Jun 2019 20:55:58 +0200 Subject: OSA with cumulus linux In-Reply-To: <83e49aea-3f44-a5fc-478a-fd1e4fe7442f@civo.com> References: <83e49aea-3f44-a5fc-478a-fd1e4fe7442f@civo.com> Message-ID: <9A589855-7DB3-4B77-8736-134A4A944592@artfiles.de> Hi Grant, we’ve just built an Openstack Cloud with Layer3 fabric with Cumulus Linux switches. We have a custom playbook which configures the hosts and switches. It is run before the OSA playbooks. I don’t think there is any Cumulus specific stuff inside OSA. Regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 > Am 25.06.2019 um 19:32 schrieb Grant Morley : > > Hi All, > > Just wondered if anyone was using OSA with cumulus linux at all? - We are starting a new POC and have cumulus switches for a new environment. We currently use OSA and wondered if there was any support / playbooks for it using cumulus linux? Or if anyone is running cumulus, have you had to write your own playbooks with it alongside OSA? > > Happy to do the latter, but didn't know if anyone has already done this or can point me to any documentation that would be useful. I couldn't find anything when I looked here: > > https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/ > > Regards, > > -- > > > Grant Morley > Cloud Lead, Civo Ltd > www.civo.com | Signup for an account! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From zbitter at redhat.com Tue Jun 25 19:24:03 2019 From: zbitter at redhat.com (Zane Bitter) Date: Tue, 25 Jun 2019 15:24:03 -0400 Subject: [heat] Resource replacement terminates at DELETE_COMPLETE In-Reply-To: References: Message-ID: <21e497ab-f48d-f5db-b9fe-14a797a8cbb7@redhat.com> On 22/06/19 11:30 AM, Erik McCormick wrote: > HI everyone! > > I have a situation with a heat stack where it has an Octavia Load > Balancer resource which it thinks it's already replaced and so will not > recreate it. > > Resource api_lbwith id 3978 already replaced by 3999; not checking check > /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/check_resource.py:310 > : Ruh-roh. What version of Heat are you using? There has been at least one known bug related to that check. The one that I can find easily is https://storyboard.openstack.org/#!/story/2001974 (fixed in Rocky; backported to Queens and Pike). I think there might have been earlier issues found but they predated the existence of that log message (those were fun to debug). The log message was added in Queens (https://review.opendev.org/533015) so in theory whatever version you're running, the fix should be available in the latest stable release - though if memory serves that only prevents the issue rather than recovering from it. You'll be happy to hear that the check was eliminated forever in Stein: https://review.opendev.org/600278 > It goes to a DELETE_COMPLETED state and just sits there. The stack stays > UPDATE_IN_PROGRESS and nothing else moves. It doesn't even time out > after 4 hours. > > Doing a stack check puts everytinng as CHECK_COMPLETE, even the > non-existent load balancers. I can mark the LB and its components > unhealthy and start another update, but this just repeats the cycle. > > This all started with some Octavia shenanigans which ended with all the > load balancers being deleted manually. I have 2 similar stacks which > recreated fine, but this one went through the cycle several other times > as we were trying to fix the LB problem. This is a super edge case, but > hopefully someone has another idea how to get out of it. If you're up for some database hacking, removing that (DELETE_COMPLETE) resource ought to get you unblocked: > DELETE FROM resource WHERE id=3978; Obviously take appropriate precautions, back up the DB first, &c. cheers, Zane. From mriedemos at gmail.com Tue Jun 25 20:05:05 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 25 Jun 2019 15:05:05 -0500 Subject: [nova] Can we drop the kilo era ComputeNode host/service_id compat code now? Message-ID: There are still quite a few TODOs in the code [1][2][3] from a kilo era blueprint [4]. At this point I'm pretty sure you can't startup the nova-compute service without having a ComputeNode record without a host and hypervisor_hostname field set (we don't set the ComputeNode.service_id anywhere anymore as far as I can tell, except in some ComputeNode RPC compat code [5]). I've stumbled across all of this code before, but was looking at it again today because I have a very simple change I need to make which is going from a ComputeNode object and getting the related nova-compute Service object for that node. Looking at the code one might think this is reasonable: service = objects.Service.get_by_id(ctxt, compute_node.service_id) But compute_node.service_id is likely None. 
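For reference, the manual equivalent of that scan would be roughly the query below, run against each cell database (the nova_cell1 schema name is just an example - use whatever your deployment calls its cell databases):

  # list non-deleted compute node records that have no host set
  mysql nova_cell1 -e \
      "SELECT id, hypervisor_hostname, host FROM compute_nodes WHERE host IS NULL AND deleted = 0;"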
Or how about: service = objects.Service.get_by_compute_host(ctxt, compute_node.host) But ComputeNode.host is also nullable (though likely should have a value as noted above). This is a long way of me saying this code is all gross and we should clean it up, which means making sure all of this Kilo era compat code for old records is no longer necessary, which means all of those records should be migrated by now but how should we check? I *think* this might just be as simple as a "nova-status upgrade check" check which scans the cells looking for (non-deleted) compute_nodes records where host is NULL and report an error if any are found. I believe the recovery action for an operator that hits this is to delete the busted compute_nodes record and restart the nova-compute service so a new compute node record is created. I would really think that anything this scan would find would be orphaned compute_nodes records that could just be deleted since another compute_nodes record probably already exists for the same hypervisor_hostname value. IOW, I don't think we need an online data migration routine for this. Hopefully at least one person (Sylvain) can agree with me here and the plan of action I've put forth. [1] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/db/sqlalchemy/models.py#L123 [2] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L150 [3] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L263 [4] https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode [5] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L118 -- Thanks, Matt From colleen at gazlene.net Tue Jun 25 20:29:30 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 25 Jun 2019 13:29:30 -0700 Subject: [keystone] Virtual Midcycle Planning Message-ID: <5480a911-8beb-46de-a326-fe5eea6802e5@www.fastmail.com> Hi team, As discussed in today's meeting, we will be having a virtual midcycle some time around milestone 2. We'll do two days with one three-hour session (with breaks) each day. We will do this over a video conference session, details of how to join will follow closer to the event. I've started a brainstorming etherpad: https://etherpad.openstack.org/p/keystone-train-midcycle-topics Please add discussion topics or hacking ideas to the etherpad and I will try to sort them. We need to decide on when exactly to hold the midcycle. I've created a doodle poll: https://doodle.com/poll/wr7ct4uhpw82sysg Please select times and days that you're available and then we'll try to schedule two back-to-back days (or at least two days in the same week) for the midcycle. Let me know if you have any questions or concerns. Colleen From stig.openstack at telfer.org Tue Jun 25 20:38:48 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Tue, 25 Jun 2019 21:38:48 +0100 Subject: [scientific-sig] IRC meeting shortly: Shanghai summit CFP and planning Message-ID: <52430A78-839F-442C-A96B-AEC50A73148D@telfer.org> Hi all - We have a Scientific SIG IRC meeting at 2100 UTC (about 20 minutes) in channel #openstack-meeting. Everyone is welcome. This week with the Shanghai summit CFP approaching we’d like to kick off the planning for Scientific SIG activities for the summit. 
Cheers, Stig From bharat at stackhpc.com Tue Jun 25 21:21:55 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Tue, 25 Jun 2019 22:21:55 +0100 Subject: [magnum] weekly meetings Message-ID: <45A637C5-E94A-461B-AD9E-3CA73FD1B2B4@stackhpc.com> Hello all I was wondering if there was a reason there hasn’t been a regular weekly meeting recently. The last one I traced back in the IRC log was back in 4 June. It would be good if there was an up to date schedule somewhere with definite dates where there is a chair available. Best Bharat From alfredo.deluca at gmail.com Tue Jun 25 21:50:21 2019 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Tue, 25 Jun 2019 23:50:21 +0200 Subject: Backup solutions In-Reply-To: References: Message-ID: Thanks Ignazio. Regards On Tue, Jun 25, 2019 at 7:35 PM Ignazio Cassano wrote: > We are using triliovault. > It is a commercial solution. > Ignazio > > Il Mar 25 Giu 2019 18:11 Alfredo De Luca ha > scritto: > >> Hi all. Just a quick one. >> Other than freezer openstack project as backup solution, are there any >> other opensource/commercial project/software/solutions for that? >> >> Cheers >> >> -- >> *Alfredo* >> >> -- *Alfredo* -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Tue Jun 25 22:11:22 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 25 Jun 2019 18:11:22 -0400 Subject: [tc] July meeting host In-Reply-To: References: <20190618135200.e53rgv3w26kjqun4@yuggoth.org> <20190618135352.kkwyqm6whpogutp4@yuggoth.org> <0aa1b541-a6ff-bb03-20db-d55ab72274a3@redhat.com> Message-ID: Hi everyone: Based on the feedback, the meeting will be moved to July 11th. Thank you, Mohammed On Wed, Jun 19, 2019 at 11:44 AM Lance Bragstad wrote: > > > > On 6/18/19 4:49 PM, Doug Hellmann wrote: > > Zane Bitter writes: > > > >> On 18/06/19 11:52 AM, Jim Rollenhagen wrote: > >>> On Tue, Jun 18, 2019 at 11:47 AM Mohammed Naser >>> > wrote: > >>> > >>> On Tue, Jun 18, 2019 at 11:21 AM Graham Hayes >>> > wrote: > >>> > I would suggest we move it ot the following week, unless a lot of > >>> > the US based folks think they can make it? > >>> > >>> While I think that we shouldn't necessarily just build our community > >>> to accommodate a specific region, I think in this case, the majority > >>> of the members involved in this meeting are in that region. > >>> > >>> I guess it would be nice if TC members can chime in and mention if it > >>> is easier to meet the week after which would be on the 11th of July. > >>> > >>> > >>> I won't be working on the 4th, so yes for me. > >> +1 > >> > > I will also be offline on the 4th. > > > +1 - I won't be working on the 4th. > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From feilong at catalyst.net.nz Tue Jun 25 23:11:04 2019 From: feilong at catalyst.net.nz (feilong at catalyst.net.nz) Date: Wed, 26 Jun 2019 11:11:04 +1200 Subject: [magnum] weekly meetings In-Reply-To: <45A637C5-E94A-461B-AD9E-3CA73FD1B2B4@stackhpc.com> References: <45A637C5-E94A-461B-AD9E-3CA73FD1B2B4@stackhpc.com> Message-ID: <28b59e74c479f119dc6d9201294f108e@catalyst.net.nz> Hi Bharat, I'm in Shanghai, China since last week for Kubecon Shanghai 2019 and I have asked Spyros to help host the meeting as I'm away. I did mention that I would be away in the IRC channel as well. I will send an email to here next time. Sorry about the confusion. 
On 2019-06-26 09:21, Bharat Kunwar wrote: > Hello all > > I was wondering if there was a reason there hasn’t been a regular > weekly meeting recently. The last one I traced back in the IRC log was > back in 4 June. It would be good if there was an up to date schedule > somewhere with definite dates where there is a chair available. > > Best > > Bharat From tony at bakeyournoodle.com Wed Jun 26 04:26:46 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Wed, 26 Jun 2019 14:26:46 +1000 Subject: [stable][sahara] Changes to the stable team Message-ID: <20190626042646.GF11501@thor.bakeyournoodle.com> Hello all, In the middle of last week Telles asked[1] for a couple of the current Sahara core team (Jeremy and Luigi) to be added to the stable group. Upon review there isn't enough data (either reviews or backports) to verify that Jeremy and Luigi understand the stable policy. So I'd like to try an experiment and I've added both to the stable core team but I ask neither +W a backport and ping me (or another stable team member) for the final approval. I'd like to suggest dropping: * Elise Gafford egafford at redhat.com * Sergey Lukjanov slukjanov at mirantis.com * Sergey Reshetnyak sreshetniak at mirantis.com * Vitaly Gridnev gridnevvvit at gmail.com from the sahara-stable-maint group given[2] is empty. Yours Tony. [1] http://eavesdrop.openstack.org/irclogs/%23openstack-stable/%23openstack-stable.2019-06-19.log.html#t2019-06-19T16:36:38 [2] https://review.opendev.org/#/q/(project:openstack/python-saharaclient+OR+project:openstack/sahara+OR+project:openstack/sahara-dashboard+OR+project:openstack/sahara-extra+OR+project:openstack/sahara-image-elements+OR+project:openstack/sahara-plugin-ambari+OR+project:openstack/sahara-plugin-cdh+OR+project:openstack/sahara-plugin-mapr+OR+project:openstack/sahara-plugin-spark+OR+project:openstack/sahara-plugin-storm+OR+project:openstack/sahara-plugin-vanilla+OR+project:openstack/sahara-tests+OR+project:openstack/sahara-specs)+AND+(owner:egafford%2540redhat.com+OR+owner:slukjanov%2540mirantis.com+OR+owner:sreshetniak%2540mirantis.com+OR+owner:gridnevvvit%2540gmail.com+OR+reviewer:egafford%2540redhat.com+OR+reviewer:slukjanov%2540mirantis.com+OR+reviewer:sreshetniak%2540mirantis.com+OR+reviewer:gridnevvvit%2540gmail.com)+AND+branch:%255Estable/.*+after:2018-06-01 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From amotoki at gmail.com Wed Jun 26 06:15:25 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 26 Jun 2019 15:15:25 +0900 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: References: <20190624155629.GA26343@sinanju.localdomain> Message-ID: I tried the PDF build on neutron doc. The neutron doc hit only one error, though I haven't checked the generated document in detail yet. 1. The PDF generation stopped due to "! Dimension too large." error. This happens in a sample config file "neutron.conf". I am afraid that inline text is too long for "verbatim". As a workaround we can skip inline sample configuration files. 2. The TOC is not suitable for PDF document. It only lists the first two levels. As a reader of PDF version, I would like to see deeper levels to jump a specific section directly. I am not sure where is the setting of the TOC but it is worth improved. 
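For anyone who wants to reproduce this locally, something like the following should work, assuming the usual doc/requirements.txt layout and a TeX Live install that includes the font packages mentioned earlier in the thread:

  # from the project root
  pip install -r doc/requirements.txt
  sphinx-build -b latex doc/source doc/build/pdf
  # the generated Makefile drives latexmk/pdflatex
  make -C doc/build/pdf

On the TOC depth, I suspect the knob is LaTeX's tocdepth counter, which can be raised from latex_elements['preamble'] in conf.py, but I have not confirmed how deep it can usefully go for a document of this size.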
On Tue, Jun 25, 2019 at 8:38 AM Michael Johnson wrote: > 3, Oddly, the "configuration" section of our docs didn't render, it's > just a blank section. Even if the generated configuration guide didn't > work, I would have expected the RST policies document to come through. > Even more strange, the configuration guide is linked from another > section, and it rendered there. This must be one of the billion > warnings that output. Just FYI: In case of the neutron doc, I see the "configuration" section (oslo.config and oslo.policy) successfully. Thanks, Akihiro Motoki (amotoki) On Tue, Jun 25, 2019 at 8:38 AM Michael Johnson wrote: > > I gave this a quick run on Octavia. I did get it to output a PDF with > our svg included. > > I see a few issues right away (beyond the screens and screens of warnings). > > How do we want to collect this feedback? Storyboard stories with a tag? > > 1. I needed to add the following bindeps: > librsvg2-bin [doc platform:dpkg] > fonts-freefont-otf [doc platform:dpkg] > > 2. Relative links come through to the PDF but are broken. > 3, Oddly, the "configuration" section of our docs didn't render, it's > just a blank section. Even if the generated configuration guide didn't > work, I would have expected the RST policies document to come through. > Even more strange, the configuration guide is linked from another > section, and it rendered there. This must be one of the billion > warnings that output. > 4. We should document how to ignore or re-order the docs. We have an > internal API reference that comes through as the first section, but is > of little use to anyone outside the developers. It is also confusing > as the actual Octavia API-REF link doesn't render. > 5. The feature matrix tables rendered ok, except the red "X" does not > (unicode 2716). > (https://opendev.org/openstack/sphinx-feature-classification) > > Michael > > > > > > > > > On Mon, Jun 24, 2019 at 8:59 AM Matthew Treinish wrote: > > > > On Mon, Jun 24, 2019 at 05:26:08PM +0200, Bogdan Dobrelya wrote: > > > On 24.06.2019 12:29, Alexandra Settle wrote: > > > > Hi all, > > > > > > > > The work for the Train community goal - PDF support for project docs - > > > > is well underway. [1] Now, we're looking for volunteers to help test the > > > > implementation. > > > > > > > > We'll need someone to help build the docs into PDFs and determine things > > > > we can fix through tweaks to our docs, or if they're bugs in Sphinx. > > > > AKA: We need a troubleshoot artist. > > > > > > There seems to be an issue [0] for any projects using the badges [1] or > > > other SVGs in their docs. Also the default levels of nesting of the {\begin > > > ... \end} stanzas might require additional tunings, like [2]. I'll keep > > > posting here on the further issues discovered for PDF doc builds for > > > TripleO. Stay tuned :) > > > > The svg in pdf thing was a known issue. When I first looked at building the > > nova docs with latex/pdf output a few years ago [1] you had to manually > > convert the images before building the latex. Since then sphinx has added > > an extension to do this for you: > > > > https://www.sphinx-doc.org/en/master/usage/extensions/imgconverter.html > > > > You should be able to just add that to the extension list in conf.py and it > > will convert the svgs at sphinx build time. 
> > > > -Matt Treinish > > > > [1] https://opendev.org/openstack/nova/commit/62575dd40e5b7698d9ba54641558246489f0614e > > > > > > > > [0] https://github.com/sphinx-doc/sphinx/issues/4720#issuecomment-372046571 > > > [1] https://governance.openstack.org/tc/badges/ > > > [2] https://review.opendev.org/667114 > > > > > > > > > > > If you can volunteer, please add yourself to the wiki table here [2]. > > > > I've added neutron and nova specifically here as we need someone who is > > > > familiar with the project and it's dependencies to help us get that setup. > > > > > > > > Any questions? Reach out. > > > > > > > > Cheers, > > > > > > > > Alex > > > > > > > > [1] > > > > https://review.opendev.org/#/q/topic:build-pdf-docs+(status:open+OR+status:merged) > > > > > > > > [2] > > > > https://wiki.openstack.org/wiki/Documentation#PDF_for_Project_Docs_-_Community_Goal > > > > > > > > > > > > > -- > > > Best regards, > > > Bogdan Dobrelya, > > > Irc #bogdando > > > > From jyotishri403 at gmail.com Wed Jun 26 06:24:32 2019 From: jyotishri403 at gmail.com (Jyoti Dahiwele) Date: Wed, 26 Jun 2019 11:54:32 +0530 Subject: Installation of Cinder in multi compute environment Message-ID: Dear Team, What is the best practise of installation of Cinder service on a controller node or separate node in multi compute openstack installation? -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Wed Jun 26 07:51:00 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Wed, 26 Jun 2019 07:51:00 +0000 Subject: [nova] Can we drop the kilo era ComputeNode host/service_id compat code now? In-Reply-To: References: Message-ID: <1561535457.4870.2@smtp.office365.com> On Tue, Jun 25, 2019 at 10:05 PM, Matt Riedemann wrote: > There are still quite a few TODOs in the code [1][2][3] from a kilo > era blueprint [4]. At this point I'm pretty sure you can't startup > the nova-compute service without having a ComputeNode record without > a host and hypervisor_hostname field set (we don't set the > ComputeNode.service_id anywhere anymore as far as I can tell, except > in some ComputeNode RPC compat code [5]). > > I've stumbled across all of this code before, but was looking at it > again today because I have a very simple change I need to make which > is going from a ComputeNode object and getting the related > nova-compute Service object for that node. > > Looking at the code one might think this is reasonable: > > service = objects.Service.get_by_id(ctxt, compute_node.service_id) > > But compute_node.service_id is likely None. Or how about: > > service = objects.Service.get_by_compute_host(ctxt, compute_node.host) > > But ComputeNode.host is also nullable (though likely should have a > value as noted above). > > This is a long way of me saying this code is all gross and we should > clean it up, which means making sure all of this Kilo era compat code > for old records is no longer necessary, which means all of those > records should be migrated by now but how should we check? > > I *think* this might just be as simple as a "nova-status upgrade > check" check which scans the cells looking for (non-deleted) > compute_nodes records where host is NULL and report an error if any > are found. I believe the recovery action for an operator that hits > this is to delete the busted compute_nodes record and restart the > nova-compute service so a new compute node record is created. 
I would > really think that anything this scan would find would be orphaned > compute_nodes records that could just be deleted since another > compute_nodes record probably already exists for the same > hypervisor_hostname value. IOW, I don't think we need an online data > migration routine for this. > > Hopefully at least one person (Sylvain) can agree with me here and > the plan of action I've put forth. You plan makes sens to me too. gibi > > [1] > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/db/sqlalchemy/models.py#L123 > [2] > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L150 > [3] > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L263 > [4] > https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode > [5] > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L118 > > -- > > Thanks, > > Matt > From renat.akhmerov at gmail.com Wed Jun 26 07:59:27 2019 From: renat.akhmerov at gmail.com (Renat Akhmerov) Date: Wed, 26 Jun 2019 10:59:27 +0300 Subject: [mistral] No office hours meeting today In-Reply-To: <9c8513ea-baf2-492e-a9e8-2f779c0fac71@Spark> References: <9c8513ea-baf2-492e-a9e8-2f779c0fac71@Spark> Message-ID: Hello, Since a number of key people can’t attend today’s meeting we’re cancelling it. See you next week. Thanks Renat Akhmerov @Nokia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltoscano at redhat.com Wed Jun 26 08:21:42 2019 From: ltoscano at redhat.com (Luigi Toscano) Date: Wed, 26 Jun 2019 10:21:42 +0200 Subject: [stable][sahara] Changes to the stable team In-Reply-To: <20190626042646.GF11501@thor.bakeyournoodle.com> References: <20190626042646.GF11501@thor.bakeyournoodle.com> Message-ID: <1571571.VMcRbnv6ix@whitebase.usersys.redhat.com> On Wednesday, 26 June 2019 06:26:46 CEST Tony Breeds wrote: > Hello all, > In the middle of last week Telles asked[1] for a couple of the > current Sahara core team (Jeremy and Luigi) to be added to the stable > group. > > Upon review there isn't enough data (either reviews or backports) to > verify that Jeremy and Luigi understand the stable policy. Hi, do you mean that the request did not contain enough data to evaluate us, or that the previous activity is not enough to assess whether we understand it? Ciao -- Luigi From grant at civo.com Wed Jun 26 08:31:16 2019 From: grant at civo.com (Grant Morley) Date: Wed, 26 Jun 2019 09:31:16 +0100 Subject: OSA with cumulus linux In-Reply-To: <9A589855-7DB3-4B77-8736-134A4A944592@artfiles.de> References: <83e49aea-3f44-a5fc-478a-fd1e4fe7442f@civo.com> <9A589855-7DB3-4B77-8736-134A4A944592@artfiles.de> Message-ID: <43d27beb-fa34-474e-4b66-2dfa86bec126@civo.com> Hi Jan, Many thanks for that. I suspected as much but thought I would ask the question. Regards, On 25/06/2019 19:55, Jan Marquardt wrote: > Hi Grant, > > we’ve just built an Openstack Cloud with Layer3 fabric with Cumulus > Linux switches. We have a custom playbook which configures the > hosts and switches. It is run before the OSA playbooks. I don’t think > there is any Cumulus specific stuff inside OSA. 
> > Regards > > Jan > > -- > Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg > Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 > E-Mail: support at artfiles.de | Web: http://www.artfiles.de > Geschäftsführer: Harald Oltmanns | Tim Evers > Eingetragen im Handelsregister Hamburg - HRB 81478 > >> Am 25.06.2019 um 19:32 schrieb Grant Morley : >> >> Hi All, >> >> Just wondered if anyone was using OSA with cumulus linux at all? - We are starting a new POC and have cumulus switches for a new environment. We currently use OSA and wondered if there was any support / playbooks for it using cumulus linux? Or if anyone is running cumulus, have you had to write your own playbooks with it alongside OSA? >> >> Happy to do the latter, but didn't know if anyone has already done this or can point me to any documentation that would be useful. I couldn't find anything when I looked here: >> >> https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/ >> >> Regards, >> >> -- >> >> >> Grant Morley >> Cloud Lead, Civo Ltd >> www.civo.com | Signup for an account! -- Grant Morley Cloud Lead, Civo Ltd www.civo.com | Signup for an account! -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfinucan at redhat.com Wed Jun 26 09:21:09 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Wed, 26 Jun 2019 10:21:09 +0100 Subject: [all] [ptls] [tc] [nova] [neutron] [tripleo] Volunteers that know TeX for PDF community goal In-Reply-To: References: <20190624155629.GA26343@sinanju.localdomain> Message-ID: <80e2e8550fd39cf9e224e24b4e6ad806acdc9e16.camel@redhat.com> On Mon, 2019-06-24 at 16:36 -0700, Michael Johnson wrote: > 4. We should document how to ignore or re-order the docs. We have an > internal API reference that comes through as the first section, but is > of little use to anyone outside the developers. It is also confusing > as the actual Octavia API-REF link doesn't render. I think this happens because it renders pages in the order that it encounters them. If you ensure there's a table of contents on the index page of each subsection ('/user', '/admin', '/config', ...), and that there's a top-level table of contents linking to each of these from your 'master_doc' as defined in 'conf.py (typically 'index') then things _should_ render in the correct order. These table of contents _could_ be hidden, though I haven't tested that yet. I plan to rework the nova docs according to the above...as soon as I get the darn thing building. Stephen From madhuri.kumari at intel.com Wed Jun 26 11:19:49 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Wed, 26 Jun 2019 11:19:49 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA614A50550@BGSMSX101.gar.corp.intel.com> Hi Mark, >>-----Original Message----- >>From: Mark Goddard [mailto:mark at stackhpc.com] >>I was talking with Madhuri in #openstack-ironic about this today [1]. >>While talking it through I raised some concerns about the nova resize-based >>design, which I'll try to outline here. 
>> >>When we deploy a node using deploy templates, we have the following >>sequence. >> >>* user picks a flavor and image, which may specify required traits >>* selected traits are pushed to ironic via instance_info.traits >>* ironic finds all deploy templates with name matching one of the selected >>traits >>* deploy steps from the matching templates are used when provisioning the >>node >> >>The deploy steps could include RAID config, BIOS config, or something else. >> >>If we now resize the instance to a different flavor which has a different set of >>traits, we would end up with a new set of traits, which map a new set of >>deploy templates, with a new set of steps. >> >>How do we apply this change? Should we execute all matching deploy steps, >>which could (e.g. RAID) result in losing data? Or should we attempt to >>execute only those deploy steps that have changed? Would that always >>work? I don't think we keep a record of the steps used to provision a node, >>so if templates have changed in the intervening time then we might not get a >>correct diff. >> Mark and I had discussion about this yesterday[1]. A possible way to fix this issue is by restricting deploy_steps from specific interface such as allow bios but restrict raid. However there could be some deploy_steps with bios interface which we might not want to allow during a resize. I don't have an example now. So I think it's better to restrict individual deploy_steps rather than a driver type. [1] http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-25.log.html#t2019-06-25T08:32:30 Regards, Madhuri From madhuri.kumari at intel.com Wed Jun 26 11:20:25 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Wed, 26 Jun 2019 11:20:25 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> <138cfc19-1850-a60a-b14a-d5a2ab8f0c85@fried.cc> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA614A5056D@BGSMSX101.gar.corp.intel.com> Hi Mark, >>-----Original Message----- >>From: Mark Goddard [mailto:mark at stackhpc.com] >>Hmm, I hadn't realised it would be quite this restricted. Although this could >>make it work, it does seem to be baking more ironic specifics into nova. >> >>There is an issue of standardisation here. Currently we do not have standard >>traits to describe these things, instead we use custom traits. The reason for >>this has been discussed earlier in this thread, essentially that we need to >>encode configuration key and value into the trait, and use the lack of a trait >>as 'don't care'. We did briefly discuss an alternative approach, but we're a >>fair way off having that. I think the issue of standardization is not related to the specific use case we are discussing here. It applies to the current state of ironic virt driver as well as you said. The idea of using the flavor metadata can fix this but that’s in itself another piece of work. 
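For readers following the thread, a rough sketch of the mapping under discussion may help: a custom trait named in the flavor's extra specs is used both for scheduling and to select an ironic deploy template of the same name, and it is that template's steps a resize would somehow have to re-apply. The trait name, BIOS setting and priority below are invented purely for illustration:

    # Illustrative only: the trait name, BIOS setting and priority are made up.
    # Flavor extra specs: the trait is scheduled on and passed to ironic via
    # instance_info.traits.
    flavor_extra_specs = {
        "trait:CUSTOM_HYPERTHREADING_ON": "required",
    }

    # Ironic deploy template whose name matches that trait; its steps are
    # executed when the node is provisioned.
    deploy_template = {
        "name": "CUSTOM_HYPERTHREADING_ON",
        "steps": [
            {
                "interface": "bios",
                "step": "apply_configuration",
                "args": {"settings": [{"name": "LogicalProc", "value": "Enabled"}]},
                "priority": 150,
            },
        ],
    }

Resizing to a flavor that names a different set of traits would select different templates, possibly including destructive steps such as RAID configuration, which is exactly the concern Mark raises above.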
Regards, Madhuri From mriedemos at gmail.com Wed Jun 26 13:44:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 26 Jun 2019 08:44:33 -0500 Subject: [stackalytics] Reported numbers seem inaccurate In-Reply-To: References: Message-ID: <7987c492-da59-f483-8e3b-d6ffc64b4233@gmail.com> On 4/11/2019 1:21 PM, Mark Goddard wrote: > Hi, > > I've heard a couple of people say they don't feel the numbers reported > by Stackalytics accurately reflect reality. I've been trying to gather a > few stats for my Kolla update session in Denver, and am finding the > same. I'll try to give some concrete examples. > > Reviews for all kolla deliverables in Stein [1]. Here the company stats > don't reflect the individual stats. Also, the total reviews in the > 'Kolla Official' module does not equal the sum of the reviews of its > submodules (kolla, kolla-ansible, kolla-cli). > > If I look at the contribution summary for Kolla Official in the last 90 > days [2], they are actually greater than those for the last 180 days [3]! > > There are also similar issues with commit metrics, and none seem to > match what I see in git. > > Thanks, > Mark > > [1] https://www.stackalytics.com/?metric=marks&module=kolla-group > [2] https://www.stackalytics.com/report/contribution/kolla-group/90 > [3] https://www.stackalytics.com/report/contribution/kolla-group/180 Seems things are busted again, or no one is doing any nova reviews: https://www.stackalytics.com/report/contribution/nova/30 -- Thanks, Matt From sbauza at redhat.com Wed Jun 26 13:45:12 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 26 Jun 2019 15:45:12 +0200 Subject: [nova] Can we drop the kilo era ComputeNode host/service_id compat code now? In-Reply-To: References: Message-ID: On Tue, Jun 25, 2019 at 10:14 PM Matt Riedemann wrote: > There are still quite a few TODOs in the code [1][2][3] from a kilo era > blueprint [4]. At this point I'm pretty sure you can't startup the > nova-compute service without having a ComputeNode record without a host > and hypervisor_hostname field set (we don't set the > ComputeNode.service_id anywhere anymore as far as I can tell, except in > some ComputeNode RPC compat code [5]). > > I've stumbled across all of this code before, but was looking at it > again today because I have a very simple change I need to make which is > going from a ComputeNode object and getting the related nova-compute > Service object for that node. > > Looking at the code one might think this is reasonable: > > service = objects.Service.get_by_id(ctxt, compute_node.service_id) > > But compute_node.service_id is likely None. Or how about: > > service = objects.Service.get_by_compute_host(ctxt, compute_node.host) > > But ComputeNode.host is also nullable (though likely should have a value > as noted above). > > Yeah basically, before this blueprint, a ComputeNode record was only having an hypervisor_hostname value and a service_id which was a FK from the Service record. Given we preferred to have a tuple (host, hypervisor_hostname) key for the CN record, we deprecated the service_id and wanted to add a new field named 'host'. For that, we were looking at the existing Service record to know the field value. After this, we were directly providing the 'host' field value. That said, since it was possible to have compute and conductor services having different release versions, that's why we were wanting to still be able to look at the backward compability. 
Now, we're waaaaay after Kilo, so I think we no longer need to support the compatibility ;-) This is a long way of me saying this code is all gross and we should > clean it up, which means making sure all of this Kilo era compat code > for old records is no longer necessary, which means all of those records > should be migrated by now but how should we check? > > I *think* this might just be as simple as a "nova-status upgrade check" > check which scans the cells looking for (non-deleted) compute_nodes > records where host is NULL and report an error if any are found. I > believe the recovery action for an operator that hits this is to delete > the busted compute_nodes record and restart the nova-compute service so > a new compute node record is created. I would really think that anything > this scan would find would be orphaned compute_nodes records that could > just be deleted since another compute_nodes record probably already > exists for the same hypervisor_hostname value. IOW, I don't think we > need an online data migration routine for this. > > Yeah, agreed with the above. I don't think we need an online data migration for this and I'm pretty sure an nova-status upgrade check should be enough. -Sylvain > Hopefully at least one person (Sylvain) can agree with me here and the > plan of action I've put forth. > > [1] > > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/db/sqlalchemy/models.py#L123 > [2] > > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L150 > [3] > > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L263 > [4] > https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode > [5] > > https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L118 > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrecutm at gmail.com Wed Jun 26 14:17:43 2019 From: andrecutm at gmail.com (Emnauel Andrecut) Date: Wed, 26 Jun 2019 17:17:43 +0300 Subject: [dev][magnum] Magnum event notifications not having relevant content related to clusters Message-ID: I'm writing about Magnum event notifications that do not contain any relevant information about the related Cluster (missing cluster id, other attributes..) As I've seen the PyCADF library receives no targetId and so the id (and mostly any other id) is randomly generated. Do you plan to add more information about the cluster on create success/pending events. Or would you accept a patch that does that? From mriedemos at gmail.com Wed Jun 26 14:21:22 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 26 Jun 2019 09:21:22 -0500 Subject: [nova] Questions on Force refresh instance info_cache during heal In-Reply-To: <5d133da9.1c69fb81.e1b4a.4f14SMTPIN_ADDED_BROKEN@mx.google.com> References: <5d133da9.1c69fb81.e1b4a.4f14SMTPIN_ADDED_BROKEN@mx.google.com> Message-ID: <1b0ba2d3-512c-ab37-bbc5-34dbb32b0885@gmail.com> On 6/26/2019 4:39 AM, 胡丽娜 wrote: > Dear Matt Riedemann : >         I am quiet interested in your recently published > patch  https://review.opendev.org/#/c/591607/  , there is  a question i > met ,If we spawn VM with port id and then update db , set > network_info='[]' , Afert  periodic task_heal_instance_info_cache , > Preserve On Delete turn into False , in face ,it's corret value is True . 
> Steps to reproduce: > 1.Spawn vm with port id : nova boot --flavor xx --nic port-id --image xx > vm_name > 2.View the value of Preserve On Delete , it'value is True > 3.Update the DB row ,drop interface_list > 4.View the value of Preserve On Delete , it turns into False > Have you ever had this question ?  Look forward to your reply ! >         Best wishes ! > > 胡丽娜  BC-EC > 中国移动苏州研发中心  云计算产品部 > 中移(苏州)软件技术有限公司 > 苏州高新区科技城昆仑山路58号 中移软件园 +openstack-discuss mailing list The problem in this case is that nova relies on the network info cache to determine which ports were created by nova and which were pre-existing and provided by the user, either during server create or attached to the server after it was created. When building the network info model we determine the pre-existing port IDs here [1] and then pass that to _build_vif_model here [2]. We then determine the preserve_on_delete value if the port ID is in that pre-existing ports set [3]. What we should probably be doing is if we have lost the cache is always set preserve_on_delete=True since we don't have the cache information to know if we should delete the port when the server is deleted, and it could be an SR-IOV port or something that the user does not want deleted. The worst case in this scenario is the server is deleted and some ports are unbound and left hanging around that the user has to manually cleanup but that's arguably better than deleting something they didn't want to be deleted. I used some similar logic in this related patch [4]. If people agree with this (being conservative when we've lost the cache), then please report a bug and we can make the change (it should be pretty simple). [1] https://review.opendev.org/#/c/591607/25/nova/network/neutronv2/api.py at 2826 [2] https://review.opendev.org/#/c/591607/25/nova/network/neutronv2/api.py at 2906 [3] https://review.opendev.org/#/c/591607/25/nova/network/neutronv2/api.py at 2764 [4] https://review.opendev.org/#/c/640516/ -- Thanks, Matt From Vrushali.Kamde at nttdata.com Wed Jun 26 14:34:04 2019 From: Vrushali.Kamde at nttdata.com (Kamde, Vrushali) Date: Wed, 26 Jun 2019 14:34:04 +0000 Subject: [nova] Strict isolation of group of hosts for image Message-ID: Hi, Working on implementation of 'Support filtering of allocation_candidates by forbidden aggregates' spec. Here we are trying to Test granular resource request. Facing issue in configuring inventories for 'CUSTOM_RESOURCE_CLASS' and resulting into 'No Valid host found' whereas expecting host from placement. So kindly help me here how one can set inventory for 'CUSTOM_RESOURCE_CLASS' because when the inventory is added for the custom resource class 'CUSTOM_RESOURCE_CLASS', it's getting removed when the compute service periodic task updates the inventory. FYI, here are the steps followed: 1. Data configured at nova: * Create three aggregates 'agg1', 'agg2', 'agg3' by using: 'POST'-- /os-aggregates(Create aggregate) * Setting metadata on aggregates: i. Setting metadata (trait:HW_CPU_X86_SGX) on the aggregate agg1 by using: 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) ii. Setting metadata (trait:STORAGE_DISK_SSD) on the aggregate agg2 by using: 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) iii. 
Setting metadata (trait:CUSTOM_MAGIC, trait:HW_CPU_X86_MMX) on the aggregate agg3 by using: 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) * Associate aggregates 'agg3' to host say 'RP1' by using: 'POST'-- /os-aggregates/{aggregate_id}/action(Add host) * Setting Extra-specs on Flavor(trait:HW_CPU_X86_MMX, trait1:CUSTOM_MAGIC, resources1:CUSTOM_RESOURCE_CLASS): 'POST'-- /flavors/{flavor_id}/os-extra_specs 1. Data configured at placement: * Setting traits (CUSTOM_MAGIC, HW_CPU_X86_MMX) on host 'RP1'. 'PUT'-- /resource_providers/{uuid}/traits * Create a new Resource class 'CUSTOM_RESOURCE_CLASS': 'POST'-- /resource_classes 1. Assigned inventories to the new resource class: openstack resource provider inventory set --resource CUSTOM_RESOURCE_CLASS:total=78 'PUT'-- /resource_providers/{uuid}/inventories/{resource_class} 1. When we boot the instance : * The final url of allocation_candidates which we get is as follows: '/allocation_candidates?limit=1000&member_of=%21in%3A442ca580-fd07-433c-9b67-c18fdeccbca1%2C61b8bb02-01e0-4e28-af4c-0ea4c828539a &required=HW_CPU_X86_MMX&required1=CUSTOM_MAGIC&resources=DISK_GB%3A1%2CMEMORY_MB%3A512%2CVCPU%3A1&resources1=CUSTOM_RESOURCE_CLASS%3A1' * Observed after the inventory is added for the custom resource class 'CUSTOM_RESOURCE_CLASS', it's getting removed when the compute service periodic task updates the inventory. Therefore the instance is not getting booted on 'RP1'. The error faced which we get is 'No Valid host found'. Regards, Vrushali Kamde. Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... URL: From snikitin at mirantis.com Wed Jun 26 14:41:38 2019 From: snikitin at mirantis.com (Sergey Nikitin) Date: Wed, 26 Jun 2019 18:41:38 +0400 Subject: [stackalytics] Reported numbers seem inaccurate In-Reply-To: <7987c492-da59-f483-8e3b-d6ffc64b4233@gmail.com> References: <7987c492-da59-f483-8e3b-d6ffc64b4233@gmail.com> Message-ID: Hi Matt, Thank you for message. I checked logs of stackalytics. It was a database critical issue several hours ago. When I opened your link with nova reviews I saw review stats, so stackalytics collecting data at this moment. Collecting of all stats is a long operation so data may be incomplete (you can see banner 'The data is being loaded now and is not complete' at the top of stackalytics.com). It took 30-40 hours to collect all data to empty database, so the only thing we can do is to wait. Sergey On Wed, Jun 26, 2019 at 6:01 PM Matt Riedemann wrote: > On 4/11/2019 1:21 PM, Mark Goddard wrote: > > Hi, > > > > I've heard a couple of people say they don't feel the numbers reported > > by Stackalytics accurately reflect reality. I've been trying to gather a > > few stats for my Kolla update session in Denver, and am finding the > > same. I'll try to give some concrete examples. > > > > Reviews for all kolla deliverables in Stein [1]. Here the company stats > > don't reflect the individual stats. Also, the total reviews in the > > 'Kolla Official' module does not equal the sum of the reviews of its > > submodules (kolla, kolla-ansible, kolla-cli). 
> > > > If I look at the contribution summary for Kolla Official in the last 90 > > days [2], they are actually greater than those for the last 180 days [3]! > > > > There are also similar issues with commit metrics, and none seem to > > match what I see in git. > > > > Thanks, > > Mark > > > > [1] https://www.stackalytics.com/?metric=marks&module=kolla-group > > [2] https://www.stackalytics.com/report/contribution/kolla-group/90 > > [3] https://www.stackalytics.com/report/contribution/kolla-group/180 > > Seems things are busted again, or no one is doing any nova reviews: > > https://www.stackalytics.com/report/contribution/nova/30 > > -- > > Thanks, > > Matt > > -- Best Regards, Sergey Nikitin -------------- next part -------------- An HTML attachment was scrubbed... URL: From pawel.konczalski at everyware.ch Wed Jun 26 14:47:30 2019 From: pawel.konczalski at everyware.ch (Pawel Konczalski) Date: Wed, 26 Jun 2019 16:47:30 +0200 Subject: Best practices to restart / repair broken Octavia LoadBalancer Message-ID: Hi, i run into a issue where one of the Octavia LB amphora VMs was crashed and since the loadbalancer operating_status become PENDING_UPDATE (or ERROR) it is no longer possible to use the OpenStack CLI tools to manage the LB: openstack loadbalancer amphora list --loadbalancer 0ce30f0e-1d75-486c-a09f-79125abf44b8 +--------------------------------------+--------------------------------------+-----------+--------+---------------+-------------+ | id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip       | +--------------------------------------+--------------------------------------+-----------+--------+---------------+-------------+ | daee2f88-01fd-4ffa-b80d-15c63771d99d | 0ce30f0e-1d75-486c-a09f-79125abf44b8 | ERROR     | BACKUP | 172.10.10.30 | 172.11.12.26 | | f22186b1-2865-4f4a-aae2-7f869b7aae12 | 0ce30f0e-1d75-486c-a09f-79125abf44b8 | ALLOCATED | MASTER | 172.10.10.5  | 172.11.12.26 | +--------------------------------------+--------------------------------------+-----------+--------+---------------+-------------+ openstack loadbalancer show 0ce30f0e-1d75-486c-a09f-79125abf44b8 +---------------------+--------------------------------------+ | Field               | Value                                | +---------------------+--------------------------------------+ | admin_state_up      | True                                 | | created_at          | 2019-05-19T09:48:12                  | | description         |                                      | | flavor              |                                      | | id                  | 0ce30f0e-1d75-486c-a09f-79125abf44b8 | | listeners           | 12745a48-7277-405f-98da-e7b9fbaf93cc | | name                | foo-lb1                      | | operating_status    | ONLINE                               | | pools               | 482985f9-2804-4960-bd93-6bbb798b57f7 | | project_id          | 76e81458c81f6e2xebbbfc81f6bb76e008d | | provider            | amphora                              | | provisioning_status | PENDING_UPDATE                       | | updated_at          | 2019-06-25T16:58:33                  | | vip_address         | 172.11.12.26                          | | vip_network_id      | 8cc0f284-613c-40a7-ac72-c83ffdc26a93 | | vip_port_id         | f598aac4-4bd0-472b-9b9c-e4e305cb561b | | vip_qos_policy_id   | None                                 | | vip_subnet_id       | e1478576-23b0-40e8-b4f2-5b284f2b23c4 | 
+---------------------+--------------------------------------+ I was able to fix this by update the load_balancer state to 'ACTIVE' directly in the Octavia Database and trigger a failover: MySQL [octavia]>update load_balancer set provisioning_status = 'ACTIVE' where id = '0ce30f0e-1d75-486c-a09f-79125abf44b8'; openstack loadbalancer failover 0ce30f0e-1d75-486c-a09f-79125abf44b8 But this seams to by more a workaround the a proper way to restart / repair the loadbalancer without a interfare in the OpenStack DB manually. Is there a another way accomplish this with the CLI? BR Pawel -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5227 bytes Desc: not available URL: From theodoros.tsioutsias at cern.ch Tue Jun 25 11:21:57 2019 From: theodoros.tsioutsias at cern.ch (Theodoros Tsioutsias) Date: Tue, 25 Jun 2019 11:21:57 +0000 Subject: [magnum] nodegroups In-Reply-To: References: Message-ID: Hi Bharat, Just create an empty schema called “magnum” before running the migrations. The tool assumes that the db is there. Thanks, Thodoris > On 25 Jun 2019, at 12:54, Bharat Kunwar wrote: > > Hi Theodoros, > > Just replying to your message on IRC here as you appear offline. > >> if you don't care for what you have in the db i would do this: >> - stop both api and conductor >> - drop magnum db >> - checkout master >> - run the migrations (from master) > > After getting this far, I tried running `magnum-db-manage upgrade` but I hit this: > > sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1049, u"Unknown database 'magnum'") (Background on this error at: http://sqlalche.me/e/2j85) > >> - start the services > > I can’t start the services either but that is probably due to lack of a database… > >> - create a cluster >> - checkout magnum_nodegroups >> - stop services >> - run migrations >> - start the services >> I know it's not great… > > Would you mind letting me know where I’m going wrong? > > Thanks > > Bharat From hemant.sonawane at itera.io Wed Jun 26 15:20:54 2019 From: hemant.sonawane at itera.io (Hemant Sonawane) Date: Wed, 26 Jun 2019 17:20:54 +0200 Subject: Openstack Glance Image custom properties issue Message-ID: Hello, My self Hemant Sonawane. Recently I am working on openstack and I am getting some strange error when I installed openstack dashboard and tried to edit the visibility of images and update their metadata it dosent work I get *Forbidden : Redirecting to login *and *Unable to edit the image custom properties *such errors. I also tried to see the glance-api logs and found this *"2019-06-26 13:11:51,705.705 1 WARNING glance.api.v2.images [-] Could not find schema properties file schema-image.json. Continuing without custom properties"* Is there any glance policy in horizon? that might cause this error I think. The detail log file attached for your ready reference. So please look into my concern and let me know if there is any solution to resolve my issue. Thanks and Regards, -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- + COMMAND=start + start + exec glance-api --config-file /etc/glance/glance-api.conf /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. 
return pkg_resources.EntryPoint.parse("x=" + s).load(False) 2019-06-26 13:11:51,705.705 1 WARNING glance.api.v2.images [-] Could not find schema properties file schema-image.json. Continuing without custom properties /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately.
return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/util.py:55: DeprecationWarning: Using function/method 'Healthcheck.factory()' is deprecated: The healthcheck middleware must now be configured as an application, not as a filter val = callable(*args, **kw) 2019-06-26 13:11:52,772.772 1 INFO glance.common.wsgi [-] Starting 1 workers 2019-06-26 13:11:52,776.776 1 INFO glance.common.wsgi [-] Started child 13 /var/lib/openstack/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:1336: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade expire_on_commit=expire_on_commit, _conf=conf) From mark at stackhpc.com Wed Jun 26 15:40:18 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 26 Jun 2019 16:40:18 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA614A5056D@BGSMSX101.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com> <138cfc19-1850-a60a-b14a-d5a2ab8f0c85@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA614A5056D@BGSMSX101.gar.corp.intel.com> Message-ID: On Wed, 26 Jun 2019 at 12:20, Kumari, Madhuri wrote: > > Hi Mark, > > > >>-----Original Message----- > >>From: Mark Goddard [mailto:mark at stackhpc.com] > > > >>Hmm, I hadn't realised it would be quite this restricted. Although this could > >>make it work, it does seem to be baking more ironic specifics into nova. > >> > >>There is an issue of standardisation here. Currently we do not have standard > >>traits to describe these things, instead we use custom traits. The reason for > >>this has been discussed earlier in this thread, essentially that we need to > >>encode configuration key and value into the trait, and use the lack of a trait > >>as 'don't care'. We did briefly discuss an alternative approach, but we're a > >>fair way off having that. > > I think the issue of standardization is not related to the specific use case we are discussing here. It applies to the current state of ironic virt driver as well as you said. > The idea of using the flavor metadata can fix this but that’s in itself another piece of work. That's not quite true. Currently we don't specify any trait values for deploy templates anywhere in nova or ironic. They're entirely defined by the operator (as are the deploy templates that reference them). This would need to become standard if we're to add it to code. > > Regards, > Madhuri From johnsomor at gmail.com Wed Jun 26 15:47:59 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 26 Jun 2019 08:47:59 -0700 Subject: Best practices to restart / repair broken Octavia LoadBalancer In-Reply-To: References: Message-ID: Hi Pawel, The intended CLI functionality to address this is the load balancer failover API, however we have some open bugs with that right now. Objects should never get "stuck" in PENDING_*, those are transitive states meaning that one of the controllers has claimed ownership of the resource to take an action on it. 
For example, in your case one of your health manager processes has claimed the load balancer to attempt an automatic repair. However, due to a bug in nova (https://bugs.launchpad.net/nova/+bug/1827746) this automatic repair was unable to complete. We try for up to five minutes, but then have to give up as nova is stuck. We have open stories and one patch in progress to improve this situation. Once we can get resources available to finish those, we will backport the bug fix patches to the stable branches. Related stories and patches in Octavia: https://review.opendev.org/#/c/585864/ https://storyboard.openstack.org/#!/story/2006051 As always, we encourage you to open StoryBoard stories for us to track any issues you have seen. Even if they are duplicate, we can then track the number of people experiencing an issue and help prioritize the work. Michael On Wed, Jun 26, 2019 at 7:50 AM Pawel Konczalski wrote: > > Hi, > > i run into a issue where one of the Octavia LB amphora VMs was crashed and since the loadbalancer operating_status become PENDING_UPDATE (or ERROR) it is no longer possible to use the OpenStack CLI tools to manage the LB: > > openstack loadbalancer amphora list --loadbalancer 0ce30f0e-1d75-486c-a09f-79125abf44b8 > +--------------------------------------+--------------------------------------+-----------+--------+---------------+-------------+ > | id | loadbalancer_id | status | role | lb_network_ip | ha_ip | > +--------------------------------------+--------------------------------------+-----------+--------+---------------+-------------+ > | daee2f88-01fd-4ffa-b80d-15c63771d99d | 0ce30f0e-1d75-486c-a09f-79125abf44b8 | ERROR | BACKUP | 172.10.10.30 | 172.11.12.26 | > | f22186b1-2865-4f4a-aae2-7f869b7aae12 | 0ce30f0e-1d75-486c-a09f-79125abf44b8 | ALLOCATED | MASTER | 172.10.10.5 | 172.11.12.26 | > +--------------------------------------+--------------------------------------+-----------+--------+---------------+-------------+ > > openstack loadbalancer show 0ce30f0e-1d75-486c-a09f-79125abf44b8 > +---------------------+--------------------------------------+ > | Field | Value | > +---------------------+--------------------------------------+ > | admin_state_up | True | > | created_at | 2019-05-19T09:48:12 | > | description | | > | flavor | | > | id | 0ce30f0e-1d75-486c-a09f-79125abf44b8 | > | listeners | 12745a48-7277-405f-98da-e7b9fbaf93cc | > | name | foo-lb1 | > | operating_status | ONLINE | > | pools | 482985f9-2804-4960-bd93-6bbb798b57f7 | > | project_id | 76e81458c81f6e2xebbbfc81f6bb76e008d | > | provider | amphora | > | provisioning_status | PENDING_UPDATE | > | updated_at | 2019-06-25T16:58:33 | > | vip_address | 172.11.12.26 | > | vip_network_id | 8cc0f284-613c-40a7-ac72-c83ffdc26a93 | > | vip_port_id | f598aac4-4bd0-472b-9b9c-e4e305cb561b | > | vip_qos_policy_id | None | > | vip_subnet_id | e1478576-23b0-40e8-b4f2-5b284f2b23c4 | > +---------------------+--------------------------------------+ > > I was able to fix this by update the load_balancer state to 'ACTIVE' directly in the Octavia Database and trigger a failover: > > MySQL [octavia]> update load_balancer set provisioning_status = 'ACTIVE' where id = '0ce30f0e-1d75-486c-a09f-79125abf44b8'; > > openstack loadbalancer failover 0ce30f0e-1d75-486c-a09f-79125abf44b8 > > > But this seams to by more a workaround the a proper way to restart / repair the loadbalancer without a interfare in the OpenStack DB manually. > > Is there a another way accomplish this with the CLI? 
> > BR > > Pawel From openstack at fried.cc Wed Jun 26 16:34:51 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 26 Jun 2019 11:34:51 -0500 Subject: [nova] Strict isolation of group of hosts for image In-Reply-To: References: Message-ID: Hi Vrushali- > 'CUSTOM_RESOURCE_CLASS',  it's getting removed when the compute service > periodic task updates the inventory. Yup. The virt driver is the source of truth for the compute node's inventory. You can't (yet [1]) add or change inventories except by modifying the actual code (the compute driver's update_provider_tree method ([2] for libvirt)). > Here we are trying to Test granular resource request. I'm curious why you care about this aspect. Perhaps there's a way to do this "natively" with VGPU or bandwidth... efried [1] https://review.opendev.org/612497 [2] https://opendev.org/openstack/nova/src/commit/707deb158996d540111c23afd8c916ea1c18906a/nova/virt/libvirt/driver.py#L6693 From bharat at stackhpc.com Wed Jun 26 17:10:10 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Wed, 26 Jun 2019 18:10:10 +0100 Subject: [magnum] weekly meetings In-Reply-To: <28b59e74c479f119dc6d9201294f108e@catalyst.net.nz> References: <45A637C5-E94A-461B-AD9E-3CA73FD1B2B4@stackhpc.com> <28b59e74c479f119dc6d9201294f108e@catalyst.net.nz> Message-ID: <051C8ABB-227B-4B36-B97A-44DA10FB118C@stackhpc.com> Thanks for the response Fei Long, hopefully there will be a meeting next Tuesday to attend! > On 26 Jun 2019, at 00:11, feilong at catalyst.net.nz wrote: > > Hi Bharat, > > I'm in Shanghai, China since last week for Kubecon Shanghai 2019 and I have asked Spyros to help host the meeting as I'm away. I did mention that I would be away in the IRC channel as well. I will send an email to here next time. Sorry about the confusion. > > > On 2019-06-26 09:21, Bharat Kunwar wrote: >> Hello all >> I was wondering if there was a reason there hasn’t been a regular >> weekly meeting recently. The last one I traced back in the IRC log was >> back in 4 June. It would be good if there was an up to date schedule >> somewhere with definite dates where there is a chair available. >> Best >> Bharat From jimmy at openstack.org Wed Jun 26 17:43:41 2019 From: jimmy at openstack.org (Jimmy McArthur) Date: Wed, 26 Jun 2019 12:43:41 -0500 Subject: Shanghai CFP Coming Soon! Message-ID: <5D13AECD.9080605@openstack.org> Hi Everyone! The July 2 deadline to submit a presentation [1] for the Open Infrastructure Summit [2] in Shanghai is in less than one week! Submit your session today and join the global community in Shanghai, November 4-6, 2019. Sessions will be presented in both Mandarin and English, so you may submit your presentation in either language. Submit your presentations, panels, and hands-on workshops [3] before July 2 at 11:59 pm PT (July 3, 2019 at 15:00 China Standard Time). Tracks [4]: Container Infrastructure Hands-on Workshops AI, Machine Learning & HPC Private & Hybrid Cloud Public Cloud 5G, NFV & Edge Open Development Getting Started CI/CD Security Upcoming Shanghai Summit Deadlines * Register now [5] before the early bird registration deadline in early August (USD or RMB options available) * Apply for Travel Support [6] before August 8. For more information on the Travel Support Program, go here [7]. * Interested in sponsoring the Summit? [8]. * The content submission process for the Forum and Project Teams Gathering will be managed separately in the upcoming months. We look forward to your submissions! 
Cheers, Jimmy [1] https://cfp.openstack.org/ [2] https://www.openstack.org/summit/shanghai-2019/ [3] https://cfp.openstack.org/ [4] https://www.openstack.org/summit/shanghai-2019/summit-categories/ [5] https://www.openstack.org/summit/shanghai-2019/ [6] https://openstackfoundation.formstack.com/forms/travelsupportshanghai [7] https://www.openstack.org/summit/shanghai-2019/travel/ [8] https://www.openstack.org/summit/shanghai-2019/sponsors/ From colleen at gazlene.net Wed Jun 26 17:56:49 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Wed, 26 Jun 2019 10:56:49 -0700 Subject: Shanghai CFP Coming Soon! In-Reply-To: <5D13AECD.9080605@openstack.org> References: <5D13AECD.9080605@openstack.org> Message-ID: Hi, On Wed, Jun 26, 2019, at 10:44, Jimmy McArthur wrote: > Hi Everyone! > > The July 2 deadline to submit a presentation [1] for the Open > Infrastructure Summit [2] in Shanghai is in less than one week! Submit > your session today and join the global community in Shanghai, November > 4-6, 2019. Sessions will be presented in both Mandarin and English, so > you may submit your presentation in either language. > > Submit your presentations, panels, and hands-on workshops [3] before > July 2 at 11:59 pm PT (July 3, 2019 at 15:00 China Standard Time). > > Tracks [4]: > Container Infrastructure > Hands-on Workshops > AI, Machine Learning & HPC > Private & Hybrid Cloud > Public Cloud > 5G, NFV & Edge > Open Development > Getting Started > CI/CD > Security > > Upcoming Shanghai Summit Deadlines > > * Register now [5] before the early bird registration deadline in early > August (USD or RMB options available) > * Apply for Travel Support [6] before August 8. For more information on > the Travel Support Program, go here [7]. > * Interested in sponsoring the Summit? [8]. > * The content submission process for the Forum and Project Teams > Gathering will be managed separately in the upcoming months. > > We look forward to your submissions! > > Cheers, > Jimmy > > [1] https://cfp.openstack.org/ > [2] https://www.openstack.org/summit/shanghai-2019/ > [3] https://cfp.openstack.org/ > [4] https://www.openstack.org/summit/shanghai-2019/summit-categories/ > [5] https://www.openstack.org/summit/shanghai-2019/ > [6] https://openstackfoundation.formstack.com/forms/travelsupportshanghai > [7] https://www.openstack.org/summit/shanghai-2019/travel/ > [8] https://www.openstack.org/summit/shanghai-2019/sponsors/ > > When should we expect to be able to sign up to present project updates and onboardings? Colleen From kennelson11 at gmail.com Wed Jun 26 18:30:19 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 26 Jun 2019 11:30:19 -0700 Subject: Shanghai CFP Coming Soon! In-Reply-To: References: <5D13AECD.9080605@openstack.org> Message-ID: I will be sending out the update stuff in a few weeks once I find out how many slots we have. The onboardings will be happening differently this time around as a part of the PTG[1], so that info will all get gathered when I send out the PTG survey in like...two weeks? -Kendall [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007079.html On Wed, Jun 26, 2019 at 10:59 AM Colleen Murphy wrote: > Hi, > > On Wed, Jun 26, 2019, at 10:44, Jimmy McArthur wrote: > > Hi Everyone! > > > > The July 2 deadline to submit a presentation [1] for the Open > > Infrastructure Summit [2] in Shanghai is in less than one week! Submit > > your session today and join the global community in Shanghai, November > > 4-6, 2019. 
Sessions will be presented in both Mandarin and English, so > > you may submit your presentation in either language. > > > > Submit your presentations, panels, and hands-on workshops [3] before > > July 2 at 11:59 pm PT (July 3, 2019 at 15:00 China Standard Time). > > > > Tracks [4]: > > Container Infrastructure > > Hands-on Workshops > > AI, Machine Learning & HPC > > Private & Hybrid Cloud > > Public Cloud > > 5G, NFV & Edge > > Open Development > > Getting Started > > CI/CD > > Security > > > > Upcoming Shanghai Summit Deadlines > > > > * Register now [5] before the early bird registration deadline in early > > August (USD or RMB options available) > > * Apply for Travel Support [6] before August 8. For more information on > > the Travel Support Program, go here [7]. > > * Interested in sponsoring the Summit? [8]. > > * The content submission process for the Forum and Project Teams > > Gathering will be managed separately in the upcoming months. > > > > We look forward to your submissions! > > > > Cheers, > > Jimmy > > > > [1] https://cfp.openstack.org/ > > [2] https://www.openstack.org/summit/shanghai-2019/ > > [3] https://cfp.openstack.org/ > > [4] https://www.openstack.org/summit/shanghai-2019/summit-categories/ > > [5] https://www.openstack.org/summit/shanghai-2019/ > > [6] > https://openstackfoundation.formstack.com/forms/travelsupportshanghai > > [7] https://www.openstack.org/summit/shanghai-2019/travel/ > > [8] https://www.openstack.org/summit/shanghai-2019/sponsors/ > > > > > > When should we expect to be able to sign up to present project updates and > onboardings? > > Colleen > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openstack.org Wed Jun 26 18:32:09 2019 From: jimmy at openstack.org (Jimmy McArthur) Date: Wed, 26 Jun 2019 13:32:09 -0500 Subject: Shanghai CFP CLOSING Soon! Message-ID: <5D13BA29.7040303@openstack.org> Sorry everyone. I completely messed up the subject. So for real... it's already here, but it won't be for long! Details below. The July 2 deadline to submit a presentation [1] for the Open Infrastructure Summit [2] in Shanghai is in less than one week! Submit your session today and join the global community in Shanghai, November 4-6, 2019. Sessions will be presented in both Mandarin and English, so you may submit your presentation in either language. Submit your presentations, panels, and hands-on workshops [3] before July 2 at 11:59 pm PT (July 3, 2019 at 15:00 China Standard Time). Tracks [4]: Container Infrastructure Hands-on Workshops AI, Machine Learning & HPC Private & Hybrid Cloud Public Cloud 5G, NFV & Edge Open Development Getting Started CI/CD Security Upcoming Shanghai Summit Deadlines * Register now [5] before the early bird registration deadline in early August (USD or RMB options available) * Apply for Travel Support [6] before August 8. For more information on the Travel Support Program, go here [7]. * Interested in sponsoring the Summit? [8]. * The content submission process for the Forum and Project Teams Gathering will be managed separately in the upcoming months. We look forward to your submissions! 
Cheers, Jimmy [1] https://cfp.openstack.org/ [2] https://www.openstack.org/summit/shanghai-2019/ [3] https://cfp.openstack.org/ [4] https://www.openstack.org/summit/shanghai-2019/summit-categories/ [5] https://www.openstack.org/summit/shanghai-2019/ [6] https://openstackfoundation.formstack.com/forms/travelsupportshanghai [7] https://www.openstack.org/summit/shanghai-2019/travel/ [8] https://www.openstack.org/summit/shanghai-2019/sponsors/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 26 18:34:06 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 26 Jun 2019 20:34:06 +0200 Subject: [kolla-ansible] migration Message-ID: Hello, Anyone have tried to migrate an existing openstack installation to kolla containers? Thanks Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jimrollenhagen.com Wed Jun 26 18:36:37 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Wed, 26 Jun 2019 14:36:37 -0400 Subject: [tc] Assuming control of GitHub organizations Message-ID: Hi, The opendev team reached out to me about handing off administrative access of the "openstack" and related organizations on GitHub. They think it would be best if the TC took control of that, or at least took control of delegating that access. In general, the goal here is to support OpenStack's presence and visibility on GitHub. Per Jim Blair: > In the long run, this shouldn't entail a lot of work, generally creating new > repos on GitHub to accept mirroring from opendev systems, performing renames, > handling transfer requests when repos move out of the openstack namespace, > setting and updating descriptions, and curating the list of pinned > repositories. In the short term, we have some archiving and moving to do.[0] Do TC members want to manage this, or should we delegate? One thing to figure out is how to grant that access. The opendev team uses a shared account with two-factor authentication provided by a shared shell account. This mitigates accidental pushes or settings changes when an admin is using their usual GitHub account. The TC (or its delegates) probably doesn't have a shared shell account to do this with. Some options: * each admin creates a second GitHub account for this purpose use a shared * account without 2FA use a shared account with 2FA, share the one time secret * with everyone to configure their own token generator use personal accounts * but be very careful Thoughts on these options? [0] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006829.html // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Jun 26 19:12:51 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 26 Jun 2019 14:12:51 -0500 Subject: [nova] FYI for out-of-tree virt drivers: deprecating non-update-provider-tree compat Message-ID: <3fb81b32-4e14-19c4-40e8-c4958d3d8562@gmail.com> I have a change up [1] to deprecate the backward compatibility code in the ResourceTracker for virt drivers that do not implement the update_provider_tree interface. Starting in Train you'll get a warning in the nova-compute logs every time the update_available_resource periodic runs (we could only log on startup, that's up for debate in the change) until the driver is updated and in the U release we'll drop that compatibility code. 
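For out-of-tree driver maintainers wondering what is involved, a bare-bones sketch of the interface follows. The _get_node_resources() helper is hypothetical, standing in for however a driver queries its hypervisor, and the allocation ratios and reserved values are purely illustrative; a real driver would typically also report traits here:

    import os_resource_classes as orc

    from nova.virt import driver


    class MyVirtDriver(driver.ComputeDriver):

        def update_provider_tree(self, provider_tree, nodename, allocations=None):
            # Hypothetical helper returning what the hypervisor exposes.
            cpus, ram_mb, disk_gb = self._get_node_resources(nodename)
            inventory = {
                orc.VCPU: {'total': cpus, 'min_unit': 1, 'max_unit': cpus,
                           'step_size': 1, 'allocation_ratio': 16.0, 'reserved': 0},
                orc.MEMORY_MB: {'total': ram_mb, 'min_unit': 1, 'max_unit': ram_mb,
                                'step_size': 1, 'allocation_ratio': 1.5, 'reserved': 512},
                orc.DISK_GB: {'total': disk_gb, 'min_unit': 1, 'max_unit': disk_gb,
                              'step_size': 1, 'allocation_ratio': 1.0, 'reserved': 0},
            }
            # Replace the compute node provider's inventory in the tree; the
            # ResourceTracker flushes any changes back to placement.
            provider_tree.update_inventory(nodename, inventory)

Custom inventory (for example the CUSTOM_* resource class discussed in the strict isolation thread earlier today) would be reported as extra keys in that same dict, which is also why inventory added to the provider by hand gets overwritten by the periodic task.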
The only out-of-tree virt driver I checked was nova-lxd and it does *not* implement update_provider_tree so for those that care about that driver you should implement the interface. The patch has a link to the reference docs on implementing that interface for your driver (it's pretty simple). [1] https://review.opendev.org/#/c/667442/ -- Thanks, Matt From mriedemos at gmail.com Wed Jun 26 19:23:54 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 26 Jun 2019 14:23:54 -0500 Subject: [stackalytics] Reported numbers seem inaccurate In-Reply-To: References: <7987c492-da59-f483-8e3b-d6ffc64b4233@gmail.com> Message-ID: <18c314bd-f019-a6bf-292f-38c215a420a5@gmail.com> On 6/26/2019 9:41 AM, Sergey Nikitin wrote: > Hi Matt, > Thank you for message. I checked logs of stackalytics. > It was a database critical issue several hours ago. > > When I opened your link with nova reviews I saw review stats, so > stackalytics collecting data at this moment. > > Collecting of all stats is a long operation so data may be incomplete > (you can see banner 'The data is being loaded now and is not complete' > at the top of stackalytics.co m). > > It took 30-40 hours to collect all data to empty database, so the only > thing we can do is to wait. > > Sergey Ack, thanks for the quick reply Sergey. -- Thanks, Matt From openstack at fried.cc Wed Jun 26 19:58:42 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 26 Jun 2019 14:58:42 -0500 Subject: [nova] Spec Review Day: Tuesday July 2nd Message-ID: <8796a094-4aca-f674-e075-995c54f18e24@fried.cc> Esteemed Nova contributors and maintainers- With about a month until spec freeze [1], we need to fish or cut bait [2] on open specs. To that end, I have recycled the Spec Review Day etherpad [3] and populated it with all open Train specs (those with a file under specs/train/approved) [4]. I seeded their initial status in the etherpad, including noting from whom the next action is expected. Authors: Look for your specs, especially if they say "Action: author", and update them ASAP. (Having done so, you can toggle to "Action: reviewer".) Reviewers and Cores: Scan the list for Action: "reviewer" and/or "approver" and dive in. (You may wish to start by intersecting with [5], which narrows [4] down to specs you've already reviewed at some point.) You do not need to wait until, or limit your activity to, Tuesday. Especially if you're going to be unavailable then. 
Thanks, efried [1] https://wiki.openstack.org/wiki/Nova/Train_Release_Schedule [2] https://en.wiktionary.org/wiki/fish_or_cut_bait [3] https://etherpad.openstack.org/p/nova-spec-review-day [4] https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.%2A [5] https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+reviewedby:self From openstack at fried.cc Wed Jun 26 20:18:31 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 26 Jun 2019 15:18:31 -0500 Subject: [nova] Spec Review Day: Tuesday July 2nd In-Reply-To: <8796a094-4aca-f674-e075-995c54f18e24@fried.cc> References: <8796a094-4aca-f674-e075-995c54f18e24@fried.cc> Message-ID: <7a0cf62a-8f14-b9cc-4b13-e6c4c848f34f@fried.cc> Correction (thanks Matt): > [5] > https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+reviewedby:self > should have been: [5] https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+reviewer:self From tony at bakeyournoodle.com Wed Jun 26 20:51:49 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Thu, 27 Jun 2019 06:51:49 +1000 Subject: [stable][sahara] Changes to the stable team In-Reply-To: <1571571.VMcRbnv6ix@whitebase.usersys.redhat.com> References: <20190626042646.GF11501@thor.bakeyournoodle.com> <1571571.VMcRbnv6ix@whitebase.usersys.redhat.com> Message-ID: <20190626205148.GH11501@thor.bakeyournoodle.com> On Wed, Jun 26, 2019 at 10:21:42AM +0200, Luigi Toscano wrote: > On Wednesday, 26 June 2019 06:26:46 CEST Tony Breeds wrote: > > Hello all, > > In the middle of last week Telles asked[1] for a couple of the > > current Sahara core team (Jeremy and Luigi) to be added to the stable > > group. > > > > Upon review there isn't enough data (either reviews or backports) to > > verify that Jeremy and Luigi understand the stable policy. > > > Hi, > > do you mean that the request did not contain enough data to evaluate us, or > that the previous activity is not enough to assess whether we understand it? The latter. Between backports and reviews it's hard to judge the understanding of the stable policy. Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From corey.bryant at canonical.com Wed Jun 26 21:44:34 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Wed, 26 Jun 2019 17:44:34 -0400 Subject: [goal][python3] Train unit tests weekly update (goal-11) Message-ID: This is the goal-11 weekly update for the "Update Python 3 test runtimes for Train" goal [1]. There are 11 weeks remaining for completion of Train community goals [2]. == What's the Goal? == To ensure (in the Train cycle) that all official OpenStack repositories with Python 3 unit tests are exclusively using the 'openstack-python3-train-jobs' Zuul template or one of its variants (e.g. 'openstack-python3-train-jobs-neutron') to run unit tests, and that tests are passing. This will ensure that all official projects are running py36 and py37 unit tests in Train. For complete details please see [1]. 
== Ongoing Work == Open patches needing reviews: https://review.openstack.org/#/q/topic:python3-train+is:open Failing patches: https://review.openstack.org/#/q/topic:python3-train+status:open+(+label:Verified-1+OR+label:Verified-2+) Patch automation scripts needing review: https://review.opendev.org/#/c/666934 Some notes on 2 issues I came across this week: 1) I've updated the patch automation scripts to drop py34,py35,py36 and add py37 to default tox targets as discussed with smcginnis, fungi, and gmann in #openstack-tc this week. 2) While trying to proposed and urge the move to py37-only default tox targets, some projects have decided to keep both py36 and py37 targets and some projects have decided to run with the generic py3-only target. == Completed Work == Merged patches: https://review.openstack.org/#/q/topic:python3-train+is:merged == How can you help? == Please take a look at the failing patches and help fix any failing unit tests for your project(s). Python 3.7 unit tests will be self-testing in Zuul. If you're interested in helping submit patches, please let me know. == Reference Material == [1] Goal description: https://governance.openstack.org/tc/goals /train/python3-updates.html [2] Train release schedule: https://releases.openstack.org/train/schedule.html (see R-5 for "Train Community Goals Completed") Storyboard: https://storyboard.openstack.org/#!/story/2005924 Porting to Python 3.7: https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7 Python Update Process: https://opendev.org/openstack/governance/src/branch/master/resolutions/20181024-python-update-process.rst Train runtimes: https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/train.rst Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From duc.openstack at gmail.com Thu Jun 27 00:11:23 2019 From: duc.openstack at gmail.com (Duc Truong) Date: Wed, 26 Jun 2019 17:11:23 -0700 Subject: [dev][requirements] Upcoming changes to constraints handling in tox.ini In-Reply-To: <20190522030203.GD15808@thor.bakeyournoodle.com> References: <20190522030203.GD15808@thor.bakeyournoodle.com> Message-ID: On Tue, May 21, 2019 at 8:02 PM Tony Breeds wrote: > > Hi folks, > This is a heads-up to describe 3 sets of changes you'll start seeing > starting next week. > > 1) lower-constraints.txt handling > TL;DR: Make sure projects do not specify a constraint file in install_command > 2) Switch to the new canonical constraints URL on master > TR;DR: Make sure you use https://releases.openstack.org/constraints/upper/master > 3) Switch to the new canonical constraints URL on stable branches > TR;DR: Make sure you use https://releases.openstack.org/constraints/upper/$series > > These will be generated from a member of the requirements team[1], and > will be on the gerrit topic constraints-updates. We'll start next week > to give y'all a few days to digest this email > I'm seeing a lot of changes for #2 being proposed by people who are not members of the requirements team (e.g. [1] and [2]). Is it ok to approve those changes or should we wait for the official changes from a member of the requirements team? 
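For anyone skimming: the change in those reviews is essentially a repoint of the constraints argument in tox.ini at the new canonical URL, roughly:

deps =
  -c https://releases.openstack.org/constraints/upper/master
  -r{toxinidir}/requirements.txt
  -r{toxinidir}/test-requirements.txt

(the surrounding deps lines vary from project to project; this is only a sketch of the pattern, not the exact diff proposed in [1] or [2])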
[1] https://review.opendev.org/#/c/666947/ [2] https://review.opendev.org/#/c/666950/ From feilong at catalyst.net.nz Thu Jun 27 00:55:03 2019 From: feilong at catalyst.net.nz (Feilong) Date: Thu, 27 Jun 2019 08:55:03 +0800 Subject: [dev][magnum] Magnum event notifications not having relevant content related to clusters References: Message-ID: An HTML attachment was scrubbed... URL: From ssbarnea at redhat.com Thu Jun 27 07:46:38 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Thu, 27 Jun 2019 08:46:38 +0100 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: References: Message-ID: <2362e71a-2d39-4b01-af1c-3ed88c1398ca@Spark> My personal take on this is that it would be safer to use personal account with 2FA to manage these and also to make mandatory 2FA for all org members. Use of shared accounts is more risky as they are not as traceable as personal ones. Overall I salute the idea as I think that there are few projects that would love to be able to use github issue tracker. -- sorin On 26 Jun 2019, 19:41 +0100, Jim Rollenhagen , wrote: > Hi, > > The opendev team reached out to me about handing off administrative access of > the "openstack" and related organizations on GitHub. They think it would be > best if the TC took control of that, or at least took control of delegating > that access. In general, the goal here is to support OpenStack's presence and > visibility on GitHub. > > Per Jim Blair: > > > In the long run, this shouldn't entail a lot of work, generally creating new > > repos on GitHub to accept mirroring from opendev systems, performing renames, > > handling transfer requests when repos move out of the openstack namespace, > > setting and updating descriptions, and curating the list of pinned > > repositories. > > In the short term, we have some archiving and moving to do.[0] > > Do TC members want to manage this, or should we delegate? > > One thing to figure out is how to grant that access. The opendev team uses a > shared account with two-factor authentication provided by a shared shell > account. This mitigates accidental pushes or settings changes when an admin is > using their usual GitHub account. The TC (or its delegates) probably doesn't > have a shared shell account to do this with. Some options: > > * each admin creates a second GitHub account for this purpose use a shared > * account without 2FA use a shared account with 2FA, share the one time secret > * with everyone to configure their own token generator use personal accounts > * but be very careful > > Thoughts on these options? > > [0] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006829.html > > // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From hemant.sonawane at itera.io Thu Jun 27 07:55:44 2019 From: hemant.sonawane at itera.io (Hemant Sonawane) Date: Thu, 27 Jun 2019 09:55:44 +0200 Subject: [glance] Openstack Glance Image custom properties issue Message-ID: Hello, My self Hemant Sonawane. Recently I am working on openstack and I am getting some strange error when I installed openstack dashboard and tried to edit the visibility of images and update their metadata it dosent work I get *Forbidden : Redirecting to login *and *Unable to edit the image custom properties *such errors. I also tried to see the glance-api logs and found this *"2019-06-26 13:11:51,705.705 1 WARNING glance.api.v2.images [-] Could not find schema properties file schema-image.json. 
Continuing without custom properties"* Is there any glance policy in horizon? that might cause this error I think. The detail log file attached for your ready reference. So please look into my concern and let me know if there is any solution to resolve my issue. Thanks and Regards, Hemant Sonawane -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- + COMMAND=start + start + exec glance-api --config-file /etc/glance/glance-api.conf /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) 2019-06-26 13:11:51,705.705 1 WARNING glance.api.v2.images [-] Could not find schema properties file schema-image.json. Continuing without custom properties /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py:22: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately. 
return pkg_resources.EntryPoint.parse("x=" + s).load(False) /var/lib/openstack/local/lib/python2.7/site-packages/paste/deploy/util.py:55: DeprecationWarning: Using function/method 'Healthcheck.factory()' is deprecated: The healthcheck middleware must now be configured as an application, not as a filter val = callable(*args, **kw) 2019-06-26 13:11:52,772.772 1 INFO glance.common.wsgi [-] Starting 1 workers 2019-06-26 13:11:52,776.776 1 INFO glance.common.wsgi [-] Started child 13 /var/lib/openstack/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:1336: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade expire_on_commit=expire_on_commit, _conf=conf) From mark at stackhpc.com Thu Jun 27 08:51:53 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 27 Jun 2019 09:51:53 +0100 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano wrote: > > Hello, > Anyone have tried to migrate an existing openstack installation to kolla containers? Hi, I'm aware of two people currently working on that. Gregory Orange and one of my colleagues, Pierre Riteau. Pierre is away currently, so I hope he doesn't mind me quoting him from an email to Gregory. Mark "I am indeed working on a similar migration using Kolla Ansible with Kayobe, starting from a non-containerised OpenStack deployment based on CentOS RPMs. Existing OpenStack services are deployed across several controller nodes and all sit behind HAProxy, including for internal endpoints. We have additional controller nodes that we use to deploy containerised services. If you don't have the luxury of additional nodes, it will be more difficult as you will need to avoid processes clashing when listening on the same port. The method I am using resembles your second suggestion, however I am deploying only one containerised service at a time, in order to validate each of them independently. I use the --tags option of kolla-ansible to restrict Ansible to specific roles, and when I am happy with the resulting configuration I update HAProxy to point to the new controllers. As long as the configuration matches, this should be completely transparent for purely HTTP-based services like Glance. You need to be more careful with services that include components listening for RPC, such as Nova: if the new nova.conf is incorrect and you've deployed a nova-conductor that uses it, you could get failed instances launches. Some roles depend on others: if you are deploying the neutron-openvswitch-agent, you need to run the openvswitch role as well. I suggest starting with migrating Glance as it doesn't have any internal services and is easy to validate. Note that properly migrating Keystone requires keeping existing Fernet keys around, so any token stays valid until the time it is expected to stop working (which is fairly complex, see https://bugs.launchpad.net/kolla-ansible/+bug/1809469). While initially I was using an approach similar to your first suggestion, it can have side effects since Kolla Ansible uses these variables when templating configuration. As an example, most services will only have notifications enabled if enable_ceilometer is true. I've added existing control plane nodes to the Kolla Ansible inventory as separate groups, which allows me to use the existing database and RabbitMQ for the containerised services. 
For example, instead of: [mariadb:children] control you may have: [mariadb:children] oldcontrol_db I still have to perform the migration of these underlying services to the new control plane, I will let you know if there is any hurdle. A few random things to note: - if run on existing control plane hosts, the baremetal role removes some packages listed in `redhat_pkg_removals` which can trigger the removal of OpenStack dependencies using them! I've changed this variable to an empty list. - compare your existing deployment with a Kolla Ansible one to check for differences in endpoints, configuration files, database users, service users, etc. For Heat, Kolla uses the domain heat_user_domain, while your existing deployment may use another one (and this is hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" project while a couple of deployments I worked with were using "services". This shouldn't matter, except there was a bug in Kolla which prevented it from setting the roles correctly: https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest Rocky and Queens images) - the ml2_conf.ini generated for Neutron generates physical network names like physnet1, physnet2… you may want to override bridge_mappings completely. - although sometimes it could be easier to change your existing deployment to match Kolla Ansible settings, rather than configure Kolla Ansible to match your deployment." > Thanks > Ignazio > From thierry at openstack.org Thu Jun 27 08:55:25 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 27 Jun 2019 10:55:25 +0200 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: References: Message-ID: Jim Rollenhagen wrote: > The opendev team reached out to me about handing off administrative access of > the "openstack" and related organizations on GitHub. They think it would be > best if the TC took control of that, or at least took control of delegating > that access. In general, the goal here is to support OpenStack's > presence and visibility on GitHub. > [...] > > Do TC members want to manage this, or should we delegate? I have been considering our GitHub presence as a downstream "code marketing" property, a sort of front-end or entry point into the OpenStack universe for outsiders. As such, I'd consider it much closer to openstack.org/software than to opendev.org/openstack. So one way to do this would be to ask Foundation staff to maintain this code marketing property, taking care of aligning message with the content at openstack.org/software (which is driven from the osf/openstack-map repository). If we handle it at TC-level my fear is that we would duplicate work around things like project descriptions and what is pinned, and end up with slightly different messages. > One thing to figure out is how to grant that access. The opendev team uses a > shared account with two-factor authentication provided by a shared shell > account. This mitigates accidental pushes or settings changes when an > admin is > using their usual GitHub account. The TC (or its delegates) probably doesn't > have a shared shell account to do this with. Some options: > > * each admin creates a second GitHub account for this purpose use a shared > * account without 2FA use a shared account with 2FA, share the one time > secret > * with everyone to configure their own token generator use personal accounts > * but be very careful > > Thoughts on these options? I'd do a limited number of personal accounts, all with 2FA. 
-- Thierry Carrez (ttx) From ignaziocassano at gmail.com Thu Jun 27 11:10:36 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 27 Jun 2019 13:10:36 +0200 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: Many thanks. I hope you'll write come docs at the end Ignazio Il Gio 27 Giu 2019 10:52 Mark Goddard ha scritto: > On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano > wrote: > > > > Hello, > > Anyone have tried to migrate an existing openstack installation to kolla > containers? > > Hi, > > I'm aware of two people currently working on that. Gregory Orange and > one of my colleagues, Pierre Riteau. Pierre is away currently, so I > hope he doesn't mind me quoting him from an email to Gregory. > > Mark > > "I am indeed working on a similar migration using Kolla Ansible with > Kayobe, starting from a non-containerised OpenStack deployment based > on CentOS RPMs. > Existing OpenStack services are deployed across several controller > nodes and all sit behind HAProxy, including for internal endpoints. > We have additional controller nodes that we use to deploy > containerised services. If you don't have the luxury of additional > nodes, it will be more difficult as you will need to avoid processes > clashing when listening on the same port. > > The method I am using resembles your second suggestion, however I am > deploying only one containerised service at a time, in order to > validate each of them independently. > I use the --tags option of kolla-ansible to restrict Ansible to > specific roles, and when I am happy with the resulting configuration I > update HAProxy to point to the new controllers. > > As long as the configuration matches, this should be completely > transparent for purely HTTP-based services like Glance. You need to be > more careful with services that include components listening for RPC, > such as Nova: if the new nova.conf is incorrect and you've deployed a > nova-conductor that uses it, you could get failed instances launches. > Some roles depend on others: if you are deploying the > neutron-openvswitch-agent, you need to run the openvswitch role as > well. > > I suggest starting with migrating Glance as it doesn't have any > internal services and is easy to validate. Note that properly > migrating Keystone requires keeping existing Fernet keys around, so > any token stays valid until the time it is expected to stop working > (which is fairly complex, see > https://bugs.launchpad.net/kolla-ansible/+bug/1809469). > > While initially I was using an approach similar to your first > suggestion, it can have side effects since Kolla Ansible uses these > variables when templating configuration. As an example, most services > will only have notifications enabled if enable_ceilometer is true. > > I've added existing control plane nodes to the Kolla Ansible inventory > as separate groups, which allows me to use the existing database and > RabbitMQ for the containerised services. > For example, instead of: > > [mariadb:children] > control > > you may have: > > [mariadb:children] > oldcontrol_db > > I still have to perform the migration of these underlying services to > the new control plane, I will let you know if there is any hurdle. > > A few random things to note: > > - if run on existing control plane hosts, the baremetal role removes > some packages listed in `redhat_pkg_removals` which can trigger the > removal of OpenStack dependencies using them! I've changed this > variable to an empty list. 
> - compare your existing deployment with a Kolla Ansible one to check > for differences in endpoints, configuration files, database users, > service users, etc. For Heat, Kolla uses the domain heat_user_domain, > while your existing deployment may use another one (and this is > hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" > project while a couple of deployments I worked with were using > "services". This shouldn't matter, except there was a bug in Kolla > which prevented it from setting the roles correctly: > https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest > Rocky and Queens images) > - the ml2_conf.ini generated for Neutron generates physical network > names like physnet1, physnet2… you may want to override > bridge_mappings completely. > - although sometimes it could be easier to change your existing > deployment to match Kolla Ansible settings, rather than configure > Kolla Ansible to match your deployment." > > > Thanks > > Ignazio > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-philippe at evrard.me Thu Jun 27 11:53:49 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Thu, 27 Jun 2019 13:53:49 +0200 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: References: Message-ID: <3cbe6e8b-2ebd-4097-a929-5b9fc495b702@www.fastmail.com> > I'd do a limited number of personal accounts, all with 2FA. Same. From gr at ham.ie Thu Jun 27 12:23:36 2019 From: gr at ham.ie (Graham Hayes) Date: Thu, 27 Jun 2019 13:23:36 +0100 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: References: Message-ID: <10270aaf-9f4e-80b0-8e40-760d7c52dc0d@ham.ie> On 27/06/2019 09:55, Thierry Carrez wrote: > Jim Rollenhagen wrote: >> The opendev team reached out to me about handing off administrative >> access of >> the "openstack" and related organizations on GitHub. They think it >> would be >> best if the TC took control of that, or at least took control of >> delegating >> that access. In general, the goal here is to support OpenStack's >> presence and visibility on GitHub. >> [...] >> >> Do TC members want to manage this, or should we delegate? I think we should manage it, but possibly allow the foundation to manage parts of it (pinning, descriptions, etc). For setting up syncing / creation of projects we should look at keeping that under the TC (that could be the TC or other group of people that step up) > > I have been considering our GitHub presence as a downstream "code > marketing" property, a sort of front-end or entry point into the > OpenStack universe for outsiders. As such, I'd consider it much closer > to openstack.org/software than to opendev.org/openstack. > > So one way to do this would be to ask Foundation staff to maintain this > code marketing property, taking care of aligning message with the > content at openstack.org/software (which is driven from the > osf/openstack-map repository). > > If we handle it at TC-level my fear is that we would duplicate work > around things like project descriptions and what is pinned, and end up > with slightly different messages. I am not as concerned about this, the TC should be setting out our viewpoint for the project, and if this is in conflict with the message from the foundation, we have plenty of avenues to raise it. > >> One thing to figure out is how to grant that access. The opendev team >> uses a >> shared account with two-factor authentication provided by a shared shell >> account. 
This mitigates accidental pushes or settings changes when an >> admin is >> using their usual GitHub account. The TC (or its delegates) probably >> doesn't >> have a shared shell account to do this with. Some options: >> >> * each admin creates a second GitHub account for this purpose use a >> shared >> * account without 2FA use a shared account with 2FA, share the one >> time secret >> * with everyone to configure their own token generator use personal >> accounts >> * but be very careful >> >> Thoughts on these options? > > I'd do a limited number of personal accounts, all with 2FA. I would do it with personal accounts, but require 2FA, and explicit opt-in from TC / SIG / $group managing it. We should look at automating as much as possible of course, and have it ran by shared account that can be held in trust* as a break glass account if the needs arise in the future, but that is a longer term project. * trustee tbc > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From thierry at openstack.org Thu Jun 27 12:53:29 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 27 Jun 2019 14:53:29 +0200 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: <10270aaf-9f4e-80b0-8e40-760d7c52dc0d@ham.ie> References: <10270aaf-9f4e-80b0-8e40-760d7c52dc0d@ham.ie> Message-ID: <8848aa44-38f7-22a7-3b9c-3f74a396b9e6@openstack.org> Graham Hayes wrote: > On 27/06/2019 09:55, Thierry Carrez wrote: >> Jim Rollenhagen wrote: >>> The opendev team reached out to me about handing off administrative >>> access of >>> the "openstack" and related organizations on GitHub. They think it >>> would be >>> best if the TC took control of that, or at least took control of >>> delegating >>> that access. In general, the goal here is to support OpenStack's >>> presence and visibility on GitHub. >>> [...] >>> >>> Do TC members want to manage this, or should we delegate? > > I think we should manage it, but possibly allow the foundation to manage > parts of it (pinning, descriptions, etc). > > For setting up syncing / creation of projects we should look at keeping > that under the TC (that could be the TC or other group of people that > step up) I would be fine with that -- import descriptions from openstack-map to avoid having to keep two separate sets up to date. In all cases the TC should set the rules defining how "OpenStack" should look like there. I offered Foundation resources because we may have trouble finding volunteers to follow those rules and keep that publication clean over time. That involves, for each new repo created in OpenStack, deciding if it's one we would replicate or pin, and create a GitHub-side repo. For each rename or deprecation, push the corresponding change. Not sure how much of that boring job can be automated ? -- Thierry Carrez (ttx) From thierry at openstack.org Thu Jun 27 12:54:11 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 27 Jun 2019 14:54:11 +0200 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: <10270aaf-9f4e-80b0-8e40-760d7c52dc0d@ham.ie> References: <10270aaf-9f4e-80b0-8e40-760d7c52dc0d@ham.ie> Message-ID: <9a4fdd56-7af8-c432-6ade-d52c163e8b8c@openstack.org> Graham Hayes wrote: > On 27/06/2019 09:55, Thierry Carrez wrote: >> I have been considering our GitHub presence as a downstream "code >> marketing" property, a sort of front-end or entry point into the >> OpenStack universe for outsiders. 
As such, I'd consider it much closer >> to openstack.org/software than to opendev.org/openstack. >> >> So one way to do this would be to ask Foundation staff to maintain this >> code marketing property, taking care of aligning message with the >> content at openstack.org/software (which is driven from the >> osf/openstack-map repository). >> >> If we handle it at TC-level my fear is that we would duplicate work >> around things like project descriptions and what is pinned, and end up >> with slightly different messages. > > I am not as concerned about this, the TC should be setting out our > viewpoint for the project, and if this is in conflict with the message > from the foundation, we have plenty of avenues to raise it. How about the TC controls which repo is replicated where (and which ones are pinned etc), but we import the descriptions from the openstack-map repo? That would keep control on the TC side but avoid duplication of effort. In my experience it's already difficult to get projects to update descriptions in one place, so two... Also, who is volunteering for setting up the replication, and then keeping track of things as they evolve ? -- Thierry Carrez (ttx) From thierry at openstack.org Thu Jun 27 12:56:50 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 27 Jun 2019 14:56:50 +0200 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: <9a4fdd56-7af8-c432-6ade-d52c163e8b8c@openstack.org> References: <10270aaf-9f4e-80b0-8e40-760d7c52dc0d@ham.ie> <9a4fdd56-7af8-c432-6ade-d52c163e8b8c@openstack.org> Message-ID: <00e72879-e585-cb1f-ea6a-1ab3021c7809@openstack.org> Thierry Carrez wrote: > Graham Hayes wrote: >> On 27/06/2019 09:55, Thierry Carrez wrote: >>> I have been considering our GitHub presence as a downstream "code >>> marketing" property, a sort of front-end or entry point into the >>> OpenStack universe for outsiders. As such, I'd consider it much closer >>> to openstack.org/software than to opendev.org/openstack. >>> >>> So one way to do this would be to ask Foundation staff to maintain this >>> code marketing property, taking care of aligning message with the >>> content at openstack.org/software (which is driven from the >>> osf/openstack-map repository). >>> >>> If we handle it at TC-level my fear is that we would duplicate work >>> around things like project descriptions and what is pinned, and end up >>> with slightly different messages. >> >> I am not as concerned about this, the TC should be setting out our >> viewpoint for the project, and if this is in conflict with the message >> from the foundation, we have plenty of avenues to raise it. > > How about the TC controls which repo is replicated where (and which ones > are pinned etc), but we import the descriptions from the openstack-map > repo? > > That would keep control on the TC side but avoid duplication of effort. > In my experience it's already difficult to get projects to update > descriptions in one place, so two... > > Also, who is volunteering for setting up the replication, and then > keeping track of things as they evolve ? Oops, duplicate send. Ignore me. 
-- Thierry Carrez (ttx) From sean.mcginnis at gmx.com Thu Jun 27 13:04:19 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 27 Jun 2019 08:04:19 -0500 Subject: [dev][requirements] Upcoming changes to constraints handling in tox.ini In-Reply-To: References: <20190522030203.GD15808@thor.bakeyournoodle.com> Message-ID: <20190627130419.GA30194@sm-workstation> On Wed, Jun 26, 2019 at 05:11:23PM -0700, Duc Truong wrote: > On Tue, May 21, 2019 at 8:02 PM Tony Breeds wrote: > > > > Hi folks, > > This is a heads-up to describe 3 sets of changes you'll start seeing > > starting next week. > > > > 1) lower-constraints.txt handling > > TL;DR: Make sure projects do not specify a constraint file in install_command > > 2) Switch to the new canonical constraints URL on master > > TR;DR: Make sure you use https://releases.openstack.org/constraints/upper/master > > 3) Switch to the new canonical constraints URL on stable branches > > TR;DR: Make sure you use https://releases.openstack.org/constraints/upper/$series > > > > These will be generated from a member of the requirements team[1], and > > will be on the gerrit topic constraints-updates. We'll start next week > > to give y'all a few days to digest this email > > > > I'm seeing a lot of changes for #2 being proposed by people who are > not members of the requirements team (e.g. [1] and [2]). > > Is it ok to approve those changes or should we wait for the official > changes from a member of the requirements team? > > [1] https://review.opendev.org/#/c/666947/ > [2] https://review.opendev.org/#/c/666950/ > Thanks for checking Duc. I think as long as they are using the correct URL, it should be fine to approve those patches. We may end up with some duplicates if we run our script to create patches for repos that have not been updated yet, but those can just be abandoned. Thanks! Sean From ignaziocassano at gmail.com Thu Jun 27 13:21:04 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 27 Jun 2019 15:21:04 +0200 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: Hello Mark, let me to verify if I understood your method. You have old controllers,haproxy,mariadb and nova computes. You installed three new controllers but kolla.ansible inventory contains old mariadb and old rabbit servers. You are deployng single service on new controllers staring with glance. When you deploy glance on new controllers, it changes the glance endpoint on old mariadb db ? Regards Ignazio Il giorno gio 27 giu 2019 alle ore 10:52 Mark Goddard ha scritto: > On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano > wrote: > > > > Hello, > > Anyone have tried to migrate an existing openstack installation to kolla > containers? > > Hi, > > I'm aware of two people currently working on that. Gregory Orange and > one of my colleagues, Pierre Riteau. Pierre is away currently, so I > hope he doesn't mind me quoting him from an email to Gregory. > > Mark > > "I am indeed working on a similar migration using Kolla Ansible with > Kayobe, starting from a non-containerised OpenStack deployment based > on CentOS RPMs. > Existing OpenStack services are deployed across several controller > nodes and all sit behind HAProxy, including for internal endpoints. > We have additional controller nodes that we use to deploy > containerised services. If you don't have the luxury of additional > nodes, it will be more difficult as you will need to avoid processes > clashing when listening on the same port. 
> > The method I am using resembles your second suggestion, however I am > deploying only one containerised service at a time, in order to > validate each of them independently. > I use the --tags option of kolla-ansible to restrict Ansible to > specific roles, and when I am happy with the resulting configuration I > update HAProxy to point to the new controllers. > > As long as the configuration matches, this should be completely > transparent for purely HTTP-based services like Glance. You need to be > more careful with services that include components listening for RPC, > such as Nova: if the new nova.conf is incorrect and you've deployed a > nova-conductor that uses it, you could get failed instances launches. > Some roles depend on others: if you are deploying the > neutron-openvswitch-agent, you need to run the openvswitch role as > well. > > I suggest starting with migrating Glance as it doesn't have any > internal services and is easy to validate. Note that properly > migrating Keystone requires keeping existing Fernet keys around, so > any token stays valid until the time it is expected to stop working > (which is fairly complex, see > https://bugs.launchpad.net/kolla-ansible/+bug/1809469). > > While initially I was using an approach similar to your first > suggestion, it can have side effects since Kolla Ansible uses these > variables when templating configuration. As an example, most services > will only have notifications enabled if enable_ceilometer is true. > > I've added existing control plane nodes to the Kolla Ansible inventory > as separate groups, which allows me to use the existing database and > RabbitMQ for the containerised services. > For example, instead of: > > [mariadb:children] > control > > you may have: > > [mariadb:children] > oldcontrol_db > > I still have to perform the migration of these underlying services to > the new control plane, I will let you know if there is any hurdle. > > A few random things to note: > > - if run on existing control plane hosts, the baremetal role removes > some packages listed in `redhat_pkg_removals` which can trigger the > removal of OpenStack dependencies using them! I've changed this > variable to an empty list. > - compare your existing deployment with a Kolla Ansible one to check > for differences in endpoints, configuration files, database users, > service users, etc. For Heat, Kolla uses the domain heat_user_domain, > while your existing deployment may use another one (and this is > hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" > project while a couple of deployments I worked with were using > "services". This shouldn't matter, except there was a bug in Kolla > which prevented it from setting the roles correctly: > https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest > Rocky and Queens images) > - the ml2_conf.ini generated for Neutron generates physical network > names like physnet1, physnet2… you may want to override > bridge_mappings completely. > - although sometimes it could be easier to change your existing > deployment to match Kolla Ansible settings, rather than configure > Kolla Ansible to match your deployment." > > > Thanks > > Ignazio > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sombrafam at gmail.com Thu Jun 27 13:28:57 2019 From: sombrafam at gmail.com (Erlon Cruz) Date: Thu, 27 Jun 2019 10:28:57 -0300 Subject: [cinder] Deprecating driver versions Message-ID: Hey folks, Driver versions has being a source of a lot of confusions with costumers. Most of our drivers have a version number and history that are updated as the developers adds new fixes and features. Drivers also have a VERSION variable in the version class that should be bumped by developers. The problem with that is: - sometimes folks from the community just push patches on drivers, and its hard to bump every vendor version correctly; - that relies in the human factor to remember adding it, and usually that fails; - if we create a bugfix and bump the version, the backport to older branches will carry the version, which will not reflect the correct driver code; So, the solution I'm proposing for this is that we use the Cinder versions[1] and remove all version strings for drivers. Every new release we get a version. For stable versions, from time to time the PTL bumps the stable version and we have an accurate ways to describe the code. If we need to backport and send something to the costumer, we can do the backport, poke the PTL, and he will generate another version which can be downloaded on github or via PIP, and present the version to our costumers. So, what are your thought around this? Anyone else has had problems with that? What would be the implications of removing the driver version strings? Erlon [1] https://releases.openstack.org/teams/cinder.html [2] https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/solidfire.py#L237 -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Thu Jun 27 13:36:26 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 27 Jun 2019 08:36:26 -0500 Subject: [nova] TPM passthrough In-Reply-To: <9c98c459-2a91-bbaf-154c-a64c0cd93a71@gmail.com> Message-ID: <50a60bfe-42ab-f916-ce31-db19afac78ed@fried.cc> Folks- I've filed a blueprint [1] and started on a spec [2]. The latter needs love from someone who understands the low-level aspects better -- I left a bunch of todos. It would be nice to have this ready for next Tuesday [3] since we're getting pretty close to spec freeze. I also started some (very rough) code [4] to show how this could work. efried [1] https://blueprints.launchpad.net/nova/+spec/physical-tpm-passthrough [2] https://review.opendev.org/667926 [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007381.html [4] https://review.opendev.org/667928 From ignaziocassano at gmail.com Thu Jun 27 13:46:24 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 27 Jun 2019 15:46:24 +0200 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: Sorry, for my question. It does not need to change anything because endpoints refer to haproxy vips. So if your new glance works fine you change haproxy backends for glance. Regards Ignazio Il giorno gio 27 giu 2019 alle ore 15:21 Ignazio Cassano < ignaziocassano at gmail.com> ha scritto: > Hello Mark, > let me to verify if I understood your method. > > You have old controllers,haproxy,mariadb and nova computes. > You installed three new controllers but kolla.ansible inventory contains > old mariadb and old rabbit servers. > You are deployng single service on new controllers staring with glance. > When you deploy glance on new controllers, it changes the glance endpoint > on old mariadb db ? 
> Regards > Ignazio > > Il giorno gio 27 giu 2019 alle ore 10:52 Mark Goddard > ha scritto: > >> On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano >> wrote: >> > >> > Hello, >> > Anyone have tried to migrate an existing openstack installation to >> kolla containers? >> >> Hi, >> >> I'm aware of two people currently working on that. Gregory Orange and >> one of my colleagues, Pierre Riteau. Pierre is away currently, so I >> hope he doesn't mind me quoting him from an email to Gregory. >> >> Mark >> >> "I am indeed working on a similar migration using Kolla Ansible with >> Kayobe, starting from a non-containerised OpenStack deployment based >> on CentOS RPMs. >> Existing OpenStack services are deployed across several controller >> nodes and all sit behind HAProxy, including for internal endpoints. >> We have additional controller nodes that we use to deploy >> containerised services. If you don't have the luxury of additional >> nodes, it will be more difficult as you will need to avoid processes >> clashing when listening on the same port. >> >> The method I am using resembles your second suggestion, however I am >> deploying only one containerised service at a time, in order to >> validate each of them independently. >> I use the --tags option of kolla-ansible to restrict Ansible to >> specific roles, and when I am happy with the resulting configuration I >> update HAProxy to point to the new controllers. >> >> As long as the configuration matches, this should be completely >> transparent for purely HTTP-based services like Glance. You need to be >> more careful with services that include components listening for RPC, >> such as Nova: if the new nova.conf is incorrect and you've deployed a >> nova-conductor that uses it, you could get failed instances launches. >> Some roles depend on others: if you are deploying the >> neutron-openvswitch-agent, you need to run the openvswitch role as >> well. >> >> I suggest starting with migrating Glance as it doesn't have any >> internal services and is easy to validate. Note that properly >> migrating Keystone requires keeping existing Fernet keys around, so >> any token stays valid until the time it is expected to stop working >> (which is fairly complex, see >> https://bugs.launchpad.net/kolla-ansible/+bug/1809469). >> >> While initially I was using an approach similar to your first >> suggestion, it can have side effects since Kolla Ansible uses these >> variables when templating configuration. As an example, most services >> will only have notifications enabled if enable_ceilometer is true. >> >> I've added existing control plane nodes to the Kolla Ansible inventory >> as separate groups, which allows me to use the existing database and >> RabbitMQ for the containerised services. >> For example, instead of: >> >> [mariadb:children] >> control >> >> you may have: >> >> [mariadb:children] >> oldcontrol_db >> >> I still have to perform the migration of these underlying services to >> the new control plane, I will let you know if there is any hurdle. >> >> A few random things to note: >> >> - if run on existing control plane hosts, the baremetal role removes >> some packages listed in `redhat_pkg_removals` which can trigger the >> removal of OpenStack dependencies using them! I've changed this >> variable to an empty list. >> - compare your existing deployment with a Kolla Ansible one to check >> for differences in endpoints, configuration files, database users, >> service users, etc. 
For Heat, Kolla uses the domain heat_user_domain, >> while your existing deployment may use another one (and this is >> hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" >> project while a couple of deployments I worked with were using >> "services". This shouldn't matter, except there was a bug in Kolla >> which prevented it from setting the roles correctly: >> https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest >> Rocky and Queens images) >> - the ml2_conf.ini generated for Neutron generates physical network >> names like physnet1, physnet2… you may want to override >> bridge_mappings completely. >> - although sometimes it could be easier to change your existing >> deployment to match Kolla Ansible settings, rather than configure >> Kolla Ansible to match your deployment." >> >> > Thanks >> > Ignazio >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Jun 27 13:48:14 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 27 Jun 2019 14:48:14 +0100 Subject: performance issue with opendev gitea interface. Message-ID: <5943466bd0d09b6a23f160381944dd9e0c772191.camel@redhat.com> i have started this as a separate thread form the Github organization management but it has been in the back of my mind as we are considering not syncing to gitub going forward. for larger project like nova https://opendev.org/openstack/nova preforms quite poorly in both firefox and chome i am seeing ~ 15second before the first response form opendev.org for nova. os-vif or other smaller projects seem to respond ok but for nova it makes navigating the code or liking code to other via opendev quite hard. i brought this up on the infra irc a few weeks ago and asked if gitea had any kind of caching and while the initial response was "we do not believe so". before archiving or stopping syncing to github i was wondering if we could explore options to improve the performace. if opendev.org is not currently fronted by a cdn perhaps that would help. Similarly looking at https://docs.gitea.io/en-us/config-cheat-sheet/ it may be possible to either change the cache or database parameters to improve performance. I really don't know how gitea has been deployed but at present the web interface is not usable in nova in a responsive manner so i have continued to use github when linking code to others but it would be nice to be able to use opendev instead. regards sean. From gmann at ghanshyammann.com Thu Jun 27 13:50:37 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 27 Jun 2019 22:50:37 +0900 Subject: [nova] API updates week 19-26 Message-ID: <16b9933cc31.128bc267433478.8660435979934439908@ghanshyammann.com> Hello Everyone, Please find the Nova API updates of this week. API Related BP : ============ COMPLETED: 1. Support adding description while locking an instance: - https://blueprints.launchpad.net/nova/+spec/add-locked-reason Code Ready for Review: ------------------------------ 1. Add host and hypervisor_hostname flag to create server - Topic: https://review.opendev.org/#/q/topic:bp/add-host-and-hypervisor-hostname-flag-to-create-server+(status:open+OR+status:merged) - Weekly Progress: patch is updated with review comment. ready for re-review. I will re-review it tomorrow. 2. 
Specifying az when restore shelved server - Topic: https://review.opendev.org/#/q/topic:bp/support-specifying-az-when-restore-shelved-server+(status:open+OR+status:merged) - Weekly Progress: Review comments is fixed and ready to re-review. 3. Nova API cleanup - Topic: https://review.opendev.org/#/c/666889/ - Weekly Progress: Code is up for review. A lot of files changed but should be ok to review. I have pushed a couple of patches for missing tests of previous microversions. 4. Detach and attach boot volumes: - Topic: https://review.openstack.org/#/q/topic:bp/detach-boot-volume+(status:open+OR+status:merged) - Weekly Progress: No Progress Spec Ready for Review: ----------------------------- 1. Nova API policy improvement - Spec: https://review.openstack.org/#/c/547850/ - PoC: https://review.openstack.org/#/q/topic:bp/policy-default-refresh+(status:open+OR+status:merged) - Weekly Progress: Under review and updates. 2. Support for changing deleted_on_termination after boot -Spec: https://review.openstack.org/#/c/580336/ - Weekly Progress: No update this week. 3. Support delete_on_termination in volume attach api -Spec: https://review.openstack.org/#/c/612949/ - Weekly Progress: No updates this week. 4. Add API ref guideline for body text - ~8 api-ref are left to fix. Previously approved Spec needs to be re-proposed for Train: --------------------------------------------------------------------------- 1. Servers Ips non-unique network names : - https://blueprints.launchpad.net/nova/+spec/servers-ips-non-unique-network-names - https://review.openstack.org/#/q/topic:bp/servers-ips-non-unique-network-names+(status:open+OR+status:merged) 2. Volume multiattach enhancements: - https://blueprints.launchpad.net/nova/+spec/volume-multiattach-enhancements - https://review.openstack.org/#/q/topic:bp/volume-multiattach-enhancements+(status:open+OR+status:merged) Bugs: ==== No progress report in this week too. I will start the bug triage next week. NOTE- There might be some bug which is not tagged as 'api' or 'api-ref', those are not in the above list. Tag such bugs so that we can keep our eyes. -gmann From mriedemos at gmail.com Thu Jun 27 13:52:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 27 Jun 2019 08:52:20 -0500 Subject: [nova] Questions on Force refresh instance info_cache during heal In-Reply-To: <1b0ba2d3-512c-ab37-bbc5-34dbb32b0885@gmail.com> References: <5d133da9.1c69fb81.e1b4a.4f14SMTPIN_ADDED_BROKEN@mx.google.com> <1b0ba2d3-512c-ab37-bbc5-34dbb32b0885@gmail.com> Message-ID: <1d3d4f92-96ad-cbcc-cebf-c5578e30ac38@gmail.com> On 6/26/2019 9:21 AM, Matt Riedemann wrote: > If people agree with this (being conservative when we've lost the > cache), then please report a bug and we can make the change (it should > be pretty simple). Here is the bug: https://bugs.launchpad.net/nova/+bug/1834463 -- Thanks, Matt From jim at jimrollenhagen.com Thu Jun 27 14:22:15 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Thu, 27 Jun 2019 10:22:15 -0400 Subject: performance issue with opendev gitea interface. In-Reply-To: <5943466bd0d09b6a23f160381944dd9e0c772191.camel@redhat.com> References: <5943466bd0d09b6a23f160381944dd9e0c772191.camel@redhat.com> Message-ID: On Thu, Jun 27, 2019 at 9:49 AM Sean Mooney wrote: > i have started this as a separate thread form the Github organization > management > but it has been in the back of my mind as we are considering not syncing > to gitub > going forward. 
for larger project like nova > https://opendev.org/openstack/nova > preforms quite poorly > AFAIK we have never discussed dropping syncing for the openstack namespace, it just may be implemented differently. We did drop mirroring for unofficial projects, but they can set it up themselves if they want it. > in both firefox and chome i am seeing ~ 15second before the first response > form > opendev.org for nova. > > os-vif or other smaller projects seem to respond ok but for nova it makes > navigating the > code or liking code to other via opendev quite hard. > > i brought this up on the infra irc a few weeks ago and asked if gitea had > any kind of caching > and while the initial response was "we do not believe so". > > before archiving or stopping syncing to github i was wondering if we could > explore > options to improve the performace. if opendev.org is not currently > fronted by a cdn perhaps > that would help. > > Similarly looking at https://docs.gitea.io/en-us/config-cheat-sheet/ it > may be possible to either > change the cache or database parameters to improve performance. I really > don't know how gitea has > been deployed but at present the web interface is not usable in nova in a > responsive manner > so i have continued to use github when linking code to others but it would > be nice > to be able to use opendev instead. > > regards > sean. > > > // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremyfreudberg at gmail.com Thu Jun 27 14:29:15 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Thu, 27 Jun 2019 10:29:15 -0400 Subject: [sahara] Cancelling Sahara meeting July 4 Message-ID: Hi all, There will be no Sahara meeting this upcoming Thursday, July 4. Holler if you need anything. Thanks, Jeremy From openstack at fried.cc Thu Jun 27 14:37:30 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 27 Jun 2019 09:37:30 -0500 Subject: [nova] API updates week 19-26 In-Reply-To: <16b9933cc31.128bc267433478.8660435979934439908@ghanshyammann.com> References: <16b9933cc31.128bc267433478.8660435979934439908@ghanshyammann.com> Message-ID: > 3. Nova API cleanup > - Topic: https://review.opendev.org/#/c/666889/ > - Weekly Progress: Code is up for review. A lot of files changed but should be ok to review. I have pushed a couple of patches for missing tests of previous microversions. I added this to the runway queue [1], assuming it is complete and ready. efried [1] https://etherpad.openstack.org/p/nova-runways-train From mark at stackhpc.com Thu Jun 27 14:44:08 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 27 Jun 2019 15:44:08 +0100 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: On Thu, 27 Jun 2019 at 14:46, Ignazio Cassano wrote: > > Sorry, for my question. > It does not need to change anything because endpoints refer to haproxy vips. > So if your new glance works fine you change haproxy backends for glance. > Regards > Ignazio That's correct - only the haproxy backend needs to be updated. > > > Il giorno gio 27 giu 2019 alle ore 15:21 Ignazio Cassano ha scritto: >> >> Hello Mark, >> let me to verify if I understood your method. >> >> You have old controllers,haproxy,mariadb and nova computes. >> You installed three new controllers but kolla.ansible inventory contains old mariadb and old rabbit servers. >> You are deployng single service on new controllers staring with glance. >> When you deploy glance on new controllers, it changes the glance endpoint on old mariadb db ? 
>> Regards >> Ignazio >> >> Il giorno gio 27 giu 2019 alle ore 10:52 Mark Goddard ha scritto: >>> >>> On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano wrote: >>> > >>> > Hello, >>> > Anyone have tried to migrate an existing openstack installation to kolla containers? >>> >>> Hi, >>> >>> I'm aware of two people currently working on that. Gregory Orange and >>> one of my colleagues, Pierre Riteau. Pierre is away currently, so I >>> hope he doesn't mind me quoting him from an email to Gregory. >>> >>> Mark >>> >>> "I am indeed working on a similar migration using Kolla Ansible with >>> Kayobe, starting from a non-containerised OpenStack deployment based >>> on CentOS RPMs. >>> Existing OpenStack services are deployed across several controller >>> nodes and all sit behind HAProxy, including for internal endpoints. >>> We have additional controller nodes that we use to deploy >>> containerised services. If you don't have the luxury of additional >>> nodes, it will be more difficult as you will need to avoid processes >>> clashing when listening on the same port. >>> >>> The method I am using resembles your second suggestion, however I am >>> deploying only one containerised service at a time, in order to >>> validate each of them independently. >>> I use the --tags option of kolla-ansible to restrict Ansible to >>> specific roles, and when I am happy with the resulting configuration I >>> update HAProxy to point to the new controllers. >>> >>> As long as the configuration matches, this should be completely >>> transparent for purely HTTP-based services like Glance. You need to be >>> more careful with services that include components listening for RPC, >>> such as Nova: if the new nova.conf is incorrect and you've deployed a >>> nova-conductor that uses it, you could get failed instances launches. >>> Some roles depend on others: if you are deploying the >>> neutron-openvswitch-agent, you need to run the openvswitch role as >>> well. >>> >>> I suggest starting with migrating Glance as it doesn't have any >>> internal services and is easy to validate. Note that properly >>> migrating Keystone requires keeping existing Fernet keys around, so >>> any token stays valid until the time it is expected to stop working >>> (which is fairly complex, see >>> https://bugs.launchpad.net/kolla-ansible/+bug/1809469). >>> >>> While initially I was using an approach similar to your first >>> suggestion, it can have side effects since Kolla Ansible uses these >>> variables when templating configuration. As an example, most services >>> will only have notifications enabled if enable_ceilometer is true. >>> >>> I've added existing control plane nodes to the Kolla Ansible inventory >>> as separate groups, which allows me to use the existing database and >>> RabbitMQ for the containerised services. >>> For example, instead of: >>> >>> [mariadb:children] >>> control >>> >>> you may have: >>> >>> [mariadb:children] >>> oldcontrol_db >>> >>> I still have to perform the migration of these underlying services to >>> the new control plane, I will let you know if there is any hurdle. >>> >>> A few random things to note: >>> >>> - if run on existing control plane hosts, the baremetal role removes >>> some packages listed in `redhat_pkg_removals` which can trigger the >>> removal of OpenStack dependencies using them! I've changed this >>> variable to an empty list. 
>>> - compare your existing deployment with a Kolla Ansible one to check >>> for differences in endpoints, configuration files, database users, >>> service users, etc. For Heat, Kolla uses the domain heat_user_domain, >>> while your existing deployment may use another one (and this is >>> hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" >>> project while a couple of deployments I worked with were using >>> "services". This shouldn't matter, except there was a bug in Kolla >>> which prevented it from setting the roles correctly: >>> https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest >>> Rocky and Queens images) >>> - the ml2_conf.ini generated for Neutron generates physical network >>> names like physnet1, physnet2… you may want to override >>> bridge_mappings completely. >>> - although sometimes it could be easier to change your existing >>> deployment to match Kolla Ansible settings, rather than configure >>> Kolla Ansible to match your deployment." >>> >>> > Thanks >>> > Ignazio >>> > From qianxi416 at foxmail.com Thu Jun 27 01:29:10 2019 From: qianxi416 at foxmail.com (=?utf-8?B?UWlhblhp?=) Date: Thu, 27 Jun 2019 09:29:10 +0800 Subject: Re [nova]Bug #1829696: qemu-kvm process takes 100% CPU usage whenrunning redhat/centos 7.6 guest OS In-Reply-To: References: Message-ID: I tested two newest QEMU versions these days(v3.1.0 and v4.0.0), and sadly, the problem still happens. I tried to find the reason why the qemu process take 100% usage of cpu, and collected some facts about it. I compared the facts with other normal vm's qemu process(who's cpu usage is normal) and didn't find out any interesting result. Please give me some guides to debug this problem if you could, thanks very much. (The full content of facts is in the attachment and I also leave a message in https://bugs.launchpad.net/qemu/+bug/1829696) ------------------ Thanks a lot! Best Regards QianXi ------------------ Original ------------------ From: "QianXi";; Date: Jun 21, 2019 To: "Kashyap Chamarthy"; "1829696"<1829696 at bugs.launchpad.net>; Cc: "openstack-discuss"; Subject: Reply: [nova]Bug #1829696: qemu-kvm process takes 100% CPU usage whenrunning redhat/centos 7.6 guest OS Thanks very much. I will upgrade the virt tools software on the host and check if it still happens. Later, I hope I could update you guys. Thanks again. Best regards QianXi------------------ 原始邮件 ------------------ 发件人: "Kashyap Chamarthy" 发送时间: 2019年6月21日(星期五) 凌晨0:32 收件人: "钱熙"; 抄送: "openstack-discuss"; 主题: Re: [nova]Bug #1829696: qemu-kvm process takes 100% CPU usage whenrunning redhat/centos 7.6 guest OS On Thu, Jun 20, 2019 at 10:06:11PM +0800, 钱熙 wrote: > Hi there, Hi, > > I am struggling with qemu-kvm 100% CPU usage problem. > When running redhat or centos 7.6 guest os on vm, > the cpu usage is very low on vm(almost 100% idle when no tasks run), but on the host, > qemu-kvm reports 100% cpu busy usage. > > > https://bugs.launchpad.net/nova/+bug/1829696 > I opened the bug above, however it did not find some interest. > > > After searching some related bugs report, > I suspect that it is due to the clock settings in vm's domain xml. > My settings are as follows: > > > > > > And details about the bug, please see https://bugs.launchpad.net/nova/+bug/1829696 I don't think it is related to clock settings at all. > It shows that only the version 7.6 of redhat or centos affected by this bug behavior. > In my cluster, it is OK for versions from redhat or centos 6.8 to 7.5. > > > Any clue or suggestion? 
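In case it helps narrow things down, one way to see where the time is going is to look at the individual qemu threads rather than the process as a whole (the domain name below is only a placeholder):

    # which qemu thread is busy: the vCPU threads or the main/IO thread
    top -H -p $(pgrep -f instance-00000042)
    # map busy thread ids back to guest vCPUs
    virsh qemu-monitor-command instance-00000042 --hmp "info cpus"
    # sample what the busy thread is actually doing
    perf top -t <busy-thread-id>

If the spinning thread is a vCPU thread, the guest itself is doing the work; if it is the main loop or an IO thread, the problem is more likely on the host/QEMU side.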
Please see DanPB's response in comment#2; I agree with it. (Note that I've changed the bug component from 'openstack-nova' --> 'qemu') https://bugs.launchpad.net/qemu/+bug/1829696/comments/2 -- /kashyap -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 100_cpu_usage_problem.txt Type: application/octet-stream Size: 85341 bytes Desc: not available URL: From vondra at homeatcloud.cz Thu Jun 27 09:13:09 2019 From: vondra at homeatcloud.cz (=?UTF-8?Q?Tom=C3=A1=C5=A1_Vondra?=) Date: Thu, 27 Jun 2019 11:13:09 +0200 Subject: Backup solutions In-Reply-To: References: Message-ID: <03b601d52cc8$89ca5e40$9d5f1ac0$@homeatcloud.cz> Hi! We’re offering Disaster Recovery as a Service, and this is my list of possible software vendors who can backup anything and restore to OpenStack. Most can also backup from OpenStack as well. My personal tips are Hystax and Arrikto. We’ve got installations of each of them and can offer a demo. Tomas from HomeatCloud

Software         Price   Win   Lin   Vmw   HpV   Open Stack   RPO   RTO   DR to O/S   DR to Vmware   data transfers   disk space
custom scripts           A     A     A     A     A            1d    1h    A           A              *                *
CloudBerry       $       A     A     N     N     N            1h    30m   N           A              **               **
Hystax           $$      A     A     A     A     A            15m   10m   A           A              ***              ***
Arrikto          $$      N     N     A     N     A            15m   10m   A           A              ****             ***
Rackware         $$$     A     A     N     N     N            1h    5m    A           A              **               ***
Trillio          $$$     N     N     N     N     A            1h    30m   A           N              ***              **
Commvault        $$$$    A     A     A     A     A            1h    5m    N           A              ****             ****
application HA           N/A   N/A   N/A   N/A   N/A          2s    5m    A           A              *                *

From: Alfredo De Luca [mailto:alfredo.deluca at gmail.com] Sent: Tuesday, June 25, 2019 11:50 PM To: Ignazio Cassano Cc: openstack-discuss at lists.openstack.org Subject: Re: Backup solutions Thanks Ignazio. Regards On Tue, Jun 25, 2019 at 7:35 PM Ignazio Cassano wrote: We are using triliovault. It is a commercial solution. Ignazio Il Mar 25 Giu 2019 18:11 Alfredo De Luca ha scritto: Hi all. Just a quick one. Other than freezer openstack project as backup solution, are there any other opensource/commercial project/software/solutions for that? Cheers -- Alfredo -- Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 27 15:07:12 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 27 Jun 2019 10:07:12 -0500 Subject: [nova] API updates week 19-26 In-Reply-To: <16b9933cc31.128bc267433478.8660435979934439908@ghanshyammann.com> References: <16b9933cc31.128bc267433478.8660435979934439908@ghanshyammann.com> Message-ID: <243fdc76-7cb8-8149-3659-144f8408f872@gmail.com> On 6/27/2019 8:50 AM, Ghanshyam Mann wrote: > 4. Detach and attach boot volumes: > - Topic:https://review.openstack.org/#/q/topic:bp/detach-boot-volume+(status:open+OR+status:merged) > - Weekly Progress: No Progress The spec for this blueprint was fast-approved for Train since it was previously approved in Stein but the Train spec needs to be amended based on the corner case issues that came up during review in Stein and were discussed at the Train PTG: http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005804.html So someone needs to amend the spec. I think Kevin is busy with other things at the moment so if anyone else is interested in this they could help out by doing the spec amendment so we can at least get that done before the spec freeze deadline next month. > > Spec Ready for Review: > ----------------------------- > > 4. Add API ref guideline for body text > - ~8 api-ref are left to fix. This isn't really a spec, right?
Anyway, do you have a list of which API references still need work? Are they APIs that we still care about, i.e. not some nova-network or cells v1 obsolete APIs? -- Thanks, Matt From gagehugo at gmail.com Thu Jun 27 16:42:40 2019 From: gagehugo at gmail.com (Gage Hugo) Date: Thu, 27 Jun 2019 11:42:40 -0500 Subject: [security sig] Weekly Newsletter June 27th & July 4th Meeting Canceled Message-ID: Just a heads up, but since next Thursday is the 4th of July, we will be canceling next week's Security SIG meeting. Have a safe holiday everyone! #Week of: 27 June 2019 - Security SIG Meeting Info: http://eavesdrop.openstack.org/#Security_SIG_meeting - Weekly on Thursday at 1500 UTC in #openstack-meeting - Agenda: https://etherpad.openstack.org/p/security-agenda - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG #Meeting Notes - Summary: http://eavesdrop.openstack.org/meetings/security/2019/security.2019-06-27-15.01.html - Slow week, discussed the TC approving the Image Encryption pop-up team & TPM passthrough - Next week's meeting will be canceled due to July 4th 🎆 ## News - The TC approved adding "Image Encryption" as a pop-up team - https://review.opendev.org/#/c/661983/ - https://governance.openstack.org/tc/reference/popup-teams.html#image-encryption - Physical TPM passthrough PoC - https://review.opendev.org/#/c/667926/ - spec - https://review.opendev.org/#/c/667928/ # VMT Reports - A full list of publicly marked security issues can be found here: https://bugs.launchpad.net/ossa/ - No new public security bugs this week -------------- next part -------------- An HTML attachment was scrubbed... URL: From elmiko at redhat.com Thu Jun 27 16:53:13 2019 From: elmiko at redhat.com (Michael McCune) Date: Thu, 27 Jun 2019 12:53:13 -0400 Subject: [sig][api] API-SIG office hours cancelled for 4 July Message-ID: due the holiday in the united states and travel by our other members, the api-sig will not hold office hours next thursday, 4 july. thank you =) peace o/ From gaetan.trellu at incloudus.com Thu Jun 27 20:50:51 2019 From: gaetan.trellu at incloudus.com (gaetan.trellu at incloudus.com) Date: Thu, 27 Jun 2019 16:50:51 -0400 Subject: [masakari] Run masakari-hostmonitor into Docker container Message-ID: <7666a4eae3522bcb14741108bf8a5994@incloudus.com> Hi, I'm integrating Masakari into Kolla and Kolla Ansible projects but I'm facing an issue related to masakari-hostmonitor. Based on masakari-monitors code[1], "systemctl status" command is used to check if pacemaker, pacemaker-remote and corosync are running. Having systemd running into Docker container is not the best solution. Does any of you has been able to run masakari-monitor into Docker container ? Thanks for your help. Gaëtan - [1] https://github.com/openstack/masakari-monitors/blob/26d558333d9731ca06da09b26fe6592c49c0ac8a/masakarimonitors/hostmonitor/host_handler/handle_host.py#L48 From cboylan at sapwetik.org Thu Jun 27 22:40:11 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 27 Jun 2019 15:40:11 -0700 Subject: performance issue with opendev gitea interface. In-Reply-To: References: <5943466bd0d09b6a23f160381944dd9e0c772191.camel@redhat.com> Message-ID: On Thu, Jun 27, 2019, at 7:22 AM, Jim Rollenhagen wrote: > On Thu, Jun 27, 2019 at 9:49 AM Sean Mooney wrote: > > i have started this as a separate thread form the Github organization management > > but it has been in the back of my mind as we are considering not syncing to gitub > > going forward. 
for larger project like nova https://opendev.org/openstack/nova > > preforms quite poorly > > AFAIK we have never discussed dropping syncing for the openstack > namespace, > it just may be implemented differently. We did drop mirroring for > unofficial projects, > but they can set it up themselves if they want it. Correct. We've transitioned to a world where everything isn't in the openstack/ namespace just to make github mirroring easy. Instead we've built flexible git mirror tooling that should allow you to mirror git repos to arbitrary locations on the Internet including GitHub and as far as I know OpenStack intends to keep mirroring to GitHub. > > > > > in both firefox and chome i am seeing ~ 15second before the first response form > > opendev.org for nova. > > > > os-vif or other smaller projects seem to respond ok but for nova it makes navigating the > > code or liking code to other via opendev quite hard. > > > > i brought this up on the infra irc a few weeks ago and asked if gitea had any kind of caching > > and while the initial response was "we do not believe so". Reading the doc you link below I believe we are using the default cache option of "memory". > > > > before archiving or stopping syncing to github i was wondering if we could explore > > options to improve the performace. if opendev.org is not currently fronted by a cdn perhaps > > that would help. > > > > Similarly looking at https://docs.gitea.io/en-us/config-cheat-sheet/ it may be possible to either > > change the cache or database parameters to improve performance. I really don't know how gitea has > > been deployed but at present the web interface is not usable in nova in a responsive manner > > so i have continued to use github when linking code to others but it would be nice > > to be able to use opendev instead. We should definitely do what we can to improve the performance of these larger repos. Gitea is a very receptive upstream so if we can identify the issue and/or fix it I'm sure they would be happy to help with that. The way we have deployed Gitea is 8 backend nodes behind an haproxy. Currently Gitea does not operate in a shared state manner so each backend operates independently with Gerrit replicating to each of them. Due to the lack of shared state here the haproxy load balancer balances you to a specific backend based on your source address (without this we observed git clients being unhappy on subsequent requests if objects weren't packed identically). The haproxy and gitea deployments are all done via ansible driving docker(-compose). The ansible roles are here [0][1], but probably the most interesting bit is the docker-compose [2] as you should be able to take that and run docker-compose locally to deploy a local gitea install for debugging. It might also be useful to know that the gitea backends can be addressed individually via https://gitea0X.opendev.org:3000/ replacing the X with values 1-8 (inclusive). Its possible that some backends perform better than others? Cacti data [3] may also be useful here. I notice that it doesn't show significant memory use by the gitea hosts [4]. This may mean we aren't caching in memory aggressively enough or gitea just doesn't cache what we need it to cache. I expect we'll eventually get to digging into this ourselves, but help is much appreciated (other items like replacing gitea host with corrupted disk have been bigger priorities). I do wonder if our replication of refs/changes and refs/notes has impacted gitea in a bad way. 
I don't have any data to support that yet other than it seemed gitea was quicker with our big repos in the past and that is the only major change we've made to gitea. We have upgraded gitea a few times so it may also just be a regression in the service. [0] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/haproxy [1] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gitea [2] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gitea/templates/docker-compose.yaml.j2 [3] http://cacti.openstack.org [4] http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66633&rra_id=all Hope this helps, Clark From corvus at inaugust.com Thu Jun 27 23:00:08 2019 From: corvus at inaugust.com (James E. Blair) Date: Thu, 27 Jun 2019 16:00:08 -0700 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: (Thierry Carrez's message of "Thu, 27 Jun 2019 10:55:25 +0200") References: Message-ID: <87mui2ws5j.fsf@meyer.lemoncheese.net> Thierry Carrez writes: > I'd do a limited number of personal accounts, all with 2FA. One thing I would encourage folks to consider is that GitHub makes it remarkably easy to do something "administrative" accidentally. Any of these accounts can easily accidentally push a commit, tag, etc., to the mirrored repos. It's not going to be destructive to the project in the long term, since it's merely a mirror of the authoritative code in Gerrit, but if we think it's important to protect the accounts with 2FA to reduce the chance of a malicious actor pushing a commit to a widely-used mirror, then we should similarly consider preventing an accidental push from a good actor. This is the principal reason that the Infra team developed its secondary-or-shared account policy. Especially if the folks who manage this are also folks who work on these repos, we're one "git push" away from having egg on our collective face. If the folks managing the GitHub presence are also developers, I would encourage the use of a shared or secondary account. -Jim From corvus at inaugust.com Thu Jun 27 23:12:19 2019 From: corvus at inaugust.com (James E. Blair) Date: Thu, 27 Jun 2019 16:12:19 -0700 Subject: performance issue with opendev gitea interface. In-Reply-To: (Clark Boylan's message of "Thu, 27 Jun 2019 15:40:11 -0700") References: <5943466bd0d09b6a23f160381944dd9e0c772191.camel@redhat.com> Message-ID: <87imsqwrl8.fsf@meyer.lemoncheese.net> "Clark Boylan" writes: > I do wonder if our replication of refs/changes and refs/notes has > impacted gitea in a bad way. I don't have any data to support that yet > other than it seemed gitea was quicker with our big repos in the past > and that is the only major change we've made to gitea. We have > upgraded gitea a few times so it may also just be a regression in the > service. It certainly did when we made the initial replication, but then we repacked the repos to get packed-refs, which dramatically improved things. However, my gut still says that the number of refs may still be having an impact (even though we no longer need to open a file for each ref). But it could also be the number of commits in the repo, or how deep into the history Gitea has to go to display the latest commit for each file. As Clark suggested, this is something that someone can test locally fairly easily just by running gitea with docker-compose and mirroring the nova repo into it (without refs, and then with packed-refs). Gitea is the vanguard of our updated config management. 
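For anyone who wants to try that locally, a rough recipe using standard git commands (the exact comparison methodology is up to whoever runs it):

    git clone --mirror https://opendev.org/openstack/nova nova.git
    cd nova.git
    # how many refs there are, per namespace (heads, tags, changes, notes, ...)
    git for-each-ref | awk '{print $3}' | cut -d/ -f1-2 | sort | uniq -c
    # collapse loose refs into packed-refs
    git pack-refs --all

Push the result into a throwaway gitea started from the docker-compose template above, once with and once without the refs/changes namespace, and compare how long the repository summary page takes to render.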
For all the folks who have been hesitant to help out with the infrastructure effort because it was a complex web of puppet which was impossible to reproduce locally -- this is the opposite. It is extremely reproducible, containerized, automated, and tested end-to-end. Also, really rather fun. Please pitch in if you can. :) -Jim From missile0407 at gmail.com Fri Jun 28 00:32:38 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Fri, 28 Jun 2019 08:32:38 +0800 Subject: [kolla] How can I change package version when building the image? Message-ID: Hi, I'm using stable/rocky from git now, but I want to build the prometheus images with latest version since there're few exporters that too previous that can't use most Grafana dashboards in default. I already know I have to edit templates in kolla-ansible since the newer exporters has using the different launch commands. But Idk how to set the version because the document didn't say how to override the Dockerfile. Is that OK to change package version inside Dockerfile? Many thanks, Eddie. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Fri Jun 28 00:46:20 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Fri, 28 Jun 2019 00:46:20 +0000 Subject: [kolla] How can I change package version when building the image? References: Message-ID: Hi Eddie, On 6/27/19 7:32 PM, Eddie Yen wrote: Hi, I'm using stable/rocky from git now, but I want to build the prometheus images with latest version since there're few exporters that too previous that can't use most Grafana dashboards in default. I already know I have to edit templates in kolla-ansible since the newer exporters has using the different launch commands. But Idk how to set the version because the document didn't say how to override the Dockerfile. Is that OK to change package version inside Dockerfile? You can set the "prometheus_tag" variable to point to a different tag. It defaults to the value of "openstack_release", which in your case is probably "rocky". You can set it to "stein" to pull in Stein images for Prometheus, without affecting the versions of other components. If you need to custom-build your container images, rather than using the ones pre-built and hosted on Docker Hub, then a different approach might be necessary. Many thanks, Eddie. Cheers, /Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From missile0407 at gmail.com Fri Jun 28 01:20:13 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Fri, 28 Jun 2019 09:20:13 +0800 Subject: [kolla] How can I change package version when building the image? In-Reply-To: References: Message-ID: Hi Jason, thanks for your quick reply. I summarized my understanding, correct me if wrong. Only I have to do is change the prometheus task files to use stein version's images inside kolla-ansible, then edit the launch command inside templates to stein release. Jason Anderson 於 2019年6月28日 週五 上午8:53寫道: > Hi Eddie, > > On 6/27/19 7:32 PM, Eddie Yen wrote: > > Hi, > > I'm using stable/rocky from git now, but I want to build the prometheus > images with latest version since there're few exporters that too previous > that can't use most Grafana dashboards in default. > > I already know I have to edit templates in kolla-ansible since the newer > exporters has using the different launch commands. But Idk how to set the > version because the document didn't say how to override the Dockerfile. 
Is > that OK to change package version inside Dockerfile? > > You can set the "prometheus_tag" variable to point to a different tag. It > defaults to the value of "openstack_release", which in your case is > probably "rocky". You can set it to "stein" to pull in Stein images for > Prometheus, without affecting the versions of other components. > > If you need to custom-build your container images, rather than using the > ones pre-built and hosted on Docker Hub, then a different approach might be > necessary. > > > Many thanks, > Eddie. > > Cheers, > /Jason > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Fri Jun 28 07:50:12 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Fri, 28 Jun 2019 09:50:12 +0200 Subject: [cinder] Deprecating driver versions In-Reply-To: References: Message-ID: <20190628075012.ndwk52gabg2akqvx@localhost> On 27/06, Erlon Cruz wrote: > Hey folks, > > Driver versions has being a source of a lot of confusions with costumers. > Most of our drivers > have a version number and history that are updated as the developers adds > new fixes and > features. Drivers also have a VERSION variable in the version class that > should be bumped by > developers. The problem with that is: > > - sometimes folks from the community just push patches on drivers, and > its hard to bump > every vendor version correctly; > - that relies in the human factor to remember adding it, and usually > that fails; > - if we create a bugfix and bump the version, the backport to older > branches will carry the > version, which will not reflect the correct driver code; > > So, the solution I'm proposing for this is that we use the Cinder > versions[1] and remove all > version strings for drivers. Every new release we get a version. For stable > versions, from time to > time the PTL bumps the stable version and we have an accurate ways to > describe the code. > If we need to backport and send something to the costumer, we can do the > backport, poke > the PTL, and he will generate another version which can be downloaded on > github or via PIP, > and present the version to our costumers. > > So, what are your thought around this? Anyone else has had problems with > that? What would > be the implications of removing the driver version strings? > > Erlon > Hi Erlon, I am personally against removing the drivers versions, as I find them convenient and think they are good practice. A possible solution for the driver versioning is for a driver to designate a minor version per OpenStack release and use the patch version to track changes. This way one can always backport a patch and will just need to increase the patch version in the backport patch. Maybe we can have this formally described in our devref. We tell driver developers they can do whatever they want with the versioning in master, but backports must not backport the version as it is and instead increase the patch version. What do you think? If I remember correctly there are some drivers that only increase the version once per release. Cheers, Gorka. 
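As a concrete sketch of that convention (the class name and numbers below are made up, this is not taken from any real driver):

    class FooBarDriver(driver.VolumeDriver):
        # <major>.<minor>.<patch>: minor is bumped once per OpenStack release,
        # patch is bumped by every backport that lands on a stable branch.
        VERSION = '3.14.0'   # master during Train development

A fix backported to stable/stein would then land as, say, VERSION = '3.13.2', so the string on each branch keeps describing the code that is actually shipped there.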
> [1] https://releases.openstack.org/teams/cinder.html > [2] > https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/solidfire.py#L237 From mark at stackhpc.com Fri Jun 28 08:10:42 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 28 Jun 2019 09:10:42 +0100 Subject: [masakari] Run masakari-hostmonitor into Docker container In-Reply-To: <7666a4eae3522bcb14741108bf8a5994@incloudus.com> References: <7666a4eae3522bcb14741108bf8a5994@incloudus.com> Message-ID: On Thu, 27 Jun 2019 at 21:52, wrote: > > Hi, > > I'm integrating Masakari into Kolla and Kolla Ansible projects but I'm > facing > an issue related to masakari-hostmonitor. > > Based on masakari-monitors code[1], "systemctl status" command is used > to check > if pacemaker, pacemaker-remote and corosync are running. > > Having systemd running into Docker container is not the best solution. > Does any > of you has been able to run masakari-monitor into Docker container ? > I would not recommend running the systemd daemon in a container, but you could potentially use the client to access a daemon running on the host. E.g., for debian: https://stackoverflow.com/questions/54079586/make-systemctl-work-from-inside-a-container-in-a-debian-stretch-image. No doubt there will be various gotchas with this. Are you planning to run pacemaker and corosync on the host? Mark > Thanks for your help. > > Gaëtan > > - [1] > https://github.com/openstack/masakari-monitors/blob/26d558333d9731ca06da09b26fe6592c49c0ac8a/masakarimonitors/hostmonitor/host_handler/handle_host.py#L48 > From mark at stackhpc.com Fri Jun 28 08:13:10 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 28 Jun 2019 09:13:10 +0100 Subject: [kolla] How can I change package version when building the image? In-Reply-To: References: Message-ID: On Fri, 28 Jun 2019 at 02:21, Eddie Yen wrote: > > Hi Jason, thanks for your quick reply. > > I summarized my understanding, correct me if wrong. > Only I have to do is change the prometheus task files to use stein version's images inside kolla-ansible, then edit the launch command inside templates to stein release. You should use globals.yml to set any variables you wish to override. In your case: prometheus_tag: stein > > Jason Anderson 於 2019年6月28日 週五 上午8:53寫道: >> >> Hi Eddie, >> >> On 6/27/19 7:32 PM, Eddie Yen wrote: >> >> Hi, >> >> I'm using stable/rocky from git now, but I want to build the prometheus images with latest version since there're few exporters that too previous that can't use most Grafana dashboards in default. >> >> I already know I have to edit templates in kolla-ansible since the newer exporters has using the different launch commands. But Idk how to set the version because the document didn't say how to override the Dockerfile. Is that OK to change package version inside Dockerfile? >> >> You can set the "prometheus_tag" variable to point to a different tag. It defaults to the value of "openstack_release", which in your case is probably "rocky". You can set it to "stein" to pull in Stein images for Prometheus, without affecting the versions of other components. >> >> If you need to custom-build your container images, rather than using the ones pre-built and hosted on Docker Hub, then a different approach might be necessary. >> >> >> Many thanks, >> Eddie. 
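For reference, that override just goes in /etc/kolla/globals.yml on the deployment host, e.g.:

    # pull Stein prometheus images while the rest of the deployment stays on Rocky
    prometheus_tag: "stein"

and if the images really do need to be rebuilt rather than pulled, kolla-build can tag a one-off local build (treat the exact invocation as a sketch, since flags vary between releases):

    kolla-build --tag stein-custom prometheus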
>> >> Cheers, >> /Jason From thierry at openstack.org Fri Jun 28 08:43:27 2019 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 28 Jun 2019 10:43:27 +0200 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: <87mui2ws5j.fsf@meyer.lemoncheese.net> References: <87mui2ws5j.fsf@meyer.lemoncheese.net> Message-ID: James E. Blair wrote: > Thierry Carrez writes: > >> I'd do a limited number of personal accounts, all with 2FA. > > One thing I would encourage folks to consider is that GitHub makes it > remarkably easy to do something "administrative" accidentally. Any of > these accounts can easily accidentally push a commit, tag, etc., to the > mirrored repos. It's not going to be destructive to the project in the > long term, since it's merely a mirror of the authoritative code in > Gerrit, but if we think it's important to protect the accounts with 2FA > to reduce the chance of a malicious actor pushing a commit to a > widely-used mirror, then we should similarly consider preventing an > accidental push from a good actor. This is the principal reason that > the Infra team developed its secondary-or-shared account policy. > > Especially if the folks who manage this are also folks who work on these > repos, we're one "git push" away from having egg on our collective face. > > If the folks managing the GitHub presence are also developers, I would > encourage the use of a shared or secondary account. That is a fair point that I had not considered. That said, wouldn't the risk be relatively limited if the "admins" never checkout or clone from GitHub itself ? -- Thierry Carrez (ttx) From ssbarnea at redhat.com Fri Jun 28 09:33:29 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Fri, 28 Jun 2019 10:33:29 +0100 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: References: <87mui2ws5j.fsf@meyer.lemoncheese.net> Message-ID: <361EFD73-833A-44DA-A222-B8BE2A49C024@redhat.com> I think I could write a script that loops over all repos and activates branch restrictions to allow only our sync-bot to push. Running this daily should avoid the case where a new repo is added and someone forgets to add the restriction. In the future we can use the same bot for other maintenance tasks. * in python, obviously. > On 28 Jun 2019, at 09:43, Thierry Carrez wrote: > > James E. Blair wrote: >> Thierry Carrez writes: >>> I'd do a limited number of personal accounts, all with 2FA. >> One thing I would encourage folks to consider is that GitHub makes it >> remarkably easy to do something "administrative" accidentally. Any of >> these accounts can easily accidentally push a commit, tag, etc., to the >> mirrored repos. It's not going to be destructive to the project in the >> long term, since it's merely a mirror of the authoritative code in >> Gerrit, but if we think it's important to protect the accounts with 2FA >> to reduce the chance of a malicious actor pushing a commit to a >> widely-used mirror, then we should similarly consider preventing an >> accidental push from a good actor. This is the principal reason that >> the Infra team developed its secondary-or-shared account policy. >> Especially if the folks who manage this are also folks who work on these >> repos, we're one "git push" away from having egg on our collective face. >> If the folks managing the GitHub presence are also developers, I would >> encourage the use of a shared or secondary account. > > That is a fair point that I had not considered. 
> > That said, wouldn't the risk be relatively limited if the "admins" never checkout or clone from GitHub itself ? > > -- > Thierry Carrez (ttx) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chkumar246 at gmail.com Fri Jun 28 11:00:17 2019 From: chkumar246 at gmail.com (Chandan kumar) Date: Fri, 28 Jun 2019 16:30:17 +0530 Subject: [tripleo][openstack-ansible] Integrating ansible-role-collect-logs in OSA Message-ID: Hello, With os_tempest project, TripleO and Openstack Ansible projects started collaborating together to reuse the tools developed by each other to avoid duplicates and enable more collaboration. During Denver Train 2019 PTG, we decided to unifying the CI logs for both the projects by providing a unified experience to the developers while browsing the CI logs making sure we have a similar logs structure for both the projects, so that one can easily navigate through the logs without scratching the heads and also the logs tree structure should say what logs are present where. In TripleO, we have ansible-role-collect-logs[1.] role for the same and in OSA we have logs_collect.sh[2.] script for the same. But once the logs gets collected at each other projects, It is very hard to navigate and find out where is the respective files. A little about ansible-role-collect-logs roles: * A role for aggregating logs from different nodes * Provide a list for files to collect at one places in playbook [3] * Hyperlink the logs with description about log files * Once logs collection is done, it pushes to a particular log server For example, tempest.html or stestr_results.html is the common file for viewing tempest results. In TripleO, we keep it under logs folder but in OSA logs/openstack/aio1-utility/stestr_results.html. If a new user contributes to another project, he tries to follow the same pattern for find logs as he seen in current project. By Keeping the same structure at all places It would be easier. So moving ahead what we are going to do: * Refactor collect-logs role to pass defaults list of files at one place * Pass the list of different logs files based on deployment tools * Put system/containers related commands at one place * Replace the collect_logs.sh script with playbook in OSA and replace it. Thanks for reading, We are looking for the comments on the above suggestion. Links: [1.] https://opendev.org/openstack/ansible-role-collect-logs/ [2.] https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/log-collect.sh [3.] https://opendev.org/openstack/tripleo-ci/src/branch/master/toci-quickstart/config/collect-logs.yml#L9 Thanks, Chandan Kumar From missile0407 at gmail.com Fri Jun 28 11:14:48 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Fri, 28 Jun 2019 19:14:48 +0800 Subject: [kolla] How can I change package version when building the image? In-Reply-To: References: Message-ID: Got it. I'll try this method. Mark Goddard 於 2019年6月28日 週五 下午4:13寫道: > On Fri, 28 Jun 2019 at 02:21, Eddie Yen wrote: > > > > Hi Jason, thanks for your quick reply. > > > > I summarized my understanding, correct me if wrong. > > Only I have to do is change the prometheus task files to use stein > version's images inside kolla-ansible, then edit the launch command inside > templates to stein release. > > You should use globals.yml to set any variables you wish to override. 
> In your case: > > prometheus_tag: stein > > > > > Jason Anderson 於 2019年6月28日 週五 上午8:53寫道: > >> > >> Hi Eddie, > >> > >> On 6/27/19 7:32 PM, Eddie Yen wrote: > >> > >> Hi, > >> > >> I'm using stable/rocky from git now, but I want to build the prometheus > images with latest version since there're few exporters that too previous > that can't use most Grafana dashboards in default. > >> > >> I already know I have to edit templates in kolla-ansible since the > newer exporters has using the different launch commands. But Idk how to set > the version because the document didn't say how to override the Dockerfile. > Is that OK to change package version inside Dockerfile? > >> > >> You can set the "prometheus_tag" variable to point to a different tag. > It defaults to the value of "openstack_release", which in your case is > probably "rocky". You can set it to "stein" to pull in Stein images for > Prometheus, without affecting the versions of other components. > >> > >> If you need to custom-build your container images, rather than using > the ones pre-built and hosted on Docker Hub, then a different approach > might be necessary. > >> > >> > >> Many thanks, > >> Eddie. > >> > >> Cheers, > >> /Jason > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Fri Jun 28 11:50:02 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 28 Jun 2019 12:50:02 +0100 Subject: [kolla] Proposing yoctozepto as core Message-ID: Hi, I would like to propose adding Radosław Piliszek (yoctozepto) to kolla-core and kolla-ansible-core. While he has only recently started working upstream in the project, I feel that he has made some valuable contributions already, particularly around improving and maintaining CI. His reviews generally provide useful feedback, sometimes in advance of Zuul! Core team - please vote, and consider this my +1. I will keep this vote open for a week or until all cores have responded. Cheers, Mark From marcin.juszkiewicz at linaro.org Fri Jun 28 11:54:14 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Fri, 28 Jun 2019 13:54:14 +0200 Subject: [kolla] Proposing yoctozepto as core In-Reply-To: References: Message-ID: <950bf345-da0f-dabc-31f1-fcb1711e36df@linaro.org> On 28.06.2019 13:50, Mark Goddard wrote: > Hi, > > I would like to propose adding Radosław Piliszek (yoctozepto) to > kolla-core and kolla-ansible-core. While he has only recently started > working upstream in the project, I feel that he has made some valuable > contributions already, particularly around improving and maintaining > CI. His reviews generally provide useful feedback, sometimes in > advance of Zuul! > > Core team - please vote, and consider this my +1. I will keep this > vote open for a week or until all cores have responded. +2 From mnasiadka at gmail.com Fri Jun 28 12:02:22 2019 From: mnasiadka at gmail.com (=?UTF-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 28 Jun 2019 14:02:22 +0200 Subject: [kolla] Proposing yoctozepto as core In-Reply-To: References: Message-ID: I’m all in - consider my +1 vote. W dniu pt., 28.06.2019 o 13:57 Mark Goddard napisał(a): > Hi, > > I would like to propose adding Radosław Piliszek (yoctozepto) to > kolla-core and kolla-ansible-core. While he has only recently started > working upstream in the project, I feel that he has made some valuable > contributions already, particularly around improving and maintaining > CI. His reviews generally provide useful feedback, sometimes in > advance of Zuul! 
> > Core team - please vote, and consider this my +1. I will keep this > vote open for a week or until all cores have responded. > > Cheers, > Mark > > -- Michał Nasiadka mnasiadka at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaetan.trellu at incloudus.com Fri Jun 28 12:08:16 2019 From: gaetan.trellu at incloudus.com (=?ISO-8859-1?Q?Ga=EBtan_Trellu?=) Date: Fri, 28 Jun 2019 08:08:16 -0400 Subject: [kolla] Proposing yoctozepto as core In-Reply-To: <950bf345-da0f-dabc-31f1-fcb1711e36df@linaro.org> Message-ID: An HTML attachment was scrubbed... URL: From sombrafam at gmail.com Fri Jun 28 12:09:04 2019 From: sombrafam at gmail.com (Erlon Cruz) Date: Fri, 28 Jun 2019 09:09:04 -0300 Subject: [cinder] Deprecating driver versions In-Reply-To: <20190628075012.ndwk52gabg2akqvx@localhost> References: <20190628075012.ndwk52gabg2akqvx@localhost> Message-ID: Hi Gorka, Em sex, 28 de jun de 2019 às 04:50, Gorka Eguileor escreveu: > On 27/06, Erlon Cruz wrote: > > Hey folks, > > > > Driver versions has being a source of a lot of confusions with costumers. > > Most of our drivers > > have a version number and history that are updated as the developers adds > > new fixes and > > features. Drivers also have a VERSION variable in the version class that > > should be bumped by > > developers. The problem with that is: > > > > - sometimes folks from the community just push patches on drivers, and > > its hard to bump > > every vendor version correctly; > > - that relies in the human factor to remember adding it, and usually > > that fails; > > - if we create a bugfix and bump the version, the backport to older > > branches will carry the > > version, which will not reflect the correct driver code; > > > > So, the solution I'm proposing for this is that we use the Cinder > > versions[1] and remove all > > version strings for drivers. Every new release we get a version. For > stable > > versions, from time to > > time the PTL bumps the stable version and we have an accurate ways to > > describe the code. > > If we need to backport and send something to the costumer, we can do the > > backport, poke > > the PTL, and he will generate another version which can be downloaded on > > github or via PIP, > > and present the version to our costumers. > > > > So, what are your thought around this? Anyone else has had problems with > > that? What would > > be the implications of removing the driver version strings? > > > > Erlon > > > > Hi Erlon, > > I am personally against removing the drivers versions, as I find them > convenient and think they are good practice. > How do you usually see people using that? And what makes that convenient for you? I see that they would be a good practice if the were properly updated and reflected the code status. For example the rbd.py driver. Has used the same version (1.2.0) since Ocata[1]. I can tell that is the same for most of our drivers. > A possible solution for the driver versioning is for a driver to > designate a minor version per OpenStack release and use the patch > version to track changes. This way one can always backport a patch and > will just need to increase the patch version in the backport patch. > > Maybe we can have this formally described in our devref. We tell > driver developers they can do whatever they want with the versioning in > master, but backports must not backport the version as it is and instead > increase the patch version. 
> We would again have to rely on developers doing the right thing, and things will be the same as they are today. The point here is to have a reliable way to version the code . > What do you think? > One thing we could do to still is to link the drivers version to a function that get the release version. Something like: MyDriver(){ VERSION = utils.get_current_version() ... } But we could also do a fancy logic that would get the vendor proposed version and bump it automatically. MyDriver(){ VERSION = '1.2.0' VERSION = utils.bump_version() ... } Where bump_version() would always use the current openstack version to know what version the driver should be. > If I remember correctly there are some drivers that only increase the > version once per release. > > Cheers, > Gorka. > > > [1] https://releases.openstack.org/teams/cinder.html > > [2] > > > https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/solidfire.py#L237 [1] https://github.com/openstack/cinder/blob/stable/ocata/cinder/volume/drivers/rbd.py -------------- next part -------------- An HTML attachment was scrubbed... URL: From corvus at inaugust.com Fri Jun 28 14:49:10 2019 From: corvus at inaugust.com (James E. Blair) Date: Fri, 28 Jun 2019 07:49:10 -0700 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: (Thierry Carrez's message of "Fri, 28 Jun 2019 10:43:27 +0200") References: <87mui2ws5j.fsf@meyer.lemoncheese.net> Message-ID: <87v9wpvk7t.fsf@meyer.lemoncheese.net> Thierry Carrez writes: > James E. Blair wrote: >> Especially if the folks who manage this are also folks who work on these >> repos, we're one "git push" away from having egg on our collective face. >> >> If the folks managing the GitHub presence are also developers, I would >> encourage the use of a shared or secondary account. > > That is a fair point that I had not considered. > > That said, wouldn't the risk be relatively limited if the "admins" > never checkout or clone from GitHub itself ? Yes, the biggest risk is if one of the admins is a regular user of GitHub. If they don't have their own GitHub-forks of the OpenStack repos, and they only ever clone their local copies from OpenDev (or, they are not developers at all), then I think the risk of accidents on a personal account is fairly low. -Jim From pabelanger at redhat.com Fri Jun 28 15:24:48 2019 From: pabelanger at redhat.com (Paul Belanger) Date: Fri, 28 Jun 2019 11:24:48 -0400 Subject: [infra] Stepping down as infra-root Message-ID: <20190628152448.GA5098@localhost.localdomain> Greetings! 10 months ago my role at Red Hat changed, which saw my day to day job move away from OpenStack, specifically the Infrastructure team. Since then, I've be contributing to OpenStack in my spare time however it isn't going as well I as would like. Unfortunately, there is never enough time in the day to do everything and as a result I believe it is best for me to step day, and remove myself as infra-root: https://review.opendev.org/668192 In the 7 years I've been contributing to the infra team, I've learned a great many things (and still learning). Thank you to all the awesome humans, past and present, for making my day to day much better. To the community, I do ask if you are at all interested in how the infrastructure works / operates, reach out to the #openstack-infra channel and get involved. 
I refer you to the help wanted list: https://governance.openstack.org/tc/reference/help-most-needed.html#community-infrastructure-sysadmins Paul From kevin at cloudnull.com Fri Jun 28 15:46:53 2019 From: kevin at cloudnull.com (Carter, Kevin) Date: Fri, 28 Jun 2019 10:46:53 -0500 Subject: [infra] Stepping down as infra-root In-Reply-To: <20190628152448.GA5098@localhost.localdomain> References: <20190628152448.GA5098@localhost.localdomain> Message-ID: Sad to see you stepping away. I've really enjoyed working with you in the infra community and I hope to continue to work with you elsewhere, maybe just in a different capacity. Thank you for everything you've done. You've been an amazing community member and colleague. -- Kevin Carter IRC: Cloudnull On Fri, Jun 28, 2019 at 10:30 AM Paul Belanger wrote: > Greetings! > > 10 months ago my role at Red Hat changed, which saw my day to day job > move away from OpenStack, specifically the Infrastructure team. Since > then, I've be contributing to OpenStack in my spare time however it > isn't going as well I as would like. Unfortunately, there is never > enough time in the day to do everything and as a result I believe it is > best for me to step day, and remove myself as infra-root: > > https://review.opendev.org/668192 > > In the 7 years I've been contributing to the infra team, I've learned a > great many things (and still learning). Thank you to all the awesome > humans, past and present, for making my day to day much better. > > To the community, I do ask if you are at all interested in how the > infrastructure works / operates, reach out to the #openstack-infra > channel and get involved. I refer you to the help wanted list: > > > https://governance.openstack.org/tc/reference/help-most-needed.html#community-infrastructure-sysadmins > > Paul > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Fri Jun 28 16:14:48 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 28 Jun 2019 11:14:48 -0500 Subject: [nova] Last chance to weigh in on nova adopting the SDK for inter-service comms Message-ID: <01363cad-2e2b-f691-c4de-75ae5bac0d25@gmail.com> John and I are +2 and Eric is +1 (would be +2 if not involved in the effort) on the nova spec [1] to adopt the openstacksdk for nova communicating with other services rather than the python-*clients. There are several motivations within the spec itself and in review comments for doing this, but since it's a major change to nova across the board I want to make sure other cores and contributors are aware and at least not against this before it's approved. I think a reasonable timeline for waiting to approve is until EOD July 2 (next week) since that's our third and final spec review sprint. [1] https://review.opendev.org/#/c/662881/ -- Thanks, Matt From berendt at betacloud-solutions.de Fri Jun 28 16:23:33 2019 From: berendt at betacloud-solutions.de (Christian Berendt) Date: Fri, 28 Jun 2019 18:23:33 +0200 Subject: [kolla][kayobe] vote: kayobe as a kolla deliverable In-Reply-To: References: Message-ID: <38B05C4E-D3AB-4862-9EB0-B49B36A568F6@betacloud-solutions.de> Hello Mark. > On 20. Jun 2019, at 15:40, Mark Goddard wrote: > > Once you have made a decision, please respond with your answer to the > following question: > > "Should kayobe become a deliverable of the kolla project?" (yes/no) That would be a step in the right direction. For that. Yes. Christian. 
-- Christian Berendt Chief Executive Officer (CEO) Mail: berendt at betacloud-solutions.de Web: https://www.betacloud-solutions.de Betacloud Solutions GmbH Teckstrasse 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139 From miguel at mlavalle.com Fri Jun 28 17:03:46 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Fri, 28 Jun 2019 12:03:46 -0500 Subject: [openstack-dev] [neutron] Cancelling Neutron Drivers meeting Message-ID: Dear Neutrinos, Next week, the 4th of July Holiday (Independence Day) will be observed in the USA. Since several members of the drivers team live in that country and will enjoy a long weekend, we are cancelling the meeting on the 5th. We will resume our meeting normally on the 12th. Best regards Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jun 28 18:59:31 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 29 Jun 2019 03:59:31 +0900 Subject: [nova] API updates week 19-26 In-Reply-To: <243fdc76-7cb8-8149-3659-144f8408f872@gmail.com> References: <16b9933cc31.128bc267433478.8660435979934439908@ghanshyammann.com> <243fdc76-7cb8-8149-3659-144f8408f872@gmail.com> Message-ID: <16b9f74f7d9.11c729d3169910.4563602757513944079@ghanshyammann.com> ---- On Fri, 28 Jun 2019 00:07:12 +0900 Matt Riedemann wrote ---- > On 6/27/2019 8:50 AM, Ghanshyam Mann wrote: > > 4. Detach and attach boot volumes: > > - Topic:https://review.openstack.org/#/q/topic:bp/detach-boot-volume+(status:open+OR+status:merged) > > - Weekly Progress: No Progress > > The spec for this blueprint was fast-approved for Train since it was > previously approved in Stein but the Train spec needs to be amended > based on the corner case issues that came up during review in Stein and > were discussed at the Train PTG: > > http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005804.html > > So someone needs to amend the spec. I think Kevin is busy with other > things at the moment so if anyone else is interested in this they could > help out by doing the spec amendment so we can at least get that done > before the spec freeze deadline next month. > > > > > Spec Ready for Review: > > ----------------------------- > > > > 4. Add API ref guideline for body text > > - ~8 api-ref are left to fix. > > This isn't really a spec, right? Anyway, do you have a list of which API > references still need work? Are they APIs that we still care about, i.e. > not some nova-network or cells v1 obsolete APIs? Yeah. Only two we need to fix, others are deprecated one. Let me remove the todo part from them and add NOTE about "Do not update the deprecated one" ./api-ref/source/servers.inc ./api-ref/source/servers-actions.inc ./api-ref/source/os-tenant-network.inc ./api-ref/source/os-floating-ip-dns.inc ./api-ref/source/os-networks.inc ./api-ref/source/os-security-group-default-rules.inc ./api-ref/source/os-security-groups.inc -gmann > > -- > > Thanks, > > Matt > > From colleen at gazlene.net Fri Jun 28 19:56:57 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Fri, 28 Jun 2019 12:56:57 -0700 Subject: [keystone] Keystone Team Update - Week of 24 June 2019 Message-ID: # Keystone Team Update - Week of 24 June 2019 ## News ### Virtual Midcycle Planning We've begun planning our midcycle[1], which we will hold virtually sometime around or after milestone 2. Please participate in suggesting topics and the scheduling poll. 
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007344.html ## Open Specs Train specs: https://bit.ly/2uZ2tRl Ongoing specs: https://bit.ly/2OyDLTh ## Recently Merged Changes Search query: https://bit.ly/2pquOwT We merged 13 changes this week, including some advances in the oslo.limit and application credential access rules implementations. ## Changes that need Attention Search query: https://bit.ly/2tymTje There are 45 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. ## Bugs This week we opened 3 new bugs and closed none. Bugs opened (3) Bug #1834294 (keystone:Undecided) opened by Eivinas https://bugs.launchpad.net/keystone/+bug/1834294 Bug #1834304 (keystone:Undecided) opened by Rohit Londhe https://bugs.launchpad.net/keystone/+bug/1834304 Bug #1834342 (keystone:Undecided) opened by Benoît Knecht https://bugs.launchpad.net/keystone/+bug/1834342 ## Milestone Outlook https://releases.openstack.org/train/schedule.html Spec freeze is in 4 weeks. Our midcycle will be held some time around or just after that, and feature proposal freeze closely follows. ## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter From Arkady.Kanevsky at dell.com Fri Jun 28 19:57:47 2019 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Fri, 28 Jun 2019 19:57:47 +0000 Subject: [cinder] Deprecating driver versions In-Reply-To: <20190628075012.ndwk52gabg2akqvx@localhost> References: <20190628075012.ndwk52gabg2akqvx@localhost> Message-ID: <0c8c83cb49d64fbd9dec6fec0b854bde@AUSX13MPS304.AMER.DELL.COM> What driver versions are we discussing? If I look at reference [1] https://releases.openstack.org/teams/cinder.html it does not list any driver. But does have versions of componenets of cinder. Usually people refer to https://docs.openstack.org/cinder/latest/reference/support-matrix.html for drivers. Driver versions and useful. I many case the driver version corresponds to storage backend version which makes it easier for correlate which driver needed (and version of OpenStack) for specific function support. Each driver has a maintainer and it is maintainer responsibility to handle visioning. Expect that any driver change must get +2 from driver maintainer. Agree with Gorka than some process documentation will be helpful Thanks, Arkady -----Original Message----- From: Gorka Eguileor Sent: Friday, June 28, 2019 2:50 AM To: Erlon Cruz Cc: openstack-discuss at lists.openstack.org Subject: Re: [cinder] Deprecating driver versions [EXTERNAL EMAIL] On 27/06, Erlon Cruz wrote: > Hey folks, > > Driver versions has being a source of a lot of confusions with costumers. > Most of our drivers > have a version number and history that are updated as the developers > adds new fixes and features. Drivers also have a VERSION variable in > the version class that should be bumped by developers. The problem > with that is: > > - sometimes folks from the community just push patches on drivers, > and its hard to bump > every vendor version correctly; > - that relies in the human factor to remember adding it, and > usually that fails; > - if we create a bugfix and bump the version, the backport to older > branches will carry the > version, which will not reflect the correct driver code; > > So, the solution I'm proposing for this is that we use the Cinder > versions[1] and remove all version strings for drivers. Every new > release we get a version. 
For stable versions, from time to time the > PTL bumps the stable version and we have an accurate ways to describe > the code. > If we need to backport and send something to the costumer, we can do > the backport, poke the PTL, and he will generate another version which > can be downloaded on github or via PIP, and present the version to our > costumers. > > So, what are your thought around this? Anyone else has had problems > with that? What would be the implications of removing the driver > version strings? > > Erlon > Hi Erlon, I am personally against removing the drivers versions, as I find them convenient and think they are good practice. A possible solution for the driver versioning is for a driver to designate a minor version per OpenStack release and use the patch version to track changes. This way one can always backport a patch and will just need to increase the patch version in the backport patch. Maybe we can have this formally described in our devref. We tell driver developers they can do whatever they want with the versioning in master, but backports must not backport the version as it is and instead increase the patch version. What do you think? If I remember correctly there are some drivers that only increase the version once per release. Cheers, Gorka. > [1] https://releases.openstack.org/teams/cinder.html > [2] > https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/ > solidfire.py#L237 From sean.mcginnis at gmx.com Sat Jun 29 12:04:00 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Sat, 29 Jun 2019 07:04:00 -0500 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: <87v9wpvk7t.fsf@meyer.lemoncheese.net> References: <87mui2ws5j.fsf@meyer.lemoncheese.net> <87v9wpvk7t.fsf@meyer.lemoncheese.net> Message-ID: <20190629120400.GA13645@sm-workstation> On Fri, Jun 28, 2019 at 07:49:10AM -0700, James E. Blair wrote: > Thierry Carrez writes: > > > James E. Blair wrote: > >> Especially if the folks who manage this are also folks who work on these > >> repos, we're one "git push" away from having egg on our collective face. > >> > >> If the folks managing the GitHub presence are also developers, I would > >> encourage the use of a shared or secondary account. > > > > That is a fair point that I had not considered. > > > > That said, wouldn't the risk be relatively limited if the "admins" > > never checkout or clone from GitHub itself ? > > Yes, the biggest risk is if one of the admins is a regular user of > GitHub. If they don't have their own GitHub-forks of the OpenStack > repos, and they only ever clone their local copies from OpenDev (or, > they are not developers at all), then I think the risk of accidents on a > personal account is fairly low. > > -Jim > There are some tools out there that have been created to help mitigate these kinds of things. One I recently came across is described here: https://www.jeff.wilcox.name/2015/11/azure-on-github/ I'm not advocating for trying to adapt that tool, but I think it shows that something can be stood up relatively easily that would provide a separation of control to prevent accidental admin access modifications while still making it easy to see and manage a large number of repos. Seems fairly easy enough to even just create a githubadmin at openstack.org account and control access via that. 
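For what it's worth, the kind of daily loop Sorin described earlier in the thread is only a few lines against the GitHub REST API. A rough sketch (the org name, bot account and token handling are placeholders, not a working policy):

import requests

API = "https://api.github.com"
ORG = "openstack"                      # org being managed
BOT = "openstack-mirror-bot"           # hypothetical account allowed to push
HEADERS = {
    "Authorization": "token %s" % "<token-for-shared-admin-account>",
    "Accept": "application/vnd.github.v3+json",
}

def org_repos(org):
    """Yield every repository in the organization, following pagination."""
    url = "%s/orgs/%s/repos?per_page=100" % (API, org)
    while url:
        resp = requests.get(url, headers=HEADERS)
        resp.raise_for_status()
        for repo in resp.json():
            yield repo
        url = resp.links.get("next", {}).get("url")

def restrict_pushes(org, repo, branch="master"):
    """Lock the branch so only the mirroring bot can push to it."""
    url = "%s/repos/%s/%s/branches/%s/protection" % (API, org, repo, branch)
    payload = {
        "required_status_checks": None,
        "enforce_admins": True,
        "required_pull_request_reviews": None,
        "restrictions": {"users": [BOT], "teams": []},
    }
    requests.put(url, headers=HEADERS, json=payload).raise_for_status()

for repo in org_repos(ORG):
    if not repo.get("archived"):
        restrict_pushes(ORG, repo["name"])

Run from cron under the shared admin account, that also catches repositories added after the last manual pass.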
Sean From fungi at yuggoth.org Sat Jun 29 12:37:06 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 29 Jun 2019 12:37:06 +0000 Subject: [tc] Assuming control of GitHub organizations In-Reply-To: <20190629120400.GA13645@sm-workstation> References: <87mui2ws5j.fsf@meyer.lemoncheese.net> <87v9wpvk7t.fsf@meyer.lemoncheese.net> <20190629120400.GA13645@sm-workstation> Message-ID: <20190629123705.w4spsam5yexyfles@yuggoth.org> On 2019-06-29 07:04:00 -0500 (-0500), Sean McGinnis wrote: [...] > Seems fairly easy enough to even just create a githubadmin at openstack.org > account and control access via that. Which brings us all the way back around to what the Infra team ended up doing... create a shared account backed by a second auth factor of an OTP generator on its own access-controlled system with audit logging. You could leverage it the same way to authorize and deauthorize other accounts for short-term administrative access. But at the end of the day, solutions like that are probably more of a pain than if the handful of admins who want to have GH accounts and also use GH for other reasons could just have two separate accounts. (Not that I particularly have a horse in this race either way, I'm just happy I can stop touching GH nearly so often.) -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From keith.berger at suse.com Fri Jun 28 13:49:13 2019 From: keith.berger at suse.com (Keith Berger) Date: Fri, 28 Jun 2019 09:49:13 -0400 Subject: https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/ds_util.py Message-ID: Gary, I hope you are doing well. I was trying to understand the logic in this code when using datastore_regex. It seems even if you set that parameter and it matches some N datastores, the "free capacity" being returned by nova-compute is only from a single/best datastore.

def get_available_datastores(session, cluster=None, datastore_regex=None):
    """Get the datastore list and choose the first local storage."""
    ds = session._call_method(vutil,
                              "get_object_property",
                              cluster,
                              "datastore")
...
...
        if _is_datastore_valid(propdict, datastore_regex, allowed_ds_types):
            new_ds = ds_obj.Datastore(
                    ref=obj_content.obj,
                    name=propdict['summary.name'],
                    capacity=propdict['summary.capacity'],
                    freespace=propdict['summary.freeSpace'])
            # favor datastores with more free space
            if (best_match is None or
                new_ds.freespace > best_match.freespace):
                best_match = new_ds
    return best_match

The issue we are seeing is that a customer has, for example, 5 x 10 GB datastores (DS1..DS5). For whatever reason, DS1 was filled more fully, and once it hit close to 100% no new instances could launch as they were blocked by the Nova DiskFilter. I know we can use the disk_allocation to "fool" this, but why is it not using the total cumulative storage as opposed to the one with the most free space, or the first one? I am wondering if there is some configuration issue. If it is supposed to use best_match, should that dynamically change among the regex-matched Datastores to the one with the most free space?
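For illustration only, here is a rough, self-contained sketch of the "cumulative" behaviour I am asking about -- this is not the current nova code, just an assumption of what summing free space over every regex-matched datastore could look like, reusing the 'summary.*' property keys from the snippet above on plain dicts:

import re


def summed_capacity(propdicts, datastore_regex=None):
    """Sum capacity and free space over every datastore matching the regex."""
    total_capacity = 0
    total_free = 0
    for propdict in propdicts:
        name = propdict['summary.name']
        if datastore_regex and not datastore_regex.match(name):
            # skip datastores that do not match the configured regex
            continue
        total_capacity += propdict['summary.capacity']
        total_free += propdict['summary.freeSpace']
    return total_capacity, total_free


# Five 10 GB datastores, DS1 nearly full (values in bytes):
GB = 1024 ** 3
stores = [{'summary.name': 'DS%d' % i,
           'summary.capacity': 10 * GB,
           'summary.freeSpace': (1 if i == 1 else 8) * GB}
          for i in range(1, 6)]
print(summed_capacity(stores, re.compile(r'^DS\d+$')))
# reports 50 GB total and 33 GB free, instead of best_match's single datastore

(Whether a summed value would even be right for scheduling is a separate question, since one instance's disk still has to fit on a single datastore.)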
Could it be that it is somehow skipping the "best_match" and always using the "first local storage"? I appreciate any suggestions or comments you have. Keith -- Keith Berger Master Software Engineer SUSE (P)+1 470.237.2012 (M)+1 404.664.9610 keith.berger at suse.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cfhhnhjejgdcagmc.png Type: image/png Size: 3661 bytes Desc: not available URL: From satish.txt at gmail.com Sat Jun 29 04:56:31 2019 From: satish.txt at gmail.com (Satish Patel) Date: Sat, 29 Jun 2019 00:56:31 -0400 Subject: glance image upload error [Errno 32] Broken pipe Message-ID: I have installed openstack-ansible and integrated glance with ceph storage. The first day when I uploaded an image it worked, but today when I am trying to upload an image I am getting this error:

[root at ostack-infra-2-1-utility-container-c166f549 ~]# openstack image create --file cirros-0.3.4-x86_64-disk.raw --container-format bare --disk-format raw --public cirros-0.3.4-tmp
Error finding address for http://172.28.8.9:9292/v2/images/8f3456aa-52fc-4b4a-8b11-dfbadb8e88ca/file: [Errno 32] Broken pipe

I do have enough space on ceph storage, and I am not seeing any error in the glance logs that would help me.

[root at ostack-infra-01-ceph-mon-container-692bea95 root]# ceph df detail
GLOBAL:
SIZE AVAIL RAW USED %RAW USED OBJECTS
9314G 3526G 5787G 62.14 245k
POOLS:
NAME ID QUOTA OBJECTS QUOTA BYTES USED %USED MAX AVAIL OBJECTS DIRTY READ WRITE RAW USED
images 6 95367M N/A 22522M 3.62 584G 2839 2839 2594k 36245 67568M
vms 7 N/A N/A 1912G 76.58 584G 248494 242k 6567M 3363M 5738G
volumes 8 N/A N/A 0 0 584G 0 0 0 0 0
backups 9 N/A N/A 0 0 584G 0 0 0 0 0
metrics 10 N/A N/A 0 0 584G 0 0 0 0 0

From vladimir.blando at gmail.com Sat Jun 29 15:30:41 2019 From: vladimir.blando at gmail.com (vladimir franciz blando) Date: Sat, 29 Jun 2019 23:30:41 +0800 Subject: [kolla] [ceph] Ceph using cluster network for replication Message-ID: Hi,

In my globals.yml file
---
cluster_interface: "bond1"
---

but after deployment, there's nothing in the ceph.conf file that indicates that it's using the cluster interface

---
# docker exec ceph_mon cat /etc/ceph/ceph.conf
[global]
log file = /var/log/kolla/ceph/$cluster-$name.log
log to syslog = false
err to syslog = false
log to stderr = false
err to stderr = false
fsid = f8ff1404-bead-422b-9076-21583012ad30
mon initial members = 172.16.43.22, 172.16.43.11, 172.16.43.12
mon host = 172.16.43.22, 172.16.43.11, 172.16.43.12
mon addr = 172.16.43.22:6789, 172.16.43.11:6789, 172.16.43.12:6789
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
setuser match path = /var/lib/ceph/$type/$cluster-$id

[mon]
mon compact on start = true
mon cluster log file = /var/log/kolla/ceph/$cluster.log
---

I was expecting cluster network = .

I even tried making a custom config (/etc/kolla/config/ceph/ceph.conf) which includes
cluster network =
public network =

and ran "kolla-ansible -i multinode reconfigure"

but it did nothing. Was there something missing in what I did?

- Vlad
ᐧ
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Sat Jun 29 18:15:01 2019 From: jungleboyj at gmail.com (Jay S.
Bryant) Date: Sat, 29 Jun 2019 13:15:01 -0500 Subject: [cinder] Deprecating driver versions In-Reply-To: <20190628075012.ndwk52gabg2akqvx@localhost> References: <20190628075012.ndwk52gabg2akqvx@localhost> Message-ID: <59de8659-6a0b-62b5-8bee-ed0fdf622cf9@gmail.com> Erlon, I appreciate the goal here but I agree with Gorka here. The drivers are the vendor's responsibilities and they version them as they wish.  I think updating the devref some best practices recommendations would be good and maybe come to agreement between the cores on what the best practices are so that we can try to enforce it to some extent through reviews.  That is probably the best way forward. Jay On 6/28/2019 2:50 AM, Gorka Eguileor wrote: > On 27/06, Erlon Cruz wrote: >> Hey folks, >> >> Driver versions has being a source of a lot of confusions with costumers. >> Most of our drivers >> have a version number and history that are updated as the developers adds >> new fixes and >> features. Drivers also have a VERSION variable in the version class that >> should be bumped by >> developers. The problem with that is: >> >> - sometimes folks from the community just push patches on drivers, and >> its hard to bump >> every vendor version correctly; >> - that relies in the human factor to remember adding it, and usually >> that fails; >> - if we create a bugfix and bump the version, the backport to older >> branches will carry the >> version, which will not reflect the correct driver code; >> >> So, the solution I'm proposing for this is that we use the Cinder >> versions[1] and remove all >> version strings for drivers. Every new release we get a version. For stable >> versions, from time to >> time the PTL bumps the stable version and we have an accurate ways to >> describe the code. >> If we need to backport and send something to the costumer, we can do the >> backport, poke >> the PTL, and he will generate another version which can be downloaded on >> github or via PIP, >> and present the version to our costumers. >> >> So, what are your thought around this? Anyone else has had problems with >> that? What would >> be the implications of removing the driver version strings? >> >> Erlon >> > Hi Erlon, > > I am personally against removing the drivers versions, as I find them > convenient and think they are good practice. > > A possible solution for the driver versioning is for a driver to > designate a minor version per OpenStack release and use the patch > version to track changes. This way one can always backport a patch and > will just need to increase the patch version in the backport patch. > > Maybe we can have this formally described in our devref. We tell > driver developers they can do whatever they want with the versioning in > master, but backports must not backport the version as it is and instead > increase the patch version. > > What do you think? > > If I remember correctly there are some drivers that only increase the > version once per release. > > Cheers, > Gorka. 
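To make the devref idea above concrete, here is a purely illustrative sketch of the convention being discussed (designate a minor version per release, bump only the patch component when a fix is backported to a stable branch); the class name and numbers are invented and do not come from any real driver:

class ExampleVendorDriver(object):
    """Hypothetical volume driver, shown only to illustrate the versioning idea.

    Version history (illustrative):
        2.1.0 - minor version designated for the current OpenStack release
        2.1.1 - patch bump applied on the stable branch when a fix is backported
    """

    VERSION = '2.1.0'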
> >> [1] https://releases.openstack.org/teams/cinder.html >> [2] >> https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/solidfire.py#L237 From radoslaw.piliszek at gmail.com Sun Jun 30 07:18:50 2019 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Sun, 30 Jun 2019 09:18:50 +0200 Subject: [kolla] [ceph] Ceph using cluster network for replication In-Reply-To: References: Message-ID: Hello Vlad, the 'cluster network' (aka 'cluster_network') in ceph.conf is just a hint to OSDs which address to use for cluster traffic (as they register it in MONs). We use globals.yml 'cluster_interface' to explicitly run the OSD using the first address on the selected interface for the cluster address: ceph-osd ... --cluster-addr {{ hostvars[inventory_hostname]['ansible_' + cluster_interface]['ipv4']['address'] }} ... Therefore, there is no need to set 'cluster network'. If you really want to set it (perhaps for reference), you can set it in '/etc/kolla/config/ceph.conf' (notice no ceph/ directory). Kind regards, Radek sob., 29 cze 2019 o 17:33 vladimir franciz blando napisał(a): > Hi, > > In my globals.yml file > --- > cluster_interface: "bond1" > --- > > but after deployment, there's nothing on the ceph.conf file that indicates > that it's using the cluster interface > --- > # docker exec ceph_mon cat /etc/ceph/ceph.conf > [global] > log file = /var/log/kolla/ceph/$cluster-$name.log > log to syslog = false > err to syslog = false > log to stderr = false > err to stderr = false > fsid = f8ff1404-bead-422b-9076-21583012ad30 > mon initial members = 172.16.43.22, 172.16.43.11, 172.16.43.12 > mon host = 172.16.43.22, 172.16.43.11, 172.16.43.12 > mon addr = 172.16.43.22:6789, 172.16.43.11:6789, 172.16.43.12:6789 > auth cluster required = cephx > auth service required = cephx > auth client required = cephx > setuser match path = /var/lib/ceph/$type/$cluster-$id > > [mon] > mon compact on start = true > mon cluster log file = /var/log/kolla/ceph/$cluster.log > --- > I was expecting > cluster network = . > > I even tried making a custom config (/etc/kolla/config/ceph/ceph.conf) > which includes the > cluster network = > public network = > > and run "kolla-ansible -i multinode reconfigure" > > but it did nothing, was there something missing on what I did? > > > - Vlad > ᐧ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Sun Jun 30 08:55:02 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Sun, 30 Jun 2019 10:55:02 +0200 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: Hi Mark, let me explain what I am trying. I have a queens installation based on centos and pacemaker with some instances and heat stacks. I would like to have another installation with the same instances, projects, stacks... I'd like to have the same uuid for all objects (users, projects, instances and so on), because it is controlled by a cloud management platform we wrote. I stopped the controllers on the old queens installation, backing up the openstack database. I installed the new kolla openstack queens on three new controllers with the same addresses as the old installation, vip as well. One of the three controllers is also a kvm node on queens. I stopped all containers except rabbit, keepalived, haproxy and mariadb. I deleted all the openstack databases on the mariadb container and I imported the old tables, changing the address of rabbit to point to the new rabbit cluster. I restarted containers.
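(To be concrete about the rabbit change above: the kind of setting involved is the oslo.messaging transport_url in each service's configuration; the hostnames and credentials below are placeholders, not my real values.)

[DEFAULT]
# illustrative only - point the services at the new RabbitMQ cluster
transport_url = rabbit://openstack:PASSWORD@newctrl1:5672,openstack:PASSWORD@newctrl2:5672,openstack:PASSWORD@newctrl3:5672//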
Changing the rabbit address on old kvm nodes, I can see the old virtual machines and I can open console on them. I can see all networks (tenant and provider) of al installation, but when I try to create a new instance on the new kvm, it remains in buiding state. Seems it cannot aquire an address. Storage between old and new installation are shred on nfs NETAPP, so I can see cinder volumes. I suppose db structure is different between a kolla installation and a manual instaltion !? What is wrong ? Thanks Ignazio Il giorno gio 27 giu 2019 alle ore 16:44 Mark Goddard ha scritto: > On Thu, 27 Jun 2019 at 14:46, Ignazio Cassano > wrote: > > > > Sorry, for my question. > > It does not need to change anything because endpoints refer to haproxy > vips. > > So if your new glance works fine you change haproxy backends for glance. > > Regards > > Ignazio > > That's correct - only the haproxy backend needs to be updated. > > > > > > > Il giorno gio 27 giu 2019 alle ore 15:21 Ignazio Cassano < > ignaziocassano at gmail.com> ha scritto: > >> > >> Hello Mark, > >> let me to verify if I understood your method. > >> > >> You have old controllers,haproxy,mariadb and nova computes. > >> You installed three new controllers but kolla.ansible inventory > contains old mariadb and old rabbit servers. > >> You are deployng single service on new controllers staring with glance. > >> When you deploy glance on new controllers, it changes the glance > endpoint on old mariadb db ? > >> Regards > >> Ignazio > >> > >> Il giorno gio 27 giu 2019 alle ore 10:52 Mark Goddard < > mark at stackhpc.com> ha scritto: > >>> > >>> On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano < > ignaziocassano at gmail.com> wrote: > >>> > > >>> > Hello, > >>> > Anyone have tried to migrate an existing openstack installation to > kolla containers? > >>> > >>> Hi, > >>> > >>> I'm aware of two people currently working on that. Gregory Orange and > >>> one of my colleagues, Pierre Riteau. Pierre is away currently, so I > >>> hope he doesn't mind me quoting him from an email to Gregory. > >>> > >>> Mark > >>> > >>> "I am indeed working on a similar migration using Kolla Ansible with > >>> Kayobe, starting from a non-containerised OpenStack deployment based > >>> on CentOS RPMs. > >>> Existing OpenStack services are deployed across several controller > >>> nodes and all sit behind HAProxy, including for internal endpoints. > >>> We have additional controller nodes that we use to deploy > >>> containerised services. If you don't have the luxury of additional > >>> nodes, it will be more difficult as you will need to avoid processes > >>> clashing when listening on the same port. > >>> > >>> The method I am using resembles your second suggestion, however I am > >>> deploying only one containerised service at a time, in order to > >>> validate each of them independently. > >>> I use the --tags option of kolla-ansible to restrict Ansible to > >>> specific roles, and when I am happy with the resulting configuration I > >>> update HAProxy to point to the new controllers. > >>> > >>> As long as the configuration matches, this should be completely > >>> transparent for purely HTTP-based services like Glance. You need to be > >>> more careful with services that include components listening for RPC, > >>> such as Nova: if the new nova.conf is incorrect and you've deployed a > >>> nova-conductor that uses it, you could get failed instances launches. 
> >>> Some roles depend on others: if you are deploying the > >>> neutron-openvswitch-agent, you need to run the openvswitch role as > >>> well. > >>> > >>> I suggest starting with migrating Glance as it doesn't have any > >>> internal services and is easy to validate. Note that properly > >>> migrating Keystone requires keeping existing Fernet keys around, so > >>> any token stays valid until the time it is expected to stop working > >>> (which is fairly complex, see > >>> https://bugs.launchpad.net/kolla-ansible/+bug/1809469). > >>> > >>> While initially I was using an approach similar to your first > >>> suggestion, it can have side effects since Kolla Ansible uses these > >>> variables when templating configuration. As an example, most services > >>> will only have notifications enabled if enable_ceilometer is true. > >>> > >>> I've added existing control plane nodes to the Kolla Ansible inventory > >>> as separate groups, which allows me to use the existing database and > >>> RabbitMQ for the containerised services. > >>> For example, instead of: > >>> > >>> [mariadb:children] > >>> control > >>> > >>> you may have: > >>> > >>> [mariadb:children] > >>> oldcontrol_db > >>> > >>> I still have to perform the migration of these underlying services to > >>> the new control plane, I will let you know if there is any hurdle. > >>> > >>> A few random things to note: > >>> > >>> - if run on existing control plane hosts, the baremetal role removes > >>> some packages listed in `redhat_pkg_removals` which can trigger the > >>> removal of OpenStack dependencies using them! I've changed this > >>> variable to an empty list. > >>> - compare your existing deployment with a Kolla Ansible one to check > >>> for differences in endpoints, configuration files, database users, > >>> service users, etc. For Heat, Kolla uses the domain heat_user_domain, > >>> while your existing deployment may use another one (and this is > >>> hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" > >>> project while a couple of deployments I worked with were using > >>> "services". This shouldn't matter, except there was a bug in Kolla > >>> which prevented it from setting the roles correctly: > >>> https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest > >>> Rocky and Queens images) > >>> - the ml2_conf.ini generated for Neutron generates physical network > >>> names like physnet1, physnet2… you may want to override > >>> bridge_mappings completely. > >>> - although sometimes it could be easier to change your existing > >>> deployment to match Kolla Ansible settings, rather than configure > >>> Kolla Ansible to match your deployment." > >>> > >>> > Thanks > >>> > Ignazio > >>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Sun Jun 30 09:18:33 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sun, 30 Jun 2019 11:18:33 +0200 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: Hi Ignazio, it is hard to tell without logs. Please attach (pastebin) the relevant ones (probably nova ones, maybe neutron and cinder). Also, did you keep the old configs and tried comparing them with new ones? Kind regards, Radek niedz., 30 cze 2019 o 11:07 Ignazio Cassano napisał(a): > Hi Mark, > let me to explain what I am trying. > I have a queens installation based on centos and pacemaker with some > instances and heat stacks. 
> I would like to have another installation with same instances, projects, > stacks ....I'd like to have same uuid for all objects (users,projects > instances and so on, because it is controlled by a cloud management > platform we wrote. > > I stopped controllers on old queens installation backupping the openstack > database. > I installed the new kolla openstack queens on new three controllers with > same addresses of the old intallation , vip as well. > One of the three controllers is also a kvm node on queens. > I stopped all containeres except rabbit,keepalive,rabbit,haproxy and > mariadb. > I deleted al openstack db on mariadb container and I imported the old > tables, changing the address of rabbit for pointing to the new rabbit > cluster. > I restarded containers. > Changing the rabbit address on old kvm nodes, I can see the old virtual > machines and I can open console on them. > I can see all networks (tenant and provider) of al installation, but when > I try to create a new instance on the new kvm, it remains in buiding state. > Seems it cannot aquire an address. > Storage between old and new installation are shred on nfs NETAPP, so I can > see cinder volumes. > I suppose db structure is different between a kolla installation and a > manual instaltion !? > What is wrong ? > Thanks > Ignazio > > > > > Il giorno gio 27 giu 2019 alle ore 16:44 Mark Goddard > ha scritto: > >> On Thu, 27 Jun 2019 at 14:46, Ignazio Cassano >> wrote: >> > >> > Sorry, for my question. >> > It does not need to change anything because endpoints refer to haproxy >> vips. >> > So if your new glance works fine you change haproxy backends for glance. >> > Regards >> > Ignazio >> >> That's correct - only the haproxy backend needs to be updated. >> >> > >> > >> > Il giorno gio 27 giu 2019 alle ore 15:21 Ignazio Cassano < >> ignaziocassano at gmail.com> ha scritto: >> >> >> >> Hello Mark, >> >> let me to verify if I understood your method. >> >> >> >> You have old controllers,haproxy,mariadb and nova computes. >> >> You installed three new controllers but kolla.ansible inventory >> contains old mariadb and old rabbit servers. >> >> You are deployng single service on new controllers staring with glance. >> >> When you deploy glance on new controllers, it changes the glance >> endpoint on old mariadb db ? >> >> Regards >> >> Ignazio >> >> >> >> Il giorno gio 27 giu 2019 alle ore 10:52 Mark Goddard < >> mark at stackhpc.com> ha scritto: >> >>> >> >>> On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano < >> ignaziocassano at gmail.com> wrote: >> >>> > >> >>> > Hello, >> >>> > Anyone have tried to migrate an existing openstack installation to >> kolla containers? >> >>> >> >>> Hi, >> >>> >> >>> I'm aware of two people currently working on that. Gregory Orange and >> >>> one of my colleagues, Pierre Riteau. Pierre is away currently, so I >> >>> hope he doesn't mind me quoting him from an email to Gregory. >> >>> >> >>> Mark >> >>> >> >>> "I am indeed working on a similar migration using Kolla Ansible with >> >>> Kayobe, starting from a non-containerised OpenStack deployment based >> >>> on CentOS RPMs. >> >>> Existing OpenStack services are deployed across several controller >> >>> nodes and all sit behind HAProxy, including for internal endpoints. >> >>> We have additional controller nodes that we use to deploy >> >>> containerised services. If you don't have the luxury of additional >> >>> nodes, it will be more difficult as you will need to avoid processes >> >>> clashing when listening on the same port. 
>> >>> >> >>> The method I am using resembles your second suggestion, however I am >> >>> deploying only one containerised service at a time, in order to >> >>> validate each of them independently. >> >>> I use the --tags option of kolla-ansible to restrict Ansible to >> >>> specific roles, and when I am happy with the resulting configuration I >> >>> update HAProxy to point to the new controllers. >> >>> >> >>> As long as the configuration matches, this should be completely >> >>> transparent for purely HTTP-based services like Glance. You need to be >> >>> more careful with services that include components listening for RPC, >> >>> such as Nova: if the new nova.conf is incorrect and you've deployed a >> >>> nova-conductor that uses it, you could get failed instances launches. >> >>> Some roles depend on others: if you are deploying the >> >>> neutron-openvswitch-agent, you need to run the openvswitch role as >> >>> well. >> >>> >> >>> I suggest starting with migrating Glance as it doesn't have any >> >>> internal services and is easy to validate. Note that properly >> >>> migrating Keystone requires keeping existing Fernet keys around, so >> >>> any token stays valid until the time it is expected to stop working >> >>> (which is fairly complex, see >> >>> https://bugs.launchpad.net/kolla-ansible/+bug/1809469). >> >>> >> >>> While initially I was using an approach similar to your first >> >>> suggestion, it can have side effects since Kolla Ansible uses these >> >>> variables when templating configuration. As an example, most services >> >>> will only have notifications enabled if enable_ceilometer is true. >> >>> >> >>> I've added existing control plane nodes to the Kolla Ansible inventory >> >>> as separate groups, which allows me to use the existing database and >> >>> RabbitMQ for the containerised services. >> >>> For example, instead of: >> >>> >> >>> [mariadb:children] >> >>> control >> >>> >> >>> you may have: >> >>> >> >>> [mariadb:children] >> >>> oldcontrol_db >> >>> >> >>> I still have to perform the migration of these underlying services to >> >>> the new control plane, I will let you know if there is any hurdle. >> >>> >> >>> A few random things to note: >> >>> >> >>> - if run on existing control plane hosts, the baremetal role removes >> >>> some packages listed in `redhat_pkg_removals` which can trigger the >> >>> removal of OpenStack dependencies using them! I've changed this >> >>> variable to an empty list. >> >>> - compare your existing deployment with a Kolla Ansible one to check >> >>> for differences in endpoints, configuration files, database users, >> >>> service users, etc. For Heat, Kolla uses the domain heat_user_domain, >> >>> while your existing deployment may use another one (and this is >> >>> hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" >> >>> project while a couple of deployments I worked with were using >> >>> "services". This shouldn't matter, except there was a bug in Kolla >> >>> which prevented it from setting the roles correctly: >> >>> https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest >> >>> Rocky and Queens images) >> >>> - the ml2_conf.ini generated for Neutron generates physical network >> >>> names like physnet1, physnet2… you may want to override >> >>> bridge_mappings completely. >> >>> - although sometimes it could be easier to change your existing >> >>> deployment to match Kolla Ansible settings, rather than configure >> >>> Kolla Ansible to match your deployment." 
>> >>> >> >>> > Thanks >> >>> > Ignazio >> >>> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Sun Jun 30 10:10:23 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Sun, 30 Jun 2019 12:10:23 +0200 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: Hello Radosław, unfortunately I came back to old installation restarting old controllers and I installed a new kolla queens with new addresses, so now I have two different openstacks installations. I compared the configurations: the difference is that the old does not use any memcache secret, while the kolla installation use them....but I do not think they are stored on db. I am looking for a method for migrating from non kolla to kolla but I did not find any example searching google. If there isn't any documented method I'll try again and I'll send logs. An alternative could be migrating instances from a volume backup but it a very slow way and my cloud management platform should re-import all uuid . Installing kolla queens works I got an openstack that works fine but I need a procedure to migrate. Thanks and Regards Ignazio Il giorno dom 30 giu 2019 alle ore 11:18 Radosław Piliszek < radoslaw.piliszek at gmail.com> ha scritto: > Hi Ignazio, > > it is hard to tell without logs. Please attach (pastebin) the relevant > ones (probably nova ones, maybe neutron and cinder). > Also, did you keep the old configs and tried comparing them with new ones? > > Kind regards, > Radek > > niedz., 30 cze 2019 o 11:07 Ignazio Cassano > napisał(a): > >> Hi Mark, >> let me to explain what I am trying. >> I have a queens installation based on centos and pacemaker with some >> instances and heat stacks. >> I would like to have another installation with same instances, projects, >> stacks ....I'd like to have same uuid for all objects (users,projects >> instances and so on, because it is controlled by a cloud management >> platform we wrote. >> >> I stopped controllers on old queens installation backupping the openstack >> database. >> I installed the new kolla openstack queens on new three controllers with >> same addresses of the old intallation , vip as well. >> One of the three controllers is also a kvm node on queens. >> I stopped all containeres except rabbit,keepalive,rabbit,haproxy and >> mariadb. >> I deleted al openstack db on mariadb container and I imported the old >> tables, changing the address of rabbit for pointing to the new rabbit >> cluster. >> I restarded containers. >> Changing the rabbit address on old kvm nodes, I can see the old virtual >> machines and I can open console on them. >> I can see all networks (tenant and provider) of al installation, but when >> I try to create a new instance on the new kvm, it remains in buiding state. >> Seems it cannot aquire an address. >> Storage between old and new installation are shred on nfs NETAPP, so I >> can see cinder volumes. >> I suppose db structure is different between a kolla installation and a >> manual instaltion !? >> What is wrong ? >> Thanks >> Ignazio >> >> >> >> >> Il giorno gio 27 giu 2019 alle ore 16:44 Mark Goddard >> ha scritto: >> >>> On Thu, 27 Jun 2019 at 14:46, Ignazio Cassano >>> wrote: >>> > >>> > Sorry, for my question. >>> > It does not need to change anything because endpoints refer to haproxy >>> vips. >>> > So if your new glance works fine you change haproxy backends for >>> glance. 
>>> > Regards >>> > Ignazio >>> >>> That's correct - only the haproxy backend needs to be updated. >>> >>> > >>> > >>> > Il giorno gio 27 giu 2019 alle ore 15:21 Ignazio Cassano < >>> ignaziocassano at gmail.com> ha scritto: >>> >> >>> >> Hello Mark, >>> >> let me to verify if I understood your method. >>> >> >>> >> You have old controllers,haproxy,mariadb and nova computes. >>> >> You installed three new controllers but kolla.ansible inventory >>> contains old mariadb and old rabbit servers. >>> >> You are deployng single service on new controllers staring with >>> glance. >>> >> When you deploy glance on new controllers, it changes the glance >>> endpoint on old mariadb db ? >>> >> Regards >>> >> Ignazio >>> >> >>> >> Il giorno gio 27 giu 2019 alle ore 10:52 Mark Goddard < >>> mark at stackhpc.com> ha scritto: >>> >>> >>> >>> On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano < >>> ignaziocassano at gmail.com> wrote: >>> >>> > >>> >>> > Hello, >>> >>> > Anyone have tried to migrate an existing openstack installation to >>> kolla containers? >>> >>> >>> >>> Hi, >>> >>> >>> >>> I'm aware of two people currently working on that. Gregory Orange and >>> >>> one of my colleagues, Pierre Riteau. Pierre is away currently, so I >>> >>> hope he doesn't mind me quoting him from an email to Gregory. >>> >>> >>> >>> Mark >>> >>> >>> >>> "I am indeed working on a similar migration using Kolla Ansible with >>> >>> Kayobe, starting from a non-containerised OpenStack deployment based >>> >>> on CentOS RPMs. >>> >>> Existing OpenStack services are deployed across several controller >>> >>> nodes and all sit behind HAProxy, including for internal endpoints. >>> >>> We have additional controller nodes that we use to deploy >>> >>> containerised services. If you don't have the luxury of additional >>> >>> nodes, it will be more difficult as you will need to avoid processes >>> >>> clashing when listening on the same port. >>> >>> >>> >>> The method I am using resembles your second suggestion, however I am >>> >>> deploying only one containerised service at a time, in order to >>> >>> validate each of them independently. >>> >>> I use the --tags option of kolla-ansible to restrict Ansible to >>> >>> specific roles, and when I am happy with the resulting configuration >>> I >>> >>> update HAProxy to point to the new controllers. >>> >>> >>> >>> As long as the configuration matches, this should be completely >>> >>> transparent for purely HTTP-based services like Glance. You need to >>> be >>> >>> more careful with services that include components listening for RPC, >>> >>> such as Nova: if the new nova.conf is incorrect and you've deployed a >>> >>> nova-conductor that uses it, you could get failed instances launches. >>> >>> Some roles depend on others: if you are deploying the >>> >>> neutron-openvswitch-agent, you need to run the openvswitch role as >>> >>> well. >>> >>> >>> >>> I suggest starting with migrating Glance as it doesn't have any >>> >>> internal services and is easy to validate. Note that properly >>> >>> migrating Keystone requires keeping existing Fernet keys around, so >>> >>> any token stays valid until the time it is expected to stop working >>> >>> (which is fairly complex, see >>> >>> https://bugs.launchpad.net/kolla-ansible/+bug/1809469). >>> >>> >>> >>> While initially I was using an approach similar to your first >>> >>> suggestion, it can have side effects since Kolla Ansible uses these >>> >>> variables when templating configuration. 
As an example, most services >>> >>> will only have notifications enabled if enable_ceilometer is true. >>> >>> >>> >>> I've added existing control plane nodes to the Kolla Ansible >>> inventory >>> >>> as separate groups, which allows me to use the existing database and >>> >>> RabbitMQ for the containerised services. >>> >>> For example, instead of: >>> >>> >>> >>> [mariadb:children] >>> >>> control >>> >>> >>> >>> you may have: >>> >>> >>> >>> [mariadb:children] >>> >>> oldcontrol_db >>> >>> >>> >>> I still have to perform the migration of these underlying services to >>> >>> the new control plane, I will let you know if there is any hurdle. >>> >>> >>> >>> A few random things to note: >>> >>> >>> >>> - if run on existing control plane hosts, the baremetal role removes >>> >>> some packages listed in `redhat_pkg_removals` which can trigger the >>> >>> removal of OpenStack dependencies using them! I've changed this >>> >>> variable to an empty list. >>> >>> - compare your existing deployment with a Kolla Ansible one to check >>> >>> for differences in endpoints, configuration files, database users, >>> >>> service users, etc. For Heat, Kolla uses the domain heat_user_domain, >>> >>> while your existing deployment may use another one (and this is >>> >>> hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" >>> >>> project while a couple of deployments I worked with were using >>> >>> "services". This shouldn't matter, except there was a bug in Kolla >>> >>> which prevented it from setting the roles correctly: >>> >>> https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest >>> >>> Rocky and Queens images) >>> >>> - the ml2_conf.ini generated for Neutron generates physical network >>> >>> names like physnet1, physnet2… you may want to override >>> >>> bridge_mappings completely. >>> >>> - although sometimes it could be easier to change your existing >>> >>> deployment to match Kolla Ansible settings, rather than configure >>> >>> Kolla Ansible to match your deployment." >>> >>> >>> >>> > Thanks >>> >>> > Ignazio >>> >>> > >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Sun Jun 30 10:12:50 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Sun, 30 Jun 2019 12:12:50 +0200 Subject: [kolla-ansible] migration In-Reply-To: References: Message-ID: PS I saw a video about kolla-migrator showed during an openstack summit....but its documentation si very poor. Regards Ignazio Il giorno dom 30 giu 2019 alle ore 11:18 Radosław Piliszek < radoslaw.piliszek at gmail.com> ha scritto: > Hi Ignazio, > > it is hard to tell without logs. Please attach (pastebin) the relevant > ones (probably nova ones, maybe neutron and cinder). > Also, did you keep the old configs and tried comparing them with new ones? > > Kind regards, > Radek > > niedz., 30 cze 2019 o 11:07 Ignazio Cassano > napisał(a): > >> Hi Mark, >> let me to explain what I am trying. >> I have a queens installation based on centos and pacemaker with some >> instances and heat stacks. >> I would like to have another installation with same instances, projects, >> stacks ....I'd like to have same uuid for all objects (users,projects >> instances and so on, because it is controlled by a cloud management >> platform we wrote. >> >> I stopped controllers on old queens installation backupping the openstack >> database. >> I installed the new kolla openstack queens on new three controllers with >> same addresses of the old intallation , vip as well. 
>> One of the three controllers is also a kvm node on queens. >> I stopped all containeres except rabbit,keepalive,rabbit,haproxy and >> mariadb. >> I deleted al openstack db on mariadb container and I imported the old >> tables, changing the address of rabbit for pointing to the new rabbit >> cluster. >> I restarded containers. >> Changing the rabbit address on old kvm nodes, I can see the old virtual >> machines and I can open console on them. >> I can see all networks (tenant and provider) of al installation, but when >> I try to create a new instance on the new kvm, it remains in buiding state. >> Seems it cannot aquire an address. >> Storage between old and new installation are shred on nfs NETAPP, so I >> can see cinder volumes. >> I suppose db structure is different between a kolla installation and a >> manual instaltion !? >> What is wrong ? >> Thanks >> Ignazio >> >> >> >> >> Il giorno gio 27 giu 2019 alle ore 16:44 Mark Goddard >> ha scritto: >> >>> On Thu, 27 Jun 2019 at 14:46, Ignazio Cassano >>> wrote: >>> > >>> > Sorry, for my question. >>> > It does not need to change anything because endpoints refer to haproxy >>> vips. >>> > So if your new glance works fine you change haproxy backends for >>> glance. >>> > Regards >>> > Ignazio >>> >>> That's correct - only the haproxy backend needs to be updated. >>> >>> > >>> > >>> > Il giorno gio 27 giu 2019 alle ore 15:21 Ignazio Cassano < >>> ignaziocassano at gmail.com> ha scritto: >>> >> >>> >> Hello Mark, >>> >> let me to verify if I understood your method. >>> >> >>> >> You have old controllers,haproxy,mariadb and nova computes. >>> >> You installed three new controllers but kolla.ansible inventory >>> contains old mariadb and old rabbit servers. >>> >> You are deployng single service on new controllers staring with >>> glance. >>> >> When you deploy glance on new controllers, it changes the glance >>> endpoint on old mariadb db ? >>> >> Regards >>> >> Ignazio >>> >> >>> >> Il giorno gio 27 giu 2019 alle ore 10:52 Mark Goddard < >>> mark at stackhpc.com> ha scritto: >>> >>> >>> >>> On Wed, 26 Jun 2019 at 19:34, Ignazio Cassano < >>> ignaziocassano at gmail.com> wrote: >>> >>> > >>> >>> > Hello, >>> >>> > Anyone have tried to migrate an existing openstack installation to >>> kolla containers? >>> >>> >>> >>> Hi, >>> >>> >>> >>> I'm aware of two people currently working on that. Gregory Orange and >>> >>> one of my colleagues, Pierre Riteau. Pierre is away currently, so I >>> >>> hope he doesn't mind me quoting him from an email to Gregory. >>> >>> >>> >>> Mark >>> >>> >>> >>> "I am indeed working on a similar migration using Kolla Ansible with >>> >>> Kayobe, starting from a non-containerised OpenStack deployment based >>> >>> on CentOS RPMs. >>> >>> Existing OpenStack services are deployed across several controller >>> >>> nodes and all sit behind HAProxy, including for internal endpoints. >>> >>> We have additional controller nodes that we use to deploy >>> >>> containerised services. If you don't have the luxury of additional >>> >>> nodes, it will be more difficult as you will need to avoid processes >>> >>> clashing when listening on the same port. >>> >>> >>> >>> The method I am using resembles your second suggestion, however I am >>> >>> deploying only one containerised service at a time, in order to >>> >>> validate each of them independently. 
>>> >>> I use the --tags option of kolla-ansible to restrict Ansible to >>> >>> specific roles, and when I am happy with the resulting configuration >>> I >>> >>> update HAProxy to point to the new controllers. >>> >>> >>> >>> As long as the configuration matches, this should be completely >>> >>> transparent for purely HTTP-based services like Glance. You need to >>> be >>> >>> more careful with services that include components listening for RPC, >>> >>> such as Nova: if the new nova.conf is incorrect and you've deployed a >>> >>> nova-conductor that uses it, you could get failed instances launches. >>> >>> Some roles depend on others: if you are deploying the >>> >>> neutron-openvswitch-agent, you need to run the openvswitch role as >>> >>> well. >>> >>> >>> >>> I suggest starting with migrating Glance as it doesn't have any >>> >>> internal services and is easy to validate. Note that properly >>> >>> migrating Keystone requires keeping existing Fernet keys around, so >>> >>> any token stays valid until the time it is expected to stop working >>> >>> (which is fairly complex, see >>> >>> https://bugs.launchpad.net/kolla-ansible/+bug/1809469). >>> >>> >>> >>> While initially I was using an approach similar to your first >>> >>> suggestion, it can have side effects since Kolla Ansible uses these >>> >>> variables when templating configuration. As an example, most services >>> >>> will only have notifications enabled if enable_ceilometer is true. >>> >>> >>> >>> I've added existing control plane nodes to the Kolla Ansible >>> inventory >>> >>> as separate groups, which allows me to use the existing database and >>> >>> RabbitMQ for the containerised services. >>> >>> For example, instead of: >>> >>> >>> >>> [mariadb:children] >>> >>> control >>> >>> >>> >>> you may have: >>> >>> >>> >>> [mariadb:children] >>> >>> oldcontrol_db >>> >>> >>> >>> I still have to perform the migration of these underlying services to >>> >>> the new control plane, I will let you know if there is any hurdle. >>> >>> >>> >>> A few random things to note: >>> >>> >>> >>> - if run on existing control plane hosts, the baremetal role removes >>> >>> some packages listed in `redhat_pkg_removals` which can trigger the >>> >>> removal of OpenStack dependencies using them! I've changed this >>> >>> variable to an empty list. >>> >>> - compare your existing deployment with a Kolla Ansible one to check >>> >>> for differences in endpoints, configuration files, database users, >>> >>> service users, etc. For Heat, Kolla uses the domain heat_user_domain, >>> >>> while your existing deployment may use another one (and this is >>> >>> hardcoded in the Kolla Heat image). Kolla Ansible uses the "service" >>> >>> project while a couple of deployments I worked with were using >>> >>> "services". This shouldn't matter, except there was a bug in Kolla >>> >>> which prevented it from setting the roles correctly: >>> >>> https://bugs.launchpad.net/kolla/+bug/1791896 (now fixed in latest >>> >>> Rocky and Queens images) >>> >>> - the ml2_conf.ini generated for Neutron generates physical network >>> >>> names like physnet1, physnet2… you may want to override >>> >>> bridge_mappings completely. >>> >>> - although sometimes it could be easier to change your existing >>> >>> deployment to match Kolla Ansible settings, rather than configure >>> >>> Kolla Ansible to match your deployment." >>> >>> >>> >>> > Thanks >>> >>> > Ignazio >>> >>> > >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: