From gmann at ghanshyammann.com Wed Jan 1 01:03:22 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 31 Dec 2019 19:03:22 -0600 Subject: [qa] QA Office hour new timing for 2020 Message-ID: <16f5ea0ef6f.1010065b175496.3744470588375359656@ghanshyammann.com> Hello Everyone, We have a few members who have started actively contributing to QA from India as well as from European time zones. The current office hour time is convenient only for CT and JST, and very difficult for members from India and Europe to join. I would like to adjust the office hour timing to include all four TZs. To do that, someone has to wake up early or stay late at night :). I gave preference to the new members and selected Tuesday 13:30 UTC [1], which will be an early morning for me and late night in Tokyo. Let me know if there is any objection or a better suggestion; otherwise I will make the new time effective from 7th Jan. Also, the 1st Jan office hour is cancelled. I wish you all a very happy new year. [1] https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=1&day=7&hour=13&min=30&sec=0&p1=265&p2=204&p3=771&p4=248&iv=1800 -gmann From gouthampravi at gmail.com Wed Jan 1 14:39:02 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 1 Jan 2020 20:09:02 +0530 Subject: [manila] No IRC/Community meeting on 2nd January 2020 Message-ID: Hello Zorillas, Due to the holidays we're expecting multiple community members (myself included) to be unavailable for the weekly IRC meeting tomorrow (2nd January 2020, 15:00 UTC), so we'll skip it. Please add any agenda items to the next meeting (9th January 2020, 15:00 UTC) [1]. Happy New Year, and enjoy the rest of your holidays! Thanks, Goutham [1] https://wiki.openstack.org/wiki/Manila/Meetings -------------- next part -------------- An HTML attachment was scrubbed... 
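The proposed slot, Tuesday 13:30 UTC, can be checked against the four time zones in the thread with Python's standard zoneinfo module (Python 3.9+); the zone names below are my choice of representatives for CT, India, Europe, and JST:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Proposed QA office hour: Tuesday 7 Jan 2020, 13:30 UTC
slot = datetime(2020, 1, 7, 13, 30, tzinfo=timezone.utc)

# Representative zones for the four regions mentioned (my picks)
zones = {
    "CT (America/Chicago)": "America/Chicago",
    "India (Asia/Kolkata)": "Asia/Kolkata",
    "Europe (Europe/Berlin)": "Europe/Berlin",
    "JST (Asia/Tokyo)": "Asia/Tokyo",
}

for label, tz in zones.items():
    local = slot.astimezone(ZoneInfo(tz))
    print(f"{label}: {local:%H:%M}")
```

This lands at 07:30 in Chicago and 22:30 in Tokyo, matching "early morning for me and late-night in Tokyo", with India at 19:00 and central Europe at 14:30.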
URL: From tim at swiftstack.com Thu Jan 2 06:10:37 2020 From: tim at swiftstack.com (Tim Burke) Date: Wed, 01 Jan 2020 22:10:37 -0800 Subject: [stein][cinder][backup][swift] issue In-Reply-To: References: Message-ID: Hi Ignazio, That's expected behavior with rados gateway. They follow S3's lead and have a unified namespace for containers across all tenants. From their documentation [0]: If a container with the same name already exists, and the user is the container owner then the operation will succeed. Otherwise the operation will fail. FWIW, that's very much a Ceph-ism -- Swift proper allows each tenant full and independent control over their namespace. Tim [0] https://docs.ceph.com/docs/mimic/radosgw/swift/containerops/#http-response On Mon, 2019-12-30 at 15:48 +0100, Ignazio Cassano wrote: > Hello All, > I configured openstack stein on centos 7 witch ceph. > Cinder works fine and object storage on ceph seems to work fine: I > can clreate containers, volume etc ..... > > I configured cinder backup on swift (but swift is using ceph rados > gateway) : > > backup_driver = cinder.backup.drivers.swift.SwiftBackupDriver > swift_catalog_info = object-store:swift:publicURL > backup_swift_enable_progress_timer = True > #backup_swift_url = http://10.102.184.190:8080/v1/AUTH_ > backup_swift_auth_url = http://10.102.184.190:5000/v3 > backup_swift_auth = per_user > backup_swift_auth_version = 1 > backup_swift_user = admin > backup_swift_user_domain = default > #backup_swift_key = > #backup_swift_container = volumebackups > backup_swift_object_size = 52428800 > #backup_swift_project = > #backup_swift_project_domain = > backup_swift_retry_attempts = 3 > backup_swift_retry_backoff = 2 > backup_compression_algorithm = zlib > > If I run a backup as user admin, it creates a container named > "volumebackups". 
> If I run a backup as user demo and I do not specify a container name, > it tires to write on volumebackups and gives some errors: > > ClientException: Container PUT failed: > http://10.102.184.190:8080/swift/v1/AUTH_964f343cf5164028a803db91488bdb01/volumebackups > 409 Conflict BucketAlreadyExists > > > Does it mean I cannot use the same containers name on differents > projects ? > > My ceph.conf is configured for using keystone: > [client.rgw.tst2-osctrl01] > rgw_frontends = "civetweb port=10.102.184.190:8080" > # Keystone information > rgw keystone api version = 3 > rgw keystone url = http://10.102.184.190:5000 > rgw keystone admin user = admin > rgw keystone admin password = password > rgw keystone admin domain = default > rgw keystone admin project = admin > rgw swift account in url = true > rgw keystone implicit tenants = true > > > > Any help, please ? > Best Regards > Ignazio 
From ignaziocassano at gmail.com Thu Jan 2 06:38:09 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 2 Jan 2020 07:38:09 +0100 Subject: [stein][cinder][backup][swift] issue In-Reply-To: References: Message-ID: Many Thanks, Tim Ignazio Il giorno gio 2 gen 2020 alle ore 07:10 Tim Burke ha scritto: > Hi Ignazio, > > That's expected behavior with rados gateway. They follow S3's lead and > have a unified namespace for containers across all tenants. From their > documentation [0]: > > If a container with the same name already exists, and the user is > the container owner then the operation will succeed. Otherwise the > operation will fail. > > FWIW, that's very much a Ceph-ism -- Swift proper allows each tenant > full and independent control over their namespace. > > Tim > > [0] > https://docs.ceph.com/docs/mimic/radosgw/swift/containerops/#http-response > > On Mon, 2019-12-30 at 15:48 +0100, Ignazio Cassano wrote: > > Hello All, > > I configured openstack stein on centos 7 witch ceph. > > Cinder works fine and object storage on ceph seems to work fine: I > > can clreate containers, volume etc ..... 
> > > > I configured cinder backup on swift (but swift is using ceph rados > > gateway) : > > > > backup_driver = cinder.backup.drivers.swift.SwiftBackupDriver > > swift_catalog_info = object-store:swift:publicURL > > backup_swift_enable_progress_timer = True > > #backup_swift_url = http://10.102.184.190:8080/v1/AUTH_ > > backup_swift_auth_url = http://10.102.184.190:5000/v3 > > backup_swift_auth = per_user > > backup_swift_auth_version = 1 > > backup_swift_user = admin > > backup_swift_user_domain = default > > #backup_swift_key = > > #backup_swift_container = volumebackups > > backup_swift_object_size = 52428800 > > #backup_swift_project = > > #backup_swift_project_domain = > > backup_swift_retry_attempts = 3 > > backup_swift_retry_backoff = 2 > > backup_compression_algorithm = zlib > > > > If I run a backup as user admin, it creates a container named > > "volumebackups". > > If I run a backup as user demo and I do not specify a container name, > > it tires to write on volumebackups and gives some errors: > > > > ClientException: Container PUT failed: > > > http://10.102.184.190:8080/swift/v1/AUTH_964f343cf5164028a803db91488bdb01/volumebackups > > 409 Conflict BucketAlreadyExists > > > > > > Does it mean I cannot use the same containers name on differents > > projects ? > > > > My ceph.conf is configured for using keystone: > > [client.rgw.tst2-osctrl01] > > rgw_frontends = "civetweb port=10.102.184.190:8080" > > # Keystone information > > rgw keystone api version = 3 > > rgw keystone url = http://10.102.184.190:5000 > > rgw keystone admin user = admin > > rgw keystone admin password = password > > rgw keystone admin domain = default > > rgw keystone admin project = admin > > rgw swift account in url = true > > rgw keystone implicit tenants = true > > > > > > > > Any help, please ? > > Best Regards > > Ignazio > > -------------- next part -------------- An HTML attachment was scrubbed... 
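The ownership rule Tim quotes can be sketched as a toy model in Python (illustrative only, not real radosgw or Swift code): radosgw keeps one global container namespace where a PUT succeeds only when the name is unclaimed or already owned by the caller, while Swift proper gives each account its own namespace.

```python
class UnifiedNamespace:
    """Toy model of radosgw's S3-style global container namespace."""

    def __init__(self):
        self.owners = {}  # container name -> owning tenant

    def put_container(self, tenant, name):
        owner = self.owners.setdefault(name, tenant)
        if owner != tenant:
            return 409  # Conflict: BucketAlreadyExists
        return 201


class PerAccountNamespace:
    """Toy model of Swift proper: each tenant has an independent namespace."""

    def __init__(self):
        self.accounts = {}  # tenant -> set of container names

    def put_container(self, tenant, name):
        self.accounts.setdefault(tenant, set()).add(name)
        return 201


rgw = UnifiedNamespace()
assert rgw.put_container("admin", "volumebackups") == 201
assert rgw.put_container("admin", "volumebackups") == 201  # owner retries: OK
assert rgw.put_container("demo", "volumebackups") == 409   # the error Ignazio saw

swift = PerAccountNamespace()
assert swift.put_container("admin", "volumebackups") == 201
assert swift.put_container("demo", "volumebackups") == 201  # fine in Swift proper
```

With `rgw keystone implicit tenants = true`, radosgw effectively moves from the first model toward the second, which is why the thread's fix works.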
URL: From svyas at redhat.com Thu Jan 2 12:27:52 2020 From: svyas at redhat.com (Soniya Vyas) Date: Thu, 2 Jan 2020 17:57:52 +0530 Subject: openstack-discuss Digest, Vol 15, Issue 1 In-Reply-To: References: Message-ID: > Message: 4 > Date: Tue, 31 Dec 2019 19:03:22 -0600 > From: Ghanshyam Mann > To: "openstack-discuss" > Subject: [qa] QA Office hour new timing for 2020 > Message-ID: > <16f5ea0ef6f.1010065b175496.3744470588375359656 at ghanshyammann.com> > Content-Type: text/plain; charset="UTF-8" > > Hello Everyone, > > We have a few members who have started actively contributing to QA from India as well as from European time zones. > The current office hour time is convenient only for CT and JST, and very difficult for members from India > and Europe to join. > > I would like to adjust the office hour timing to include all four TZs. To do that, someone has to wake up early or stay > late at night :). I gave preference to the new members and selected Tuesday 13:30 UTC [1], which will be an early morning > for me and late night in Tokyo. > > Let me know if there is any objection or a better suggestion; otherwise I will make the new time effective from 7th Jan. > > Also, the 1st Jan office hour is cancelled. I wish you all a very happy new year. > > [1] https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=1&day=7&hour=13&min=30&sec=0&p1=265&p2=204&p3=771&p4=248&iv=1800 Thanks a lot to the whole QA team and Ghanshyam Mann for considering our timing issues. Looking forward to joining the QA office hours. Regards, Soniya Vyas From mnaser at vexxhost.com Thu Jan 2 20:02:57 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 2 Jan 2020 15:02:57 -0500 Subject: [openstack-ansible] strange execution delays In-Reply-To: References: Message-ID: Hi Joe, Those timeouts are almost 99% the reason behind this issue. 
I'd suggest restarting systemd-logind and seeing how that fares: systemctl restart systemd-logind If the issue persists or happens again, I'm not sure, but those timeouts are 100% a cause of issue here. Thanks, Mohammed On Mon, Dec 30, 2019 at 2:51 PM Joe Topjian wrote: > > Hi Mohammad, > >> Do you have any PAM modules that might be hitting some sorts of >> external API for auditing purposes that may be throttling you? > > > Not unless OSA would have configured something. The deployment is *very* standard, heavily leveraging default values. > > DNS of each container is configured to use LXC host for resolution. The host is using the systemd-based resolver, but is pointing to a local, dedicated upstream resolver. I want to point the problem there, but we've run into this issue in two different locations, one of which has an upstream DNS resolver that I'm confident does not throttle requests. But, hey, it's DNS - maybe it's still the cause. > >> >> How is systemd-logind feeling? Anything odd in your system logs? > > > Yes. We have a feeling it's *something* with systemd, but aren't exactly sure what. Affected containers' logs end up with a lot of the following entries: > > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: Successful su for root by root > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: + ??? root:root > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: pam_unix(su:session): session opened for user root by (uid=0) > Dec 3 20:30:27 infra1-repo-container-a0f194b3 dbus-daemon[47]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms) > Dec 3 20:30:42 infra1-repo-container-a0f194b3 su[4170]: pam_systemd(su:session): Failed to create session: Connection timed out > Dec 3 20:30:43 infra1-repo-container-a0f194b3 su[4170]: pam_unix(su:session): session closed for user root > > But we aren't sure if those timeouts are a symptom of cause. > > Thanks for your help! 
> > Joe -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From openstack at nemebean.com Thu Jan 2 20:20:53 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 2 Jan 2020 14:20:53 -0600 Subject: [oslo][kolla][requirements][release][infra] Hit by an old, fixed bug In-Reply-To: References: <20191230150137.GA9057@sm-workstation> Message-ID: <79cddc25-88e0-b5dd-8b8a-17cf14b9c4b1@nemebean.com> On 12/30/19 9:52 AM, Radosław Piliszek wrote: > Thanks, Sean! I knew I was missing something really basic! > I was under the impression that 9.x is Stein, like it happens with > main projects (major=branch). > I could not find any doc explaining oslo.messaging versioning, perhaps > Oslo could release 9.5.1 off the stein branch? Oslo for the most part follows semver, so we only bump major versions when there is a breaking change. We bump minor versions each release so we can do bugfix releases on the previous stable branch without stepping on master releases. The underlying cause of this is likely that I'm way behind on releasing the Oslo stable branches. It's high on my todo list now that most people are back from holidays and will be around to help out if a release breaks something. However, anyone can propose a release[0][1] (contrary to what [0] suggests), so if the necessary fix is already on stable/stein and just hasn't been released yet please feel free to do that. You'll just need a +1 from either myself or hberaud (the Oslo release liaison) before the release team will approve it. 0: https://releases.openstack.org/reference/using.html#requesting-a-release 1: https://releases.openstack.org/reference/using.html#using-new-release-command > > The issue remains that, even though oslo backports bugfixes into their > stable branches, kolla (and very possibly other deployment solutions) > no longer benefit from them. 
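Ben's semver description above can be made concrete: because Oslo bumps the minor version each cycle, a stein bugfix release stays on the pinned 9.5.x series and can flow into stein's upper constraints, while a master-era release like 9.7.0 cannot. A hypothetical checker for this rule (illustrative only, not part of the actual requirements tooling):

```python
def on_same_series(pinned: str, candidate: str) -> bool:
    """True if candidate is a patch release on the same major.minor series.

    E.g. with stable/stein pinned at oslo.messaging 9.5.0, a 9.5.1 bugfix
    release would belong to stein's series, while 9.7.0 (released from a
    later cycle) would not.
    """
    p_major, p_minor, p_patch = (int(x) for x in pinned.split("."))
    c_major, c_minor, c_patch = (int(x) for x in candidate.split("."))
    return (c_major, c_minor) == (p_major, p_minor) and c_patch >= p_patch


assert on_same_series("9.5.0", "9.5.1")        # stein bugfix: updates stein u-c
assert not on_same_series("9.5.0", "9.7.0")    # later-cycle minor: stays out
assert not on_same_series("9.5.0", "10.0.0")   # breaking change: new major
```

This is why the fix Radosław needs must first be backported to stable/stein and released as 9.5.x before Kolla's constraint-aware builds pick it up.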
> > -yoctozepto > > pon., 30 gru 2019 o 16:01 Sean McGinnis napisał(a): >> >> On Sun, Dec 29, 2019 at 09:41:45PM +0100, Radosław Piliszek wrote: >>> Hi Folks, >>> >>> as the subject goes, my installation has been hit by an old bug: >>> https://bugs.launchpad.net/oslo.messaging/+bug/1828841 >>> (bug details not important, linked here for background) >>> >>> I am using Stein, deployed with recent Kolla-built source-based images >>> (with only slight modifications compared to vanilla ones). >>> Kolla's procedure for building source-based images considers upper >>> constraints, which, unfortunately, turned out to be lagging behind a >>> few releases w.r.t. oslo.messaging at least. >>> The fix was in 9.7.0 released on May 21, u-c still point to 9.5.0 from >>> Feb 26 and the latest of Stein is 9.8.0 from Jul 18. >>> >>> It seems oslo.messaging is missing from the automatic updates that bot proposes: >>> https://review.opendev.org/#/q/owner:%22OpenStack+Proposal+Bot%22+project:openstack/requirements+branch:stable/stein >>> >>> Per: >>> https://opendev.org/openstack/releases/src/branch/master/doc/source/reference/reviewer_guide.rst#release-jobs >>> this upper-constraint proposal should be happening for all releases. >>> >> >> This is normal and what is expected. >> >> Requirements are only updated for the branch in which those releases happen. So >> if there is a release of oslo.messaging for stable/train, only the stable/train >> upper constraints are updated for that new release. The stable/stein branch >> will not be affected because that shows what the tested upper constraints were >> for that branch. 
>> >> The last stable/stein release for oslo.messaging was 9.5.0: >> >> https://opendev.org/openstack/releases/src/branch/master/deliverables/stein/oslo.messaging.yaml#L49 >> >> And 9.5.0 is what is set in the stable/stein upper-constraints: >> >> https://opendev.org/openstack/requirements/src/branch/stable/stein/upper-constraints.txt#L146 >> >> To get that raised, whatever necessary bugfixes that are required in >> oslo.messaging would need to be backported per-cycle until stable/stein (as in, >> if it was in current master, it would need to be backported and merged to >> stable/train first, then stable/stein), and once merged a stable release would >> need to be proposed for that branch's version of the library. >> >> Once that stable release is done, that will propose the update to the upper >> constraint for the given branch. >> >>> I would be glad if someone investigated why it happens(/ed) and >>> audited whether other OpenStack projects don't need updating as well >>> to avoid running on old deps when new are awaiting for months. :-) >>> Please note this might apply to other branches as well. >>> >>> PS: for some reason oslo.messaging Stein release notes ( >>> https://docs.openstack.org/releasenotes/oslo.messaging/stein.html ) >>> are stuck at 9.5.0 as well, this could be right (I did not inspect the >>> sources) but I am adding this in PS so you have more things to >>> correlate if they need be. >>> >> >> Again, as expected. The last stable/stein release was 9.5.0, so that is correct >> that the release notes for stein only show up to that point. > From joe at topjian.net Thu Jan 2 20:54:25 2020 From: joe at topjian.net (Joe Topjian) Date: Thu, 2 Jan 2020 13:54:25 -0700 Subject: [openstack-ansible] strange execution delays In-Reply-To: References: Message-ID: Hi Mohammad, Restarting of systemd-logind would sometimes hang indefinitely, which is why we've defaulted to just a hard stop/start of the container. The problem then slowly begins to creep up again. 
If you haven't seen this behavior, then that's still helpful. We'll scour the environment trying to find *something* that might be causing it. Thanks, Joe On Thu, Jan 2, 2020 at 1:03 PM Mohammed Naser wrote: > Hi Joe, > > Those timeouts re almost 99% the reason behind this issue. I'd > suggest restarting systemd-logind and seeing how that fares: > > systemctl restart systemd-logind > > If the issue persists or happens again, I'm not sure, but those > timeouts are 100% a cause of issue here. > > Thanks, > Mohammed > > On Mon, Dec 30, 2019 at 2:51 PM Joe Topjian wrote: > > > > Hi Mohammad, > > > >> Do you have any PAM modules that might be hitting some sorts of > >> external API for auditing purposes that may be throttling you? > > > > > > Not unless OSA would have configured something. The deployment is *very* > standard, heavily leveraging default values. > > > > DNS of each container is configured to use LXC host for resolution. The > host is using the systemd-based resolver, but is pointing to a local, > dedicated upstream resolver. I want to point the problem there, but we've > run into this issue in two different locations, one of which has an > upstream DNS resolver that I'm confident does not throttle requests. But, > hey, it's DNS - maybe it's still the cause. > > > >> > >> How is systemd-logind feeling? Anything odd in your system logs? > > > > > > Yes. We have a feeling it's *something* with systemd, but aren't exactly > sure what. Affected containers' logs end up with a lot of the following > entries: > > > > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: Successful su > for root by root > > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: + ??? 
root:root > > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: > pam_unix(su:session): session opened for user root by (uid=0) > > Dec 3 20:30:27 infra1-repo-container-a0f194b3 dbus-daemon[47]: [system] > Failed to activate service 'org.freedesktop.systemd1': timed out > (service_start_timeout=25000ms) > > Dec 3 20:30:42 infra1-repo-container-a0f194b3 su[4170]: > pam_systemd(su:session): Failed to create session: Connection timed out > > Dec 3 20:30:43 infra1-repo-container-a0f194b3 su[4170]: > pam_unix(su:session): session closed for user root > > > > But we aren't sure if those timeouts are a symptom of cause. > > > > Thanks for your help! > > > > Joe > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. https://vexxhost.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Thu Jan 2 21:29:42 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 2 Jan 2020 16:29:42 -0500 Subject: [openstack-ansible] strange execution delays In-Reply-To: References: Message-ID: I'd suggest looking at the dbus logs too which can include some interesting things, but yeah, this is certainly the source of your issues so I would dig around dbus/systemd-logind Good luck and keep us updated! On Thu, Jan 2, 2020 at 3:54 PM Joe Topjian wrote: > > Hi Mohammad, > > Restarting of systemd-logind would sometimes hang indefinitely, which is why we've defaulted to just a hard stop/start of the container. The problem then slowly begins to creep up again. > > If you haven't seen this behavior, then that's still helpful. We'll scour the environment trying to find *something* that might be causing it. > > Thanks, > Joe > > > On Thu, Jan 2, 2020 at 1:03 PM Mohammed Naser wrote: >> >> Hi Joe, >> >> Those timeouts re almost 99% the reason behind this issue. 
I'd >> suggest restarting systemd-logind and seeing how that fares: >> >> systemctl restart systemd-logind >> >> If the issue persists or happens again, I'm not sure, but those >> timeouts are 100% a cause of issue here. >> >> Thanks, >> Mohammed >> >> On Mon, Dec 30, 2019 at 2:51 PM Joe Topjian wrote: >> > >> > Hi Mohammad, >> > >> >> Do you have any PAM modules that might be hitting some sorts of >> >> external API for auditing purposes that may be throttling you? >> > >> > >> > Not unless OSA would have configured something. The deployment is *very* standard, heavily leveraging default values. >> > >> > DNS of each container is configured to use LXC host for resolution. The host is using the systemd-based resolver, but is pointing to a local, dedicated upstream resolver. I want to point the problem there, but we've run into this issue in two different locations, one of which has an upstream DNS resolver that I'm confident does not throttle requests. But, hey, it's DNS - maybe it's still the cause. >> > >> >> >> >> How is systemd-logind feeling? Anything odd in your system logs? >> > >> > >> > Yes. We have a feeling it's *something* with systemd, but aren't exactly sure what. Affected containers' logs end up with a lot of the following entries: >> > >> > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: Successful su for root by root >> > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: + ??? 
root:root >> > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: pam_unix(su:session): session opened for user root by (uid=0) >> > Dec 3 20:30:27 infra1-repo-container-a0f194b3 dbus-daemon[47]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms) >> > Dec 3 20:30:42 infra1-repo-container-a0f194b3 su[4170]: pam_systemd(su:session): Failed to create session: Connection timed out >> > Dec 3 20:30:43 infra1-repo-container-a0f194b3 su[4170]: pam_unix(su:session): session closed for user root >> > >> > But we aren't sure if those timeouts are a symptom of cause. >> > >> > Thanks for your help! >> > >> > Joe >> >> >> >> -- >> Mohammed Naser — vexxhost >> ----------------------------------------------------- >> D. 514-316-8872 >> D. 800-910-1726 ext. 200 >> E. mnaser at vexxhost.com >> W. https://vexxhost.com -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From matt at oliver.net.au Thu Jan 2 22:32:43 2020 From: matt at oliver.net.au (Matthew Oliver) Date: Fri, 3 Jan 2020 09:32:43 +1100 Subject: [stein][cinder][backup][swift] issue In-Reply-To: References: Message-ID: Tim, as always, has hit the nail on the head. By default rgw doesn't use explicit tenants. if you want to use RGW and explicit tenants.. ie no global container namespace, then you need to add: rgw keystone implicit tenants = true To you rgw client configuration in ceph.conf. See: https://docs.ceph.com/docs/master/radosgw/multitenancy/#swift-with-keystone Not sure what happens to existing containers when you enable this option, because I think my default things are considered to be in the 'default' tenant. matt On Thu, Jan 2, 2020 at 5:40 PM Ignazio Cassano wrote: > Many Thanks, Tim > Ignazio > > Il giorno gio 2 gen 2020 alle ore 07:10 Tim Burke ha > scritto: > >> Hi Ignazio, >> >> That's expected behavior with rados gateway. 
They follow S3's lead and >> have a unified namespace for containers across all tenants. From their >> documentation [0]: >> >> If a container with the same name already exists, and the user is >> the container owner then the operation will succeed. Otherwise the >> operation will fail. >> >> FWIW, that's very much a Ceph-ism -- Swift proper allows each tenant >> full and independent control over their namespace. >> >> Tim >> >> [0] >> https://docs.ceph.com/docs/mimic/radosgw/swift/containerops/#http-response >> >> On Mon, 2019-12-30 at 15:48 +0100, Ignazio Cassano wrote: >> > Hello All, >> > I configured openstack stein on centos 7 witch ceph. >> > Cinder works fine and object storage on ceph seems to work fine: I >> > can clreate containers, volume etc ..... >> > >> > I configured cinder backup on swift (but swift is using ceph rados >> > gateway) : >> > >> > backup_driver = cinder.backup.drivers.swift.SwiftBackupDriver >> > swift_catalog_info = object-store:swift:publicURL >> > backup_swift_enable_progress_timer = True >> > #backup_swift_url = http://10.102.184.190:8080/v1/AUTH_ >> > backup_swift_auth_url = http://10.102.184.190:5000/v3 >> > backup_swift_auth = per_user >> > backup_swift_auth_version = 1 >> > backup_swift_user = admin >> > backup_swift_user_domain = default >> > #backup_swift_key = >> > #backup_swift_container = volumebackups >> > backup_swift_object_size = 52428800 >> > #backup_swift_project = >> > #backup_swift_project_domain = >> > backup_swift_retry_attempts = 3 >> > backup_swift_retry_backoff = 2 >> > backup_compression_algorithm = zlib >> > >> > If I run a backup as user admin, it creates a container named >> > "volumebackups". 
>> > If I run a backup as user demo and I do not specify a container name, >> > it tires to write on volumebackups and gives some errors: >> > >> > ClientException: Container PUT failed: >> > >> http://10.102.184.190:8080/swift/v1/AUTH_964f343cf5164028a803db91488bdb01/volumebackups >> > 409 Conflict BucketAlreadyExists >> > >> > >> > Does it mean I cannot use the same containers name on differents >> > projects ? >> > >> > My ceph.conf is configured for using keystone: >> > [client.rgw.tst2-osctrl01] >> > rgw_frontends = "civetweb port=10.102.184.190:8080" >> > # Keystone information >> > rgw keystone api version = 3 >> > rgw keystone url = http://10.102.184.190:5000 >> > rgw keystone admin user = admin >> > rgw keystone admin password = password >> > rgw keystone admin domain = default >> > rgw keystone admin project = admin >> > rgw swift account in url = true >> > rgw keystone implicit tenants = true >> > >> > >> > >> > Any help, please ? >> > Best Regards >> > Ignazio >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt at oliver.net.au Thu Jan 2 22:32:43 2020 From: matt at oliver.net.au (Matthew Oliver) Date: Fri, 3 Jan 2020 09:32:43 +1100 Subject: [stein][cinder][backup][swift] issue In-Reply-To: References: Message-ID: Tim, as always, has hit the nail on the head. By default rgw doesn't use explicit tenants. if you want to use RGW and explicit tenants.. ie no global container namespace, then you need to add: rgw keystone implicit tenants = true To you rgw client configuration in ceph.conf. See: https://docs.ceph.com/docs/master/radosgw/multitenancy/#swift-with-keystone Not sure what happens to existing containers when you enable this option, because I think my default things are considered to be in the 'default' tenant. 
matt On Thu, Jan 2, 2020 at 5:40 PM Ignazio Cassano wrote: > Many Thanks, Tim > Ignazio > > Il giorno gio 2 gen 2020 alle ore 07:10 Tim Burke ha > scritto: > >> Hi Ignazio, >> >> That's expected behavior with rados gateway. They follow S3's lead and >> have a unified namespace for containers across all tenants. From their >> documentation [0]: >> >> If a container with the same name already exists, and the user is >> the container owner then the operation will succeed. Otherwise the >> operation will fail. >> >> FWIW, that's very much a Ceph-ism -- Swift proper allows each tenant >> full and independent control over their namespace. >> >> Tim >> >> [0] >> https://docs.ceph.com/docs/mimic/radosgw/swift/containerops/#http-response >> >> On Mon, 2019-12-30 at 15:48 +0100, Ignazio Cassano wrote: >> > Hello All, >> > I configured openstack stein on centos 7 witch ceph. >> > Cinder works fine and object storage on ceph seems to work fine: I >> > can clreate containers, volume etc ..... >> > >> > I configured cinder backup on swift (but swift is using ceph rados >> > gateway) : >> > >> > backup_driver = cinder.backup.drivers.swift.SwiftBackupDriver >> > swift_catalog_info = object-store:swift:publicURL >> > backup_swift_enable_progress_timer = True >> > #backup_swift_url = http://10.102.184.190:8080/v1/AUTH_ >> > backup_swift_auth_url = http://10.102.184.190:5000/v3 >> > backup_swift_auth = per_user >> > backup_swift_auth_version = 1 >> > backup_swift_user = admin >> > backup_swift_user_domain = default >> > #backup_swift_key = >> > #backup_swift_container = volumebackups >> > backup_swift_object_size = 52428800 >> > #backup_swift_project = >> > #backup_swift_project_domain = >> > backup_swift_retry_attempts = 3 >> > backup_swift_retry_backoff = 2 >> > backup_compression_algorithm = zlib >> > >> > If I run a backup as user admin, it creates a container named >> > "volumebackups". 
>> > If I run a backup as user demo and I do not specify a container name, >> > it tires to write on volumebackups and gives some errors: >> > >> > ClientException: Container PUT failed: >> > >> http://10.102.184.190:8080/swift/v1/AUTH_964f343cf5164028a803db91488bdb01/volumebackups >> > 409 Conflict BucketAlreadyExists >> > >> > >> > Does it mean I cannot use the same containers name on differents >> > projects ? >> > >> > My ceph.conf is configured for using keystone: >> > [client.rgw.tst2-osctrl01] >> > rgw_frontends = "civetweb port=10.102.184.190:8080" >> > # Keystone information >> > rgw keystone api version = 3 >> > rgw keystone url = http://10.102.184.190:5000 >> > rgw keystone admin user = admin >> > rgw keystone admin password = password >> > rgw keystone admin domain = default >> > rgw keystone admin project = admin >> > rgw swift account in url = true >> > rgw keystone implicit tenants = true >> > >> > >> > >> > Any help, please ? >> > Best Regards >> > Ignazio >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Jan 3 09:28:57 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 3 Jan 2020 10:28:57 +0100 Subject: [stein][cinder][backup][swift] issue In-Reply-To: References: Message-ID: Thanks, Matt. When I add rgw keystone implicit tenants = true containers are created with the project id/name. Regards Ignazio Il giorno gio 2 gen 2020 alle ore 23:32 Matthew Oliver ha scritto: > Tim, as always, has hit the nail on the head. By default rgw doesn't use > explicit tenants. > if you want to use RGW and explicit tenants.. ie no global container > namespace, then you need to add: > > rgw keystone implicit tenants = true > > To you rgw client configuration in ceph.conf. 
> > See: > https://docs.ceph.com/docs/master/radosgw/multitenancy/#swift-with-keystone > > Not sure what happens to existing containers when you enable this option, > because I think my default things are considered to be in the 'default' > tenant. > > matt > > On Thu, Jan 2, 2020 at 5:40 PM Ignazio Cassano > wrote: > >> Many Thanks, Tim >> Ignazio >> >> Il giorno gio 2 gen 2020 alle ore 07:10 Tim Burke >> ha scritto: >> >>> Hi Ignazio, >>> >>> That's expected behavior with rados gateway. They follow S3's lead and >>> have a unified namespace for containers across all tenants. From their >>> documentation [0]: >>> >>> If a container with the same name already exists, and the user is >>> the container owner then the operation will succeed. Otherwise the >>> operation will fail. >>> >>> FWIW, that's very much a Ceph-ism -- Swift proper allows each tenant >>> full and independent control over their namespace. >>> >>> Tim >>> >>> [0] >>> >>> https://docs.ceph.com/docs/mimic/radosgw/swift/containerops/#http-response >>> >>> On Mon, 2019-12-30 at 15:48 +0100, Ignazio Cassano wrote: >>> > Hello All, >>> > I configured openstack stein on centos 7 witch ceph. >>> > Cinder works fine and object storage on ceph seems to work fine: I >>> > can clreate containers, volume etc ..... 
>>> > >>> > I configured cinder backup on swift (but swift is using ceph rados >>> > gateway) : >>> > >>> > backup_driver = cinder.backup.drivers.swift.SwiftBackupDriver >>> > swift_catalog_info = object-store:swift:publicURL >>> > backup_swift_enable_progress_timer = True >>> > #backup_swift_url = http://10.102.184.190:8080/v1/AUTH_ >>> > backup_swift_auth_url = http://10.102.184.190:5000/v3 >>> > backup_swift_auth = per_user >>> > backup_swift_auth_version = 1 >>> > backup_swift_user = admin >>> > backup_swift_user_domain = default >>> > #backup_swift_key = >>> > #backup_swift_container = volumebackups >>> > backup_swift_object_size = 52428800 >>> > #backup_swift_project = >>> > #backup_swift_project_domain = >>> > backup_swift_retry_attempts = 3 >>> > backup_swift_retry_backoff = 2 >>> > backup_compression_algorithm = zlib >>> > >>> > If I run a backup as user admin, it creates a container named >>> > "volumebackups". >>> > If I run a backup as user demo and I do not specify a container name, >>> > it tires to write on volumebackups and gives some errors: >>> > >>> > ClientException: Container PUT failed: >>> > >>> http://10.102.184.190:8080/swift/v1/AUTH_964f343cf5164028a803db91488bdb01/volumebackups >>> > 409 Conflict BucketAlreadyExists >>> > >>> > >>> > Does it mean I cannot use the same containers name on differents >>> > projects ? >>> > >>> > My ceph.conf is configured for using keystone: >>> > [client.rgw.tst2-osctrl01] >>> > rgw_frontends = "civetweb port=10.102.184.190:8080" >>> > # Keystone information >>> > rgw keystone api version = 3 >>> > rgw keystone url = http://10.102.184.190:5000 >>> > rgw keystone admin user = admin >>> > rgw keystone admin password = password >>> > rgw keystone admin domain = default >>> > rgw keystone admin project = admin >>> > rgw swift account in url = true >>> > rgw keystone implicit tenants = true >>> > >>> > >>> > >>> > Any help, please ? 
>>> > Best Regards >>> > Ignazio >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Fri Jan 3 09:56:33 2020 From: katonalala at gmail.com (Lajos Katona) Date: Fri, 3 Jan 2020 10:56:33 +0100 Subject: About the use of security groups with neutron ports In-Reply-To: <000401d5bcc3$3f9ebd30$bedc3790$@gmail.com> References: <003d01d5bc42$2af8ceb0$80ea6c10$@gmail.com> <582E6225-F178-401A-A1D4-A52484B76DD9@redhat.com> <000401d5bcc3$3f9ebd30$bedc3790$@gmail.com> Message-ID: Hi, General answer: if you check your processes running on the host you will see which config files are used: $ ps -ef |grep neutron-server lajoska+ 32072 1 2 09:51 ? 00:00:03 /usr/bin/python3.6 /usr/local/bin/neutron-server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-file /etc/neutron/taas_plugin.ini .... Similarly you can check your ovs-agent: $ ps -ef |grep neutron-openvswitch-agent .... For the documentation of the config files check the configuration reference: https://docs.openstack.org/neutron/latest/configuration/config.html (this is the latest, so I suppose you need some older one like train or similar) Regards Lajos ezt írta (időpont: 2019. dec. 27., P, 15:42): > Thank you very much, Slawek. > > > > In case I have multiple configuration files, how to know which one is > currently loaded in neutron? > > For example, in my environment I have: > > - ml2_conf.ini > - ml2_conf_odl.ini > - ml2_conf_sriov.ini > - openvswitch_agent.ini > - sriov_agent.ini > > > > > > [root at overcloud-controller-0 cbis-admin]# cd /etc/neutron/plugins/ml2/ > > [root at overcloud-controller-0 ml2]# ls > > ml2_conf.ini ml2_conf_odl.ini ml2_conf_sriov.ini openvswitch_agent.ini > sriov_agent.ini > > > > > > Which one of these is used? 
> > > > Cheers, > > Ahmed > > > > > > > > -----Original Message----- > From: Slawek Kaplonski > Sent: Friday, December 27, 2019 10:28 AM > To: ahmed.zaky.abdallah at gmail.com > Cc: openstack-discuss at lists.openstack.org > Subject: Re: About the use of security groups with neutron ports > > > > Hi, > > > > > On 27 Dec 2019, at 00:14, ahmed.zaky.abdallah at gmail.com wrote: > > > > > > Hi All, > > > > > > I am trying to wrap my head around something I came across in one of the > OpenStack deployments. I am running Telco VNFs one of them is having > different VMs using SR-IOV interfaces. > > > > > > On one of my VNFs on Openstack, I defined a wrong IPv6 Gm bearer > interface to be exactly the same as the IPv6 Gateway. As I hate > re-onboarding, I decided to embark on a journey of changing the IPv6 of the > Gm bearer interface manually on the application side, everything went on > fine. > > > > > > After two weeks, my customer started complaining about one way RTP flow. > The customer was reluctant to blame the operation I carried out because > everything worked smooth after my modification. > > > After days of investigation, I remembered that I have port-security > enabled and this means AAP “Allowed-Address-Pairs” are defined per vPort > (AAP contain the floating IP address of the VM so that the security to > allow traffic to and from this VIP). I gave it a try and edited AAP > “Allowed-Address-Pairs” to include the correct new IPv6 address. Doing that > everything started working fine. > > > > > > The only logical explanation at that time is security group rules are > really invoked. > > > > > > Now, I am trying to understand how the iptables are really invoked. 
I > did some digging and it seems like we can control the firewall drivers on > two levels: > > > > > > • Nova compute > > > • ML2 plugin > > > > > > I was curious to check nova.conf and it has already the following line: > firewall_driver=nova.virt.firewall.NoopFirewallDriver > > > > > > However, checking the ml2 plugin configuration, the following is found: > > > > > > 230 [securitygroup] > > > 231 > > > 232 # > > > 233 # From neutron.ml2 > > > 234 # > > > 235 > > > 236 # Driver for security groups firewall in the L2 agent (string > value) > > > 237 #firewall_driver = > > > 238 firewall_driver = openvswitch > > > > > > So, I am jumping to a conclusion that ml2 plugin is the one responsible > for enforcing the firewall rules in my case. > > > > > > Have you had a similar experience? > > > Is my assumption correct: If I comment out the ml2 plugin firewall > driver then the port security carries no sense at all and security groups > won’t be invoked? > > > > Firewall_driver config option has to be set to some value. You can set > “noop” as firewall_driver to completely disable this feature for all ports. > > But please remember that You need to set it on agent’s side so it’s on > compute nodes, not on neutron-server side. > > Also, if You want to disable it only for some ports, You can set > “port_security_enabled” to False and than SG will not be applied for such > port and You will not need to configure any additional IPs in allowed > address pairs for this port. > > > > > > > > Cheers, > > > Ahmed > > > > — > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Arkady.Kanevsky at dell.com Fri Jan 3 21:19:22 2020 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Fri, 3 Jan 2020 21:19:22 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management Message-ID: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> Fellow Open Stackers, I have been thinking about how to handle SmartNICs, GPUs, and FPGAs across different projects within OpenStack, with Cyborg taking a leading role in it. Cyborg is an important project that addresses accelerator devices that are part of the server and potentially switches and storage. It addresses 3 different use cases and user roles, which are all grouped into a single project. 1. Application users need to program a portion of the device under management, like a GPU or SmartNIC, for that app's usage. Having a common way to do it across different device families and across different vendors is very important. And that has to be done every time a VM that needs usage of a device is deployed. That is tied to VM scheduling. 2. Administrators need to program the whole device for a specific usage. That covers the scenario where a device can only support a single tenant or single use case. That is done once during OpenStack deployment but may need reprogramming to configure the device for a different usage. It may or may not require a reboot of the server. 3. Administrators need to set up a device for its use, like burning specific FW on it. This is typically done as part of a server life-cycle event. The first 2 cases cover the application life cycle of device usage. The last one covers the device life cycle independently of how it is used. Managing the life cycle of devices is Ironic's responsibility. One cannot and should not manage the lifecycle of server components independently. Managing server devices outside server management violates customer service agreements with server vendors and breaks server support agreements. 
Nova and Neutron get info about all devices and their capabilities from Ironic, which they use for scheduling. We should avoid creating a new project for every new component of the server and modifying nova and neutron for each new device. (The same will also apply to cinder and manila if smart devices are used in their data/control path on a server.) Finally, we want Cyborg to be usable in a standalone capacity, say for Kubernetes. Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic cover use case 3. That is, move all device life-cycle code from Cyborg to Ironic. Concentrate Cyborg on fulfilling the first 2 use cases. Simplify integration with Nova and Neutron for using these accelerators by using the existing Ironic mechanism for it. Create idempotent calls for use case 1 so Nova and Neutron can use them as part of VM deployment to ensure that devices are programmed for the VM under scheduling as needed. Create idempotent call(s) for use case 2 for TripleO to set up a device for single-accelerator usage of a node. [I propose a similar model for CNI integration.] Let the discussion start! Thanks, Arkady -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhipengh512 at gmail.com Sat Jan 4 01:53:10 2020 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Sat, 4 Jan 2020 09:53:10 +0800 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> Message-ID: Hi Arkady, Thanks for your interest in the Cyborg project :) I would like to point out that when we initiated the project there are two specific use cases we want to cover: the accelerators attached locally (via PCIe or other bus type) or remotely (via Ethernet or other fabric type). For the latter one, it is clear that its life cycle is independent from the server (like a block device managed by Cinder). 
For the former one, however, the life cycle is not tied to the server for all kinds of accelerators either. For example, we already have PCIe-based AI accelerator cards or SmartNICs that could be powered on/off while the server stays on the whole time. Therefore, for the above-mentioned reasons, it is not a good idea to move all of the life cycle management into Ironic. Ironic integration is very important for the standalone usage of Cyborg for Kubernetes, Envoy (TLS acceleration) and others alike. Hope this answers your question :) On Sat, Jan 4, 2020 at 5:23 AM wrote: > Fellow Open Stackers, > > I have been thinking on how to handle SmartNICs, GPUs, FPGA handling > across different projects within OpenStack with Cyborg taking a leading > role in it. > > > > Cyborg is important project and address accelerator devices that are part > of the server and potentially switches and storage. > > It is address 3 different use cases and users there are all grouped into > single project. > > > > 1. Application user need to program a portion of the device under > management, like GPU, or SmartNIC for that app usage. Having a common way > to do it across different device families and across different vendor is > very important. And that has to be done every time a VM is deploy that need > usage of a device. That is tied with VM scheduling. > 2. Administrator need to program the whole device for specific usage. > That covers the scenario when device can only support single tenant or > single use case. That is done once during OpenStack deployment but may need > reprogramming to configure device for different usage. May or may not > require reboot of the server. > 3. Administrator need to setup device for its use, like burning > specific FW on it. This is typically done as part of server life-cycle > event. > > > > The first 2 cases cover application life cycle of device usage. > > The last one covers device life cycle independently how it is used. 
> > > > Managing life cycle of devices is Ironic responsibility, One cannot and > should not manage lifecycle of server components independently. Managing > server devices outside server management violates customer service > agreements with server vendors and breaks server support agreements. > > Nova and Neutron are getting info about all devices and their capabilities > from Ironic; that they use for scheduling. We should avoid creating new > project for every new component of the server and modify nova and neuron > for each new device. (the same will also apply to cinder and manila if > smart devices used in its data/control path on a server). > > Finally we want Cyborg to be able to be used in standalone capacity, say > for Kubernetes. > > > > Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic would cover > use case 3. > > Thus, move all device Life-cycle code from Cyborg to Ironic. > > Concentrate Cyborg of fulfilling the first 2 use cases. > > Simplify integration with Nova and Neutron for using these accelerators to > use existing Ironic mechanism for it. > > Create idempotent calls for use case 1 so Nova and Neutron can use it as > part of VM deployment to ensure that devices are programmed for VM under > scheduling need. > > Create idempotent call(s) for use case 2 for TripleO to setup device for > single accelerator usage of a node. > > [Propose similar model for CNI integration.] > > > > Let the discussion start! > > > > Thanks., > Arkady > -- Zhipeng (Howard) Huang Principle Engineer OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcm at jonmasters.org Sat Jan 4 04:44:24 2020 From: jcm at jonmasters.org (Jon Masters) Date: Fri, 3 Jan 2020 23:44:24 -0500 Subject: [kolla] neutron-l3-agent namespace NAT table not working? 
Message-ID: Hi there, I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the iptables rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or SNAT applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING chains (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log entries. It's as if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is driving me crazy :) Anyone got some quick suggestions? (assume I tried the obvious stuff). Jon. -- Computer Architect -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Sat Jan 4 09:35:52 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sat, 4 Jan 2020 10:35:52 +0100 Subject: [oslo][kolla][requirements][release][infra] Hit by an old, fixed bug In-Reply-To: <79cddc25-88e0-b5dd-8b8a-17cf14b9c4b1@nemebean.com> References: <20191230150137.GA9057@sm-workstation> <79cddc25-88e0-b5dd-8b8a-17cf14b9c4b1@nemebean.com> Message-ID: Thanks, Ben. That doc preamble really made me think not to cross the holy ground of release proposals. :-) I proposed release [1] and added you and Hervé as reviewers. [1] https://review.opendev.org/701080 -yoctozepto czw., 2 sty 2020 o 21:20 Ben Nemec napisał(a): > > > > On 12/30/19 9:52 AM, Radosław Piliszek wrote: > > Thanks, Sean! I knew I was missing something really basic! > > I was under the impression that 9.x is Stein, like it happens with > > main projects (major=branch). > > I could not find any doc explaining oslo.messaging versioning, perhaps > > Oslo could release 9.5.1 off the stein branch? > > Oslo for the most part follows semver, so we only bump major versions > when there is a breaking change. 
We bump minor versions each release so > we can do bugfix releases on the previous stable branch without stepping > on master releases. > > The underlying cause of this is likely that I'm way behind on releasing > the Oslo stable branches. It's high on my todo list now that most people > are back from holidays and will be around to help out if a release > breaks something. > > However, anyone can propose a release[0][1] (contrary to what [0] > suggests), so if the necessary fix is already on stable/stein and just > hasn't been released yet please feel free to do that. You'll just need a > +1 from either myself or hberaud (the Oslo release liaison) before the > release team will approve it. > > 0: https://releases.openstack.org/reference/using.html#requesting-a-release > 1: > https://releases.openstack.org/reference/using.html#using-new-release-command > > > > > The issue remains that, even though oslo backports bugfixes into their > > stable branches, kolla (and very possibly other deployment solutions) > > no longer benefit from them. > > > > -yoctozepto > > > > pon., 30 gru 2019 o 16:01 Sean McGinnis napisał(a): > >> > >> On Sun, Dec 29, 2019 at 09:41:45PM +0100, Radosław Piliszek wrote: > >>> Hi Folks, > >>> > >>> as the subject goes, my installation has been hit by an old bug: > >>> https://bugs.launchpad.net/oslo.messaging/+bug/1828841 > >>> (bug details not important, linked here for background) > >>> > >>> I am using Stein, deployed with recent Kolla-built source-based images > >>> (with only slight modifications compared to vanilla ones). > >>> Kolla's procedure for building source-based images considers upper > >>> constraints, which, unfortunately, turned out to be lagging behind a > >>> few releases w.r.t. oslo.messaging at least. > >>> The fix was in 9.7.0 released on May 21, u-c still point to 9.5.0 from > >>> Feb 26 and the latest of Stein is 9.8.0 from Jul 18. 
> >>> > >>> It seems oslo.messaging is missing from the automatic updates that bot proposes: > >>> https://review.opendev.org/#/q/owner:%22OpenStack+Proposal+Bot%22+project:openstack/requirements+branch:stable/stein > >>> > >>> Per: > >>> https://opendev.org/openstack/releases/src/branch/master/doc/source/reference/reviewer_guide.rst#release-jobs > >>> this upper-constraint proposal should be happening for all releases. > >>> > >> > >> This is normal and what is expected. > >> > >> Requirements are only updated for the branch in which those releases happen. So > >> if there is a release of oslo.messaging for stable/train, only the stable/train > >> upper constraints are updated for that new release. The stable/stein branch > >> will not be affected because that shows what the tested upper constraints were > >> for that branch. > >> > >> The last stable/stein release for oslo.messaging was 9.5.0: > >> > >> https://opendev.org/openstack/releases/src/branch/master/deliverables/stein/oslo.messaging.yaml#L49 > >> > >> And 9.5.0 is what is set in the stable/stein upper-constraints: > >> > >> https://opendev.org/openstack/requirements/src/branch/stable/stein/upper-constraints.txt#L146 > >> > >> To get that raised, whatever necessary bugfixes that are required in > >> oslo.messaging would need to be backported per-cycle until stable/stein (as in, > >> if it was in current master, it would need to be backported and merged to > >> stable/train first, then stable/stein), and once merged a stable release would > >> need to be proposed for that branch's version of the library. > >> > >> Once that stable release is done, that will propose the update to the upper > >> constraint for the given branch. > >> > >>> I would be glad if someone investigated why it happens(/ed) and > >>> audited whether other OpenStack projects don't need updating as well > >>> to avoid running on old deps when new are awaiting for months. :-) > >>> Please note this might apply to other branches as well. 
> >>> > >>> PS: for some reason oslo.messaging Stein release notes ( > >>> https://docs.openstack.org/releasenotes/oslo.messaging/stein.html ) > >>> are stuck at 9.5.0 as well, this could be right (I did not inspect the > >>> sources) but I am adding this in PS so you have more things to > >>> correlate if they need be. > >>> > >> > >> Again, as expected. The last stable/stein release was 9.5.0, so that is correct > >> that the release notes for stein only show up to that point. > > From skaplons at redhat.com Sat Jan 4 09:46:12 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sat, 4 Jan 2020 10:46:12 +0100 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: Message-ID: Hi, Is this qrouter namespace created with all those rules in container or in the host directly? Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? > On 4 Jan 2020, at 05:44, Jon Masters wrote: > > Hi there, > > I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the iptables rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or SNAT applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING chains (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log entries. It's as if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is driving me crazy :) > > Anyone got some quick suggestions? (assume I tried the obvious stuff). > > Jon. 
> > -- > Computer Architect — Slawek Kaplonski Senior software engineer Red Hat From ahmed.zaky.abdallah at gmail.com Sat Jan 4 11:46:36 2020 From: ahmed.zaky.abdallah at gmail.com (Ahmed ZAKY) Date: Sat, 4 Jan 2020 12:46:36 +0100 Subject: About the use of security groups with neutron ports In-Reply-To: References: <003d01d5bc42$2af8ceb0$80ea6c10$@gmail.com> <582E6225-F178-401A-A1D4-A52484B76DD9@redhat.com> <000401d5bcc3$3f9ebd30$bedc3790$@gmail.com> Message-ID: Thank you, Lajos. Kind regards, Ahmed On Fri, 3 Jan 2020, 10:56 Lajos Katona, wrote: > Hi, > > General answer: > if you check your processes running on the host you will see which config > files are used: > $ ps -ef |grep neutron-server > lajoska+ 32072 1 2 09:51 ? 00:00:03 /usr/bin/python3.6 > /usr/local/bin/neutron-server --config-file /etc/neutron/neutron.conf > --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-file > /etc/neutron/taas_plugin.ini > .... > > Similarly you can check your ovs-agent: > $ ps -ef |grep neutron-openvswitch-agent > .... > > For the documentation of the config files check the configuration > reference: > https://docs.openstack.org/neutron/latest/configuration/config.html (this > is the latest, so I suppose you need some older one like train or similar) > > Regards > Lajos > > ezt írta (időpont: 2019. dec. 27., P, > 15:42): > >> Thank you very much, Slawek. >> >> >> >> In case I have multiple configuration files, how to know which one is >> currently loaded in neutron? >> >> For example, in my environment I have: >> >> - ml2_conf.ini >> - ml2_conf_odl.ini >> - ml2_conf_sriov.ini >> - openvswitch_agent.ini >> - sriov_agent.ini >> >> >> >> >> >> [root at overcloud-controller-0 cbis-admin]# cd /etc/neutron/plugins/ml2/ >> >> [root at overcloud-controller-0 ml2]# ls >> >> ml2_conf.ini ml2_conf_odl.ini ml2_conf_sriov.ini >> openvswitch_agent.ini sriov_agent.ini >> >> >> >> >> >> Which one of these is used? 
>> >> >> >> Cheers, >> >> Ahmed >> >> >> >> >> >> >> >> -----Original Message----- >> From: Slawek Kaplonski >> Sent: Friday, December 27, 2019 10:28 AM >> To: ahmed.zaky.abdallah at gmail.com >> Cc: openstack-discuss at lists.openstack.org >> Subject: Re: About the use of security groups with neutron ports >> >> >> >> Hi, >> >> >> >> > On 27 Dec 2019, at 00:14, ahmed.zaky.abdallah at gmail.com wrote: >> >> > >> >> > Hi All, >> >> > >> >> > I am trying to wrap my head around something I came across in one of >> the OpenStack deployments. I am running Telco VNFs one of them is having >> different VMs using SR-IOV interfaces. >> >> > >> >> > On one of my VNFs on Openstack, I defined a wrong IPv6 Gm bearer >> interface to be exactly the same as the IPv6 Gateway. As I hate >> re-onboarding, I decided to embark on a journey of changing the IPv6 of the >> Gm bearer interface manually on the application side, everything went on >> fine. >> >> > >> >> > After two weeks, my customer started complaining about one way RTP >> flow. The customer was reluctant to blame the operation I carried out >> because everything worked smooth after my modification. >> >> > After days of investigation, I remembered that I have port-security >> enabled and this means AAP “Allowed-Address-Pairs” are defined per vPort >> (AAP contain the floating IP address of the VM so that the security to >> allow traffic to and from this VIP). I gave it a try and edited AAP >> “Allowed-Address-Pairs” to include the correct new IPv6 address. Doing that >> everything started working fine. >> >> > >> >> > The only logical explanation at that time is security group rules are >> really invoked. >> >> > >> >> > Now, I am trying to understand how the iptables are really invoked. 
I >> did some digging and it seems like we can control the firewall drivers on >> two levels: >> >> > >> >> > • Nova compute >> >> > • ML2 plugin >> >> > >> >> > I was curious to check nova.conf and it has already the following line: >> firewall_driver=nova.virt.firewall.NoopFirewallDriver >> >> > >> >> > However, checking the ml2 plugin configuration, the following is found: >> >> > >> >> > 230 [securitygroup] >> >> > 231 >> >> > 232 # >> >> > 233 # From neutron.ml2 >> >> > 234 # >> >> > 235 >> >> > 236 # Driver for security groups firewall in the L2 agent (string >> value) >> >> > 237 #firewall_driver = >> >> > 238 firewall_driver = openvswitch >> >> > >> >> > So, I am jumping to a conclusion that ml2 plugin is the one responsible >> for enforcing the firewall rules in my case. >> >> > >> >> > Have you had a similar experience? >> >> > Is my assumption correct: If I comment out the ml2 plugin firewall >> driver then the port security carries no sense at all and security groups >> won’t be invoked? >> >> >> >> Firewall_driver config option has to be set to some value. You can set >> “noop” as firewall_driver to completely disable this feature for all ports. >> >> But please remember that You need to set it on agent’s side so it’s on >> compute nodes, not on neutron-server side. >> >> Also, if You want to disable it only for some ports, You can set >> “port_security_enabled” to False and than SG will not be applied for such >> port and You will not need to configure any additional IPs in allowed >> address pairs for this port. >> >> >> >> > >> >> > Cheers, >> >> > Ahmed >> >> >> >> — >> >> Slawek Kaplonski >> >> Senior software engineer >> >> Red Hat >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Sat Jan 4 12:56:19 2020 From: smooney at redhat.com (Sean Mooney) Date: Sat, 04 Jan 2020 12:56:19 +0000 Subject: [kolla] neutron-l3-agent namespace NAT table not working? 
In-Reply-To: References: Message-ID: On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: > Hi, > > Is this qrouter namespace created with all those rules in container or in the host directly? > Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? in kolla the l3 agent should be running with net=host so the container should be using the host's root namespace and it will create network namespaces as needed for the different routers. the iptables rules should be in the router sub-namespaces. > > > On 4 Jan 2020, at 05:44, Jon Masters wrote: > > Hi there, > > I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the iptables rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or SNAT applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING chains (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log entries. It's as if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is driving me crazy :) > > Anyone got some quick suggestions? (assume I tried the obvious stuff). > > Jon. > > -- > Computer Architect — Slawek Kaplonski Senior software engineer Red Hat From jcm at jonmasters.org Sat Jan 4 15:39:43 2020 From: jcm at jonmasters.org (Jon Masters) Date: Sat, 4 Jan 2020 10:39:43 -0500 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: Message-ID: Excuse top posting on my phone. Also, yes, the namespaces are as described. It's just that the (correct) nat rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly attached to the vswitch. 
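For anyone following along, the checks being discussed — confirming the namespace's interfaces and whether its nat table is traversed at all — look roughly like this from the network host. This is a sketch run as root; the qrouter UUID is a placeholder and must be taken from your own deployment:

```shell
# List the router namespaces the L3 agent has created
ip netns list | grep qrouter

# Interfaces inside one router namespace (expect qr-xxx and qg-xxx)
ip netns exec qrouter-<router-uuid> ip addr show

# nat rules with packet/byte counters, to see whether they are being hit
ip netns exec qrouter-<router-uuid> iptables -t nat -L -n -v

# Temporary LOG rule at the top of PREROUTING to confirm nat traversal
ip netns exec qrouter-<router-uuid> iptables -t nat -I PREROUTING 1 \
    -j LOG --log-prefix "qrouter-nat: "
```

If the counters stay at zero and no log lines appear (Jon's symptom), packets are reaching the namespace without the nat table being evaluated, which points below iptables itself — in this thread it turned out to be a kernel issue.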
-- Computer Architect > On Jan 4, 2020, at 07:56, Sean Mooney wrote: > > On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >> Hi, >> >> Is this qrouter namespace created with all those rules in container or in the host directly? >> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? > in kolla the l3 agent should be running with net=host so the container should be useing the hosts > root namespace and it will create network namespaces as needed for the different routers. > > the ip table rules should be in the router sub namespaces. > >> >>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >>> >>> Hi there, >>> >>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the iptables >>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or SNAT >>> applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING chains >>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log entries. It's as >>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is driving me >>> crazy :) >>> >>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >>> >>> Jon. >>> >>> -- >>> Computer Architect >> >> — >> Slawek Kaplonski >> Senior software engineer >> Red Hat >> >> > From jcm at jonmasters.org Sun Jan 5 19:04:25 2020 From: jcm at jonmasters.org (Jon Masters) Date: Sun, 5 Jan 2020 14:04:25 -0500 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: Message-ID: This turns out to be a poorly documented bug in the CentOS 7.7 kernel that causes exactly what I was seeing: NAT rules not being applied. Oh dear god was this nasty to track down and work around. -- Computer Architect > On Jan 4, 2020, at 10:39, Jon Masters wrote: > > Excuse top posting on my phone.
Also, yes, the namespaces are as described. It’s just that the (correct) nat rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly attached to the vswitch. > > -- > Computer Architect > > >>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >>> >>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >>> Hi, >>> >>> Is this qrouter namespace created with all those rules in container or in the host directly? >>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? >> in kolla the l3 agent should be running with net=host so the container should be useing the hosts >> root namespace and it will create network namespaces as needed for the different routers. >> >> the ip table rules should be in the router sub namespaces. >> >>> >>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >>>> >>>> Hi there, >>>> >>>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the iptables >>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or SNAT >>>> applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING chains >>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log entries. It's as >>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is driving me >>>> crazy :) >>>> >>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >>>> >>>> Jon. >>>> >>>> -- >>>> Computer Architect >>> >>> — >>> Slawek Kaplonski >>> Senior software engineer >>> Red Hat >>> >>> >> From laurentfdumont at gmail.com Sun Jan 5 23:50:51 2020 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Sun, 5 Jan 2020 18:50:51 -0500 Subject: [kolla] neutron-l3-agent namespace NAT table not working? 
In-Reply-To: References: Message-ID: Do you happen to have the bug ID for Centos? On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: > This turns out to a not well documented bug in the CentOS7.7 kernel that > causes exactly nat rules not to run as I was seeing. Oh dear god was this > nasty as whatever to find and workaround. > > -- > Computer Architect > > > > On Jan 4, 2020, at 10:39, Jon Masters wrote: > > > > Excuse top posting on my phone. Also, yes, the namespaces are as > described. It’s just that the (correct) nat rules for the qrouter netns are > never running, in spite of the two interfaces existing in that ns and > correctly attached to the vswitch. > > > > -- > > Computer Architect > > > > > >>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: > >>> > >>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: > >>> Hi, > >>> > >>> Is this qrouter namespace created with all those rules in container or > in the host directly? > >>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter > namespace? > >> in kolla the l3 agent should be running with net=host so the container > should be useing the hosts > >> root namespace and it will create network namespaces as needed for the > different routers. > >> > >> the ip table rules should be in the router sub namespaces. > >> > >>> > >>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: > >>>> > >>>> Hi there, > >>>> > >>>> I've got a weird problem with the neutron-l3-agent container on my > deployment. It comes up, sets up the iptables > >>>> rules in the qrouter namespace (and I can see these using "ip > netns...") but traffic isn't having DNAT or SNAT > >>>> applied. What's most strange is that manually adding a LOG jump > target to the iptables nat PRE/POSTROUTING chains > >>>> (after enabling nf logging sent to the host kernel, confirmed that > works) doesn't result in any log entries. It's as > >>>> if the nat table isn't being applied at all for any packets > traversing the qrouter namespace. 
This is driving me > >>>> crazy :) > >>>> > >>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). > >>>> > >>>> Jon. > >>>> > >>>> -- > >>>> Computer Architect > >>> > >>> — > >>> Slawek Kaplonski > >>> Senior software engineer > >>> Red Hat > >>> > >>> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jan.vondra at ultimum.io Mon Jan 6 00:25:55 2020 From: jan.vondra at ultimum.io (Jan Vondra) Date: Mon, 6 Jan 2020 01:25:55 +0100 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: Message-ID: Could you send us more details about your deployment - e.g. kolla version and image info? And please check the neutron-openvswitch-agent log - errors regarding applying iptables rules should be there. I've encountered similar behavior when trying to run nftables-based OS images (Debian 10) on an iptables-based host OS (Ubuntu 16.04). You can check for it by running sudo update-alternatives --query iptables If that's the case, an option to force legacy iptables has been added - https://review.opendev.org/#/c/685967/. Best regards, Jan On Mon, 6 Jan 2020 at 0:56, Laurent Dumont wrote: > Do you happen to have the bug ID for Centos? > > On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: >> >> This turns out to a not well documented bug in the CentOS7.7 kernel that >> causes exactly nat rules not to run as I was seeing. Oh dear god was this >> nasty as whatever to find and workaround. >> >> -- >> Computer Architect >> >> >> > On Jan 4, 2020, at 10:39, Jon Masters wrote: >> > >> > Excuse top posting on my phone. Also, yes, the namespaces are as >> described. It’s just that the (correct) nat rules for the qrouter netns are >> never running, in spite of the two interfaces existing in that ns and >> correctly attached to the vswitch.
>> > >> > -- >> > Computer Architect >> > >> > >> >>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >> >>> >> >>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >> >>> Hi, >> >>> >> >>> Is this qrouter namespace created with all those rules in container >> or in the host directly? >> >>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter >> namespace? >> >> in kolla the l3 agent should be running with net=host so the container >> should be useing the hosts >> >> root namespace and it will create network namespaces as needed for >> the different routers. >> >> >> >> the ip table rules should be in the router sub namespaces. >> >> >> >>> >> >>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >> >>>> >> >>>> Hi there, >> >>>> >> >>>> I've got a weird problem with the neutron-l3-agent container on my >> deployment. It comes up, sets up the iptables >> >>>> rules in the qrouter namespace (and I can see these using "ip >> netns...") but traffic isn't having DNAT or SNAT >> >>>> applied. What's most strange is that manually adding a LOG jump >> target to the iptables nat PRE/POSTROUTING chains >> >>>> (after enabling nf logging sent to the host kernel, confirmed that >> works) doesn't result in any log entries. It's as >> >>>> if the nat table isn't being applied at all for any packets >> traversing the qrouter namespace. This is driving me >> >>>> crazy :) >> >>>> >> >>>> Anyone got some quick suggestions? (assume I tried the obvious >> stuff). >> >>>> >> >>>> Jon. >> >>>> >> >>>> -- >> >>>> Computer Architect >> >>> >> >>> — >> >>> Slawek Kaplonski >> >>> Senior software engineer >> >>> Red Hat >> >>> >> >>> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcm at jonmasters.org Mon Jan 6 02:26:28 2020 From: jcm at jonmasters.org (Jon Masters) Date: Sun, 5 Jan 2020 21:26:28 -0500 Subject: [kolla] neutron-l3-agent namespace NAT table not working? 
In-Reply-To: References: Message-ID: There’s no bug ID that I’m aware of. But I’ll go look for one or file one. -- Computer Architect > On Jan 5, 2020, at 18:51, Laurent Dumont wrote: > >  > Do you happen to have the bug ID for Centos? > >> On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: >> This turns out to a not well documented bug in the CentOS7.7 kernel that causes exactly nat rules not to run as I was seeing. Oh dear god was this nasty as whatever to find and workaround. >> >> -- >> Computer Architect >> >> >> > On Jan 4, 2020, at 10:39, Jon Masters wrote: >> > >> > Excuse top posting on my phone. Also, yes, the namespaces are as described. It’s just that the (correct) nat rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly attached to the vswitch. >> > >> > -- >> > Computer Architect >> > >> > >> >>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >> >>> >> >>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >> >>> Hi, >> >>> >> >>> Is this qrouter namespace created with all those rules in container or in the host directly? >> >>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? >> >> in kolla the l3 agent should be running with net=host so the container should be useing the hosts >> >> root namespace and it will create network namespaces as needed for the different routers. >> >> >> >> the ip table rules should be in the router sub namespaces. >> >> >> >>> >> >>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >> >>>> >> >>>> Hi there, >> >>>> >> >>>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the iptables >> >>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or SNAT >> >>>> applied. 
What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING chains >> >>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log entries. It's as >> >>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is driving me >> >>>> crazy :) >> >>>> >> >>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >> >>>> >> >>>> Jon. >> >>>> >> >>>> -- >> >>>> Computer Architect >> >>> >> >>> — >> >>> Slawek Kaplonski >> >>> Senior software engineer >> >>> Red Hat >> >>> >> >>> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrei.perepiolkin at open-e.com Mon Jan 6 04:51:26 2020 From: andrei.perepiolkin at open-e.com (Andrei Perapiolkin) Date: Mon, 6 Jan 2020 06:51:26 +0200 Subject: [kolla] Quick start: ansible deploy failure In-Reply-To: References: Message-ID: <88226158-d15c-7f23-e692-aa461f6d8549@open-e.com> Hello, I'm following the quick start guide on deploying Kolla Ansible and getting a failure at the deploy stage: https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html kolla-ansible -i ./multinode deploy TASK [mariadb : Creating haproxy mysql user] ******************************************************************************************************************************** fatal: [control01]: FAILED! => {"changed": false, "msg": "Can not parse the inner module output: localhost | SUCCESS => {\n    \"changed\": false, \n    \"user\": \"haproxy\"\n}\n"} I deployed to CentOS 7 with the latest updates. [user at master ~]$ pip list DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support Package                          Version -------------------------------- ---------- ansible                          2.9.1 Babel                            2.8.0 backports.ssl-match-hostname     3.7.0.1 certifi                          2019.11.28 cffi                             1.13.2 chardet                          3.0.4 configobj                        4.7.2 cryptography                     2.8 debtcollector                    1.22.0 decorator                        3.4.0 docker                           4.1.0 enum34                           1.1.6 funcsigs                         1.0.2 httplib2                         0.9.2 idna                             2.8 iniparse                         0.4 ipaddress                        1.0.23 IPy                              0.75 iso8601                          0.1.12 Jinja2                           2.10.3 jmespath                         0.9.4 kitchen                          1.1.1 kolla-ansible                    9.0.0 MarkupSafe                       1.1.1 monotonic                        1.5 netaddr                          0.7.19 netifaces                        0.10.9 oslo.config                      6.12.0 oslo.i18n                        3.25.0 oslo.utils                       3.42.1 paramiko                         2.1.1 pbr                              5.4.4 perf                             0.1 pip                              19.3.1 ply                              3.4 policycoreutils-default-encoding 0.1 pyasn1                           0.1.9 pycparser                        2.19 pycurl                           7.19.0 pygobject                        3.22.0 pygpgme                          0.3 pyliblzma                        0.5.3 pyparsing                        2.4.6 python-linux-procfs              0.4.9 pytz                             2019.3 pyudev               
0.15 pyxattr 0.5.1 PyYAML 5.2 requests 2.22.0 rfc3986 1.3.2 schedutils 0.4 seobject 0.1 sepolicy 1.1 setuptools 44.0.0 six 1.13.0 slip 0.4.0 slip.dbus 0.4.0 stevedore 1.31.0 urlgrabber 3.10 urllib3 1.25.7 websocket-client 0.57.0 wrapt 1.11.2 yum-metadata-parser 1.1.4 and it looks like I'm not alone with this issue: https://q.cnblogs.com/q/125213/ Thanks for your attention, Andrei Perepiolkin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Mon Jan 6 06:41:00 2020 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Mon, 6 Jan 2020 14:41:00 +0800 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: <20191121001509.GB976114@fedora19.localdomain> <20191217043512.GA2367741@fedora19.localdomain> Message-ID: Hi all, according to our Doodle result, I have proposed a patch [1] to settle our initial meeting schedule as *2020/01/07 Tuesday 0800 UTC on #openstack-meeting-alt and 1500 UTC on #openstack-meeting.* I assume we can use our initial meetings to discuss SIG setup details (schedules, chairs, etc.), the general goals we should set, and the initial actions we need to take. Please join us if you have time. Also, let me know if anything is wrong with the above schedule. [1] https://review.opendev.org/#/c/701147/ On Tue, Dec 17, 2019 at 2:11 PM Rico Lin wrote: > From this ML and some IRC and WeChat discussions, I put most of the > information I collected in [1]. > At this point, we can tell there's a lot of work already in progress.
So I think we can definitely get benefits from this SIG. > > Here are the things we need to settle at this point: > > - *SIG chairs*: We need multiple SIG chairs who can help to drive SIG > goals and host meetings/events. *Put your name under `SIG chairs:` if > you're interested*. I will propose my name on the create-SIG patch > since I'm interested in helping set this SIG up and we need to fill up > something there. But that won't block you from signing up. And I'm more > than happy if we can have more people rush in for the chair role :). > - *First meeting schedule*: I created a poll for the meeting time [2]. *Please > pick your favorite for our first meeting time* (and potentially our > long-term meeting schedule, but let's discuss that in the meeting). I picked > the second week of Jan. because some might be on their vacation in the > following two weeks. As for the location, I would like to suggest we use > #openstack-meeting, so we might be able to get more people's attention. > From the experience of other SIGs, running a meeting on your own IRC > channel makes it harder for new community members to join. > - *Resources*: We need to find out who or which organization is also > interested in this. Right now, I believe we need more servers to run tests, > and people to help with making test jobs, giving feedback, or any other tasks. So > please help to forward the etherpad ([1]) and add any information that I > failed to mention :) If you can find organizations that might be interested in > donating servers, I can help to reach out too. *So sign up and provide > any information that you think will help :)* > - *Build and trace*: We definitely need to target all the above > work (from the previous replies) in this SIG, and (like Ian mentioned) to > work on the test infrastructure. These make great first-step tasks for > the SIG. And to track all jobs, I think it will be reasonable to create a > Storyboard for this SIG and document those tasks in one Storyboard.
> > All the above tasks IMO don't need to wait for the first meeting to happen > before them, so If anyone likes to put their effort on any of them or like > to suggest more initial tasks, you're the most welcome here! > > [1] https://etherpad.openstack.org/p/Multi-arch > [2] https://doodle.com/poll/8znyzc57skqkryv8 > > On Tue, Dec 17, 2019 at 12:45 PM Ian Wienand wrote: > >> On Tue, Nov 26, 2019 at 11:33:16AM +0000, Jonathan Rosser wrote: >> > openstack-ansible is ready to go on arm CI but in order to make the >> jobs run >> > in a reasonable time and not simply timeout a source of pre-built arm >> python >> > wheels is needed. It would be a shame to let the work that got >> contributed >> > to OSA for arm just rot. >> >> So ARM64 wheels are still a work-in-progress, but in the mean time we >> have merged a change to install a separate queue for ARM64 jobs [1]. >> Jobs in the "check-arm64" queue will be implicitly non-voting (Zuul >> isn't configured to add +-1 votes for this queue) but importantly will >> run asynchronously to the regular queue. Thus if there's very high >> demand, or any intermittent instability your gates won't be held up. >> >> [2] is an example of using this in diskimage-builder. >> >> Of course you *can* put ARM64 jobs in your gate queues as voting jobs, >> but just be aware with only 8 nodes available at this time, it could >> easily become a bottle-neck to merging code. >> >> The "check-arm64" queue is designed to be an automatically-running >> half-way point as we (hopefully) scale up support (like wheel builds >> and mirrors) and resources further. >> >> Thanks, >> >> -i >> >> [1] https://review.opendev.org/#/c/698606/ >> [2] https://review.opendev.org/#/c/676111/ >> >> >> > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rico.lin.guanyu at gmail.com Mon Jan 6 07:04:14 2020 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Mon, 6 Jan 2020 15:04:14 +0800 Subject: [auto-scaling][self-healing] Discussion to merge two SIG to one In-Reply-To: References: <1b39c3fe-22a1-c84c-ed13-05fbd9360d7d@suse.com> Message-ID: Hi guys, I sent out a new schedule patch [1]; please take a look to see if it works for you. It proposes 2020/01/07 Tuesday 1400 UTC on irc #openstack-meeting as our first combined meeting slot. [1] https://review.opendev.org/701137 On Wed, Dec 18, 2019 at 11:46 AM Rico Lin wrote: > To further push this task, I would like to propose we pick a new joint > meeting schedule for both SIGs together. > > The first step should be for us to share the same meeting time and schedule, and also > share the same event plan (as Witek suggested). And we can go from there to > discuss whether we need further plans. > I also would like to suggest we move our meeting place to > #openstack-meeting so we have a chance for more people to join. > Let's have a quick Doodle poll for the time, > https://doodle.com/poll/98nrf8iibr7zv3kt > Please join that Doodle survey if you're interested in joining us :) > > > On Thu, Nov 28, 2019 at 4:57 PM Rico Lin > wrote: > >> >> >> On Thu, Nov 28, 2019 at 4:37 PM Witek Bedyk >> wrote: >> > >> > Hi, >> > how about starting with joining the SIGs meeting times and organizing >> > the Forum and PTG events together? The repositories and wiki pages could >> > stay as they are and refer to each other. >> > >> I think even if we merged the two SIGs, the repositories should stay separated as >> they are now. IMO we can simply rename openstack/auto-scaling-sig >> to openstack/auto-scaling, and the same for self-healing. >> Or just keeping them the same will be fine IMO. >> We don't need a new repo for the new SIG (at least not for now). >> >> I do like the idea to start with joining the SIGs meeting times and >> organizing the Forum and PTG events together.
>> One more proposal in my mind would be to join the IRC channels. >> > >> > I think merging is good if you have an idea how to better structure the >> > content, and time to review the existing one and do all the formal >> > stuff. Just gluing the documents won't help. >> Totally agree with this point! >> > >> > Cheers >> > Witek >> > >> >> >> -- >> May The Force of OpenStack Be With You, >> Rico Lin >> irc: ricolin >> > > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- May The Force of OpenStack Be With You, *Rico Lin* irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Mon Jan 6 09:08:46 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 6 Jan 2020 10:08:46 +0100 Subject: [kolla] Quick start: ansible deploy failure In-Reply-To: <88226158-d15c-7f23-e692-aa461f6d8549@open-e.com> References: <88226158-d15c-7f23-e692-aa461f6d8549@open-e.com> Message-ID: Hi Andrei, I see you use kolla-ansible for Train, yet it looks as if you are deploying Stein there. Could you confirm that? If you prefer to deploy Stein, please use the Stein branch of kolla-ansible or, analogously, the 8.* releases from PyPI. Otherwise try deploying Train. -yoctozepto On Mon, 6 Jan 2020 at 05:58, Andrei Perapiolkin wrote: > > Hello, > > > Im following quick start guide on deploying Kolla ansible and getting failure on deploy stage: > > https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html > > kolla-ansible -i ./multinode deploy > > TASK [mariadb : Creating haproxy mysql user] ******************************************************************************************************************************** > > fatal: [control01]: FAILED! => {"changed": false, "msg": "Can not parse the inner module output: localhost | SUCCESS => {\n \"changed\": false, \n \"user\": \"haproxy\"\n}\n"} > > > I deploy to Centos7 with latest updates.
> > > [user at master ~]$ pip list > DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support > Package Version > -------------------------------- ---------- > ansible 2.9.1 > Babel 2.8.0 > backports.ssl-match-hostname 3.7.0.1 > certifi 2019.11.28 > cffi 1.13.2 > chardet 3.0.4 > configobj 4.7.2 > cryptography 2.8 > debtcollector 1.22.0 > decorator 3.4.0 > docker 4.1.0 > enum34 1.1.6 > funcsigs 1.0.2 > httplib2 0.9.2 > idna 2.8 > iniparse 0.4 > ipaddress 1.0.23 > IPy 0.75 > iso8601 0.1.12 > Jinja2 2.10.3 > jmespath 0.9.4 > kitchen 1.1.1 > kolla-ansible 9.0.0 > MarkupSafe 1.1.1 > monotonic 1.5 > netaddr 0.7.19 > netifaces 0.10.9 > oslo.config 6.12.0 > oslo.i18n 3.25.0 > oslo.utils 3.42.1 > paramiko 2.1.1 > pbr 5.4.4 > perf 0.1 > pip 19.3.1 > ply 3.4 > policycoreutils-default-encoding 0.1 > pyasn1 0.1.9 > pycparser 2.19 > pycurl 7.19.0 > pygobject 3.22.0 > pygpgme 0.3 > pyliblzma 0.5.3 > pyparsing 2.4.6 > python-linux-procfs 0.4.9 > pytz 2019.3 > pyudev 0.15 > pyxattr 0.5.1 > PyYAML 5.2 > requests 2.22.0 > rfc3986 1.3.2 > schedutils 0.4 > seobject 0.1 > sepolicy 1.1 > setuptools 44.0.0 > six 1.13.0 > slip 0.4.0 > slip.dbus 0.4.0 > stevedore 1.31.0 > urlgrabber 3.10 > urllib3 1.25.7 > websocket-client 0.57.0 > wrapt 1.11.2 > yum-metadata-parser 1.1.4 > > > and it looks like Im not alone with such issue: https://q.cnblogs.com/q/125213/ > > > Thanks for your attention, > > Andrei Perepiolkin From radoslaw.piliszek at gmail.com Mon Jan 6 09:11:48 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 6 Jan 2020 10:11:48 +0100 Subject: [kolla] neutron-l3-agent namespace NAT table not working? 
In-Reply-To: References: Message-ID: If it's a RHEL kernel bug, then Red Hat would likely want to know about it (if they don't know already). I have my kolla deployment on c7.7 and I don't encounter this issue, though there is a pending kernel update, so now I'm worried about applying it... -yoctozepto On Mon, 6 Jan 2020 at 03:34, Jon Masters wrote: > > There’s no bug ID that I’m aware of. But I’ll go look for one or file one. > > -- > Computer Architect > > > On Jan 5, 2020, at 18:51, Laurent Dumont wrote: > > Do you happen to have the bug ID for Centos? > > On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: >> >> This turns out to a not well documented bug in the CentOS7.7 kernel that causes exactly nat rules not to run as I was seeing. Oh dear god was this nasty as whatever to find and workaround. >> >> -- >> Computer Architect >> >> >> > On Jan 4, 2020, at 10:39, Jon Masters wrote: >> > >> > Excuse top posting on my phone. Also, yes, the namespaces are as described. It’s just that the (correct) nat rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly attached to the vswitch. >> > >> > -- >> > Computer Architect >> > >> > >> >>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >> >>> >> >>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >> >>> Hi, >> >>> >> >>> Is this qrouter namespace created with all those rules in container or in the host directly? >> >>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? >> >> in kolla the l3 agent should be running with net=host so the container should be useing the hosts >> >> root namespace and it will create network namespaces as needed for the different routers. >> >> >> >> the ip table rules should be in the router sub namespaces. >> >> >> >>> >> >>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >> >>>> >> >>>> Hi there, >> >>>> >> >>>> I've got a weird problem with the neutron-l3-agent container on my deployment.
It comes up, sets up the iptables >> >>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or SNAT >> >>>> applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING chains >> >>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log entries. It's as >> >>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is driving me >> >>>> crazy :) >> >>>> >> >>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >> >>>> >> >>>> Jon. >> >>>> >> >>>> -- >> >>>> Computer Architect >> >>> >> >>> — >> >>> Slawek Kaplonski >> >>> Senior software engineer >> >>> Red Hat >> >>> >> >>> >> >> >> From thierry at openstack.org Mon Jan 6 09:40:54 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 6 Jan 2020 10:40:54 +0100 Subject: [cloudkitty] Stepping down from PTL In-Reply-To: <6a879c96c9aa82cc31f4ffde7a6b2663@objectif-libre.com> References: <6a879c96c9aa82cc31f4ffde7a6b2663@objectif-libre.com> Message-ID: <601c77c1-f725-f4dc-7a24-da7627f6d998@openstack.org> Luka Peschke wrote: > I'm moving to a new position that doesn't involve OpenStack, and won't > leave me the required time to be Cloudkitty's PTL. This is why I have to > step down from the PTL position. jferrieu will take my position for the > end of the U cycle (he's been a major contributor recently), with the > help of huats, who's been the Cloudkitty PTL before me, and has been > around in the community for a long time. > > I've been the PTL for two and a half cycles, and I think that it is a > good thing for the project to take a new lead, with a new vision. > > I'm grateful for my experience within the OpenStack community. Sorry to see you go, Luka! 
To make the transition official, could you propose a change to update the PTL name at: https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml#L161 Thanks in advance, -- Thierry Carrez (ttx) From arnaud.morin at gmail.com Mon Jan 6 09:34:08 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 6 Jan 2020 09:34:08 +0000 Subject: [neutron][nova][cinder][glance][largescale-sig] Documentation update for large-scale Message-ID: <20200106093408.GH1174@sync> Hey all, With the new "Large scale SIG", we were thinking about updating the documentation to help operators set up large deployments. To do so, we would like to propose, at least, some documentation changes to identify the options that affect large scale. The plan is to have a small note on some options and possibly a link to a specific page for large scale (see attachments). I know that nova started working on collecting those parameters here: https://bugs.launchpad.net/nova/+bug/1838819 Do you know if something similar exists in other projects? Moreover, we would like to collect more parameters that could be tuned in a large-scale deployment, for every project. So, if you have any, feel free to reply to this mail or add some info on the following etherpad: https://etherpad.openstack.org/p/large-scale-sig-documentation Thanks for your help! Regards, The large-scale team. -- Arnaud Morin -------------- next part -------------- A non-text attachment was scrubbed... Name: before.png Type: application/octet-stream Size: 42004 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: after.png Type: application/octet-stream Size: 59687 bytes Desc: not available URL: From smooney at redhat.com Mon Jan 6 11:40:15 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 06 Jan 2020 11:40:15 +0000 Subject: [kolla] neutron-l3-agent namespace NAT table not working?
In-Reply-To: References: Message-ID: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: > If it's RHEL kernel's bug, then Red Hat would likely want to know > about it (if not knowing already). > I have my kolla deployment on c7.7 and I don't encounter this issue, > though there is a pending kernel update so now I'm worried about > applying it... it sounds more like a conflict between legacy iptables and the new nftables based replacement. if you mix the two then it will appear as if the rules are installed but only some of the rules will run. so the container images and the host both need to be configured to use the same version. that said, if you are using centos images on a centos host they should be fine, provided you are using centos 7 or centos 8 on both. if you try to use centos 7 images on a centos 8 host or centos 8 images on a centos 7 host you would likely have issues due to the fact centos 8 uses a different iptables implementation > > -yoctozepto > > pon., 6 sty 2020 o 03:34 Jon Masters napisał(a): > > > > There’s no bug ID that I’m aware of. But I’ll go look for one or file one. > > > > -- > > Computer Architect > > > > > > On Jan 5, 2020, at 18:51, Laurent Dumont wrote: > > > > > > Do you happen to have the bug ID for Centos? > > > > On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: > > > > > > This turns out to a not well documented bug in the CentOS7.7 kernel that causes exactly nat rules not to run as I > > > was seeing. Oh dear god was this nasty as whatever to find and workaround. > > > > > > -- > > > Computer Architect > > > > > > > > > > On Jan 4, 2020, at 10:39, Jon Masters wrote: > > > > > > > > Excuse top posting on my phone. Also, yes, the namespaces are as described. It’s just that the (correct) nat > > > > rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly > > > > attached to the vswitch.
> > > > > > > > -- > > > > Computer Architect > > > > > > > > > > > > > > On Jan 4, 2020, at 07:56, Sean Mooney wrote: > > > > > > > > > > > > On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: > > > > > > Hi, > > > > > > > > > > > > Is this qrouter namespace created with all those rules in container or in the host directly? > > > > > > Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? > > > > > > > > > > in kolla the l3 agent should be running with net=host so the container should be useing the hosts > > > > > root namespace and it will create network namespaces as needed for the different routers. > > > > > > > > > > the ip table rules should be in the router sub namespaces. > > > > > > > > > > > > > > > > > > > On 4 Jan 2020, at 05:44, Jon Masters wrote: > > > > > > > > > > > > > > Hi there, > > > > > > > > > > > > > > I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the > > > > > > > iptables > > > > > > > rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or > > > > > > > SNAT > > > > > > > applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING > > > > > > > chains > > > > > > > (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log > > > > > > > entries. It's as > > > > > > > if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is > > > > > > > driving me > > > > > > > crazy :) > > > > > > > > > > > > > > Anyone got some quick suggestions? (assume I tried the obvious stuff). > > > > > > > > > > > > > > Jon. 
> > > > > > > > > > > > > > -- > > > > > > > Computer Architect > > > > > > > > > > > > — > > > > > > Slawek Kaplonski > > > > > > Senior software engineer > > > > > > Red Hat > > > > > > > > > > > > > > From skaplons at redhat.com Mon Jan 6 11:50:11 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 6 Jan 2020 12:50:11 +0100 Subject: [neutron][nova][cinder][glance][largescale-sig] Documentation update for large-scale In-Reply-To: <20200106093408.GH1174@sync> References: <20200106093408.GH1174@sync> Message-ID: <515FFAFC-83C7-4E4F-9B43-19186FE86C1F@redhat.com> Hi, I just opened similar bug for Neutron to track this from Neutron perspective also. It’s here: https://bugs.launchpad.net/neutron/+bug/1858419 I will also raise this on our next team meeting. > On 6 Jan 2020, at 10:34, Arnaud Morin wrote: > > Hey all, > > With the new "Large scale SIG", we were thinking about updating > documentation to help operators setting up large deployments. > To do so, we would like to propose, at least, some documentation changes > to identify options that affect large scale. > The plan is to have a small note on some options and eventually a link > to a specific page for large scale (see attachments). > > I know that nova started working on collecting those parameters here: > https://bugs.launchpad.net/nova/+bug/1838819 > > Do you know if something similar exists on other projects? > > Moreover, we would like to collect more parameters that could be tuned > on a large scale deployment, for every project. > So, if you have any, feel free to answer to this mail or add some info > on the following etherpad: > https://etherpad.openstack.org/p/large-scale-sig-documentation > > Thanks for your help! > > Regards. > The large-scale team. 
> > -- > Arnaud Morin > > — Slawek Kaplonski Senior software engineer Red Hat From jan.vondra at ultimum.io Mon Jan 6 12:02:51 2020 From: jan.vondra at ultimum.io (Jan Vondra) Date: Mon, 6 Jan 2020 13:02:51 +0100 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> References: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> Message-ID: po 6. 1. 2020 v 12:46 odesílatel Sean Mooney napsal: > > On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: > > If it's RHEL kernel's bug, then Red Hat would likely want to know > > about it (if not knowing already). > > I have my kolla deployment on c7.7 and I don't encounter this issue, > > though there is a pending kernel update so now I'm worried about > > applying it... > it sound more like a confilct between legacy iptables and the new nftables based replacement. > if you mix the two then it will appear as if the rules are installed but only some of the rules will run. > so the container images and the host need to be both configured to use the same versions. > > that said fi you are using centos images on a centos host they should be providing your usnign centos 7 or centos 8 on > both. if you try to use centos 7 image on a centos 8 host or centos 8 images on a centos 7 host it would likely have > issues due to the fact centos 8 uses a differt iptables implemeantion > As I wrote before, this scenario has already been covered in the following patches: https://review.opendev.org/#/c/685967/ https://review.opendev.org/#/c/683679/ To force legacy iptables in the neutron containers, put the following line into the globals.yml file: neutron_legacy_iptables: "yes" Beware: there is currently an issue with applying changes to environment variables for already-running containers, so you may have to manually delete the neutron containers and recreate them using reconfigure or - if possible - destroy and redeploy the whole deployment. J.V.
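[Editor's note] A quick sanity check for the mixed-backend failure mode Sean describes is to compare `iptables --version` on the host and inside the l3-agent container (e.g. `docker exec neutron_l3_agent iptables --version`; the container name is an assumption for a Docker-based Kolla deployment). A minimal sketch of that comparison, with the two version strings hard-coded as hypothetical sample outputs:

```shell
# Classify an `iptables --version` string as nft- or legacy-backed.
# iptables >= 1.8 prints "(nf_tables)" or "(legacy)" after the version;
# older releases print no suffix and are always legacy.
backend() {
  case "$1" in
    *nf_tables*) echo nft ;;
    *)           echo legacy ;;
  esac
}

# Hypothetical sample outputs; in practice capture these from the host
# and from the l3-agent container respectively.
host_ver="iptables v1.8.4 (nf_tables)"
cont_ver="iptables v1.4.21"

if [ "$(backend "$host_ver")" != "$(backend "$cont_ver")" ]; then
  echo "iptables backend mismatch: host=$(backend "$host_ver") container=$(backend "$cont_ver")"
fi
# prints: iptables backend mismatch: host=nft container=legacy
```

On a mismatch, rules inserted by one binary are invisible to the other, which matches the "rules listed but never hit" symptom in this thread; the `neutron_legacy_iptables` setting described above is the Kolla-level fix.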
From jcm at jonmasters.org Mon Jan 6 12:13:22 2020 From: jcm at jonmasters.org (Jon Masters) Date: Mon, 6 Jan 2020 04:13:22 -0800 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> References: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> Message-ID: I did specifically check for such a conflict tho before proceeding down the path I went :) -- Computer Architect > On Jan 6, 2020, at 03:40, Sean Mooney wrote: > > On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: >> If it's RHEL kernel's bug, then Red Hat would likely want to know >> about it (if not knowing already). >> I have my kolla deployment on c7.7 and I don't encounter this issue, >> though there is a pending kernel update so now I'm worried about >> applying it... > it sound more like a confilct between legacy iptables and the new nftables based replacement. > if you mix the two then it will appear as if the rules are installed but only some of the rules will run. > so the container images and the host need to be both configured to use the same versions. > > that said fi you are using centos images on a centos host they should be providing your usnign centos 7 or centos 8 on > both. if you try to use centos 7 image on a centos 8 host or centos 8 images on a centos 7 host it would likely have > issues due to the fact centos 8 uses a differt iptables implemeantion > >> >> -yoctozepto >> >> pon., 6 sty 2020 o 03:34 Jon Masters napisał(a): >>> >>> There’s no bug ID that I’m aware of. But I’ll go look for one or file one. >>> >>> -- >>> Computer Architect >>> >>> >>>> On Jan 5, 2020, at 18:51, Laurent Dumont wrote: >>> >>>  >>> Do you happen to have the bug ID for Centos? >>> >>> On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: >>>> >>>> This turns out to a not well documented bug in the CentOS7.7 kernel that causes exactly nat rules not to run as I >>>> was seeing. 
Oh dear god was this nasty as whatever to find and workaround. >>>> >>>> -- >>>> Computer Architect >>>> >>>> >>>>> On Jan 4, 2020, at 10:39, Jon Masters wrote: >>>>> >>>>> Excuse top posting on my phone. Also, yes, the namespaces are as described. It’s just that the (correct) nat >>>>> rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly >>>>> attached to the vswitch. >>>>> >>>>> -- >>>>> Computer Architect >>>>> >>>>> >>>>>>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >>>>>>> >>>>>>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Is this qrouter namespace created with all those rules in container or in the host directly? >>>>>>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? >>>>>> >>>>>> in kolla the l3 agent should be running with net=host so the container should be useing the hosts >>>>>> root namespace and it will create network namespaces as needed for the different routers. >>>>>> >>>>>> the ip table rules should be in the router sub namespaces. >>>>>> >>>>>>> >>>>>>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >>>>>>>> >>>>>>>> Hi there, >>>>>>>> >>>>>>>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the >>>>>>>> iptables >>>>>>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or >>>>>>>> SNAT >>>>>>>> applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING >>>>>>>> chains >>>>>>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log >>>>>>>> entries. It's as >>>>>>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is >>>>>>>> driving me >>>>>>>> crazy :) >>>>>>>> >>>>>>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >>>>>>>> >>>>>>>> Jon. 
>>>>>>>> >>>>>>>> -- >>>>>>>> Computer Architect >>>>>>> >>>>>>> — >>>>>>> Slawek Kaplonski >>>>>>> Senior software engineer >>>>>>> Red Hat >>>>>>> >>>>>>> >> >> > From radoslaw.piliszek at gmail.com Mon Jan 6 12:33:02 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 6 Jan 2020 13:33:02 +0100 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> Message-ID: Folks, this seems to be about C7, not C8, and "neutron_legacy_iptables" does not apply here. @Jon - what is the kernel bug you mentioned but never referenced? -yoctozepto pon., 6 sty 2020 o 13:13 Jon Masters napisał(a): > > I did specifically check for such a conflict tho before proceeding down the path I went :) > > -- > Computer Architect > > > > On Jan 6, 2020, at 03:40, Sean Mooney wrote: > > > > On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: > >> If it's RHEL kernel's bug, then Red Hat would likely want to know > >> about it (if not knowing already). > >> I have my kolla deployment on c7.7 and I don't encounter this issue, > >> though there is a pending kernel update so now I'm worried about > >> applying it... > > it sound more like a confilct between legacy iptables and the new nftables based replacement. > > if you mix the two then it will appear as if the rules are installed but only some of the rules will run. > > so the container images and the host need to be both configured to use the same versions. > > > > that said fi you are using centos images on a centos host they should be providing your usnign centos 7 or centos 8 on > > both. if you try to use centos 7 image on a centos 8 host or centos 8 images on a centos 7 host it would likely have > > issues due to the fact centos 8 uses a differt iptables implemeantion > > > >> > >> -yoctozepto > >> > >> pon., 6 sty 2020 o 03:34 Jon Masters napisał(a): > >>> > >>> There’s no bug ID that I’m aware of. 
But I’ll go look for one or file one. > >>> > >>> -- > >>> Computer Architect > >>> > >>> > >>>> On Jan 5, 2020, at 18:51, Laurent Dumont wrote: > >>> > >>>  > >>> Do you happen to have the bug ID for Centos? > >>> > >>> On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: > >>>> > >>>> This turns out to a not well documented bug in the CentOS7.7 kernel that causes exactly nat rules not to run as I > >>>> was seeing. Oh dear god was this nasty as whatever to find and workaround. > >>>> > >>>> -- > >>>> Computer Architect > >>>> > >>>> > >>>>> On Jan 4, 2020, at 10:39, Jon Masters wrote: > >>>>> > >>>>> Excuse top posting on my phone. Also, yes, the namespaces are as described. It’s just that the (correct) nat > >>>>> rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly > >>>>> attached to the vswitch. > >>>>> > >>>>> -- > >>>>> Computer Architect > >>>>> > >>>>> > >>>>>>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: > >>>>>>> > >>>>>>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> Is this qrouter namespace created with all those rules in container or in the host directly? > >>>>>>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? > >>>>>> > >>>>>> in kolla the l3 agent should be running with net=host so the container should be useing the hosts > >>>>>> root namespace and it will create network namespaces as needed for the different routers. > >>>>>> > >>>>>> the ip table rules should be in the router sub namespaces. > >>>>>> > >>>>>>> > >>>>>>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: > >>>>>>>> > >>>>>>>> Hi there, > >>>>>>>> > >>>>>>>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the > >>>>>>>> iptables > >>>>>>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or > >>>>>>>> SNAT > >>>>>>>> applied. 
What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING > >>>>>>>> chains > >>>>>>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log > >>>>>>>> entries. It's as > >>>>>>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is > >>>>>>>> driving me > >>>>>>>> crazy :) > >>>>>>>> > >>>>>>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). > >>>>>>>> > >>>>>>>> Jon. > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Computer Architect > >>>>>>> > >>>>>>> — > >>>>>>> Slawek Kaplonski > >>>>>>> Senior software engineer > >>>>>>> Red Hat > >>>>>>> > >>>>>>> > >> > >> > > From mark at stackhpc.com Mon Jan 6 13:38:19 2020 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 6 Jan 2020 13:38:19 +0000 Subject: [kolla] Adding Dincer Celik to kolla-core and kolla-ansible-core Message-ID: Hi, I recently proposed to the existing cores that we add Dincer Celik (osmanlicilegi) to the kolla-core and kolla-ansible-core groups and we agreed to go ahead. Thanks for your contribution to the project so far Dincer, I'm glad to have you on the team. Cheers, Mark From hberaud at redhat.com Mon Jan 6 14:16:38 2020 From: hberaud at redhat.com (Herve Beraud) Date: Mon, 6 Jan 2020 15:16:38 +0100 Subject: [oslo][kolla][requirements][release][infra] Hit by an old, fixed bug In-Reply-To: References: <20191230150137.GA9057@sm-workstation> <79cddc25-88e0-b5dd-8b8a-17cf14b9c4b1@nemebean.com> Message-ID: Thanks Radosław for the heads up, I validated the new release. Le sam. 4 janv. 2020 à 10:39, Radosław Piliszek a écrit : > Thanks, Ben. That doc preamble really made me think not to cross the > holy ground of release proposals. :-) > > I proposed release [1] and added you and Hervé as reviewers. 
> > [1] https://review.opendev.org/701080 > > -yoctozepto > > czw., 2 sty 2020 o 21:20 Ben Nemec napisał(a): > > > > > > > > On 12/30/19 9:52 AM, Radosław Piliszek wrote: > > > Thanks, Sean! I knew I was missing something really basic! > > > I was under the impression that 9.x is Stein, like it happens with > > > main projects (major=branch). > > > I could not find any doc explaining oslo.messaging versioning, perhaps > > > Oslo could release 9.5.1 off the stein branch? > > > > Oslo for the most part follows semver, so we only bump major versions > > when there is a breaking change. We bump minor versions each release so > > we can do bugfix releases on the previous stable branch without stepping > > on master releases. > > > > The underlying cause of this is likely that I'm way behind on releasing > > the Oslo stable branches. It's high on my todo list now that most people > > are back from holidays and will be around to help out if a release > > breaks something. > > > > However, anyone can propose a release[0][1] (contrary to what [0] > > suggests), so if the necessary fix is already on stable/stein and just > > hasn't been released yet please feel free to do that. You'll just need a > > +1 from either myself or hberaud (the Oslo release liaison) before the > > release team will approve it. > > > > 0: > https://releases.openstack.org/reference/using.html#requesting-a-release > > 1: > > > https://releases.openstack.org/reference/using.html#using-new-release-command > > > > > > > > The issue remains that, even though oslo backports bugfixes into their > > > stable branches, kolla (and very possibly other deployment solutions) > > > no longer benefit from them. 
> > > > > > -yoctozepto > > > > > > pon., 30 gru 2019 o 16:01 Sean McGinnis > napisał(a): > > >> > > >> On Sun, Dec 29, 2019 at 09:41:45PM +0100, Radosław Piliszek wrote: > > >>> Hi Folks, > > >>> > > >>> as the subject goes, my installation has been hit by an old bug: > > >>> https://bugs.launchpad.net/oslo.messaging/+bug/1828841 > > >>> (bug details not important, linked here for background) > > >>> > > >>> I am using Stein, deployed with recent Kolla-built source-based > images > > >>> (with only slight modifications compared to vanilla ones). > > >>> Kolla's procedure for building source-based images considers upper > > >>> constraints, which, unfortunately, turned out to be lagging behind a > > >>> few releases w.r.t. oslo.messaging at least. > > >>> The fix was in 9.7.0 released on May 21, u-c still point to 9.5.0 > from > > >>> Feb 26 and the latest of Stein is 9.8.0 from Jul 18. > > >>> > > >>> It seems oslo.messaging is missing from the automatic updates that > bot proposes: > > >>> > https://review.opendev.org/#/q/owner:%22OpenStack+Proposal+Bot%22+project:openstack/requirements+branch:stable/stein > > >>> > > >>> Per: > > >>> > https://opendev.org/openstack/releases/src/branch/master/doc/source/reference/reviewer_guide.rst#release-jobs > > >>> this upper-constraint proposal should be happening for all releases. > > >>> > > >> > > >> This is normal and what is expected. > > >> > > >> Requirements are only updated for the branch in which those releases > happen. So > > >> if there is a release of oslo.messaging for stable/train, only the > stable/train > > >> upper constraints are updated for that new release. The stable/stein > branch > > >> will not be affected because that shows what the tested upper > constraints were > > >> for that branch. 
> > >> > > >> The last stable/stein release for oslo.messaging was 9.5.0: > > >> > > >> > https://opendev.org/openstack/releases/src/branch/master/deliverables/stein/oslo.messaging.yaml#L49 > > >> > > >> And 9.5.0 is what is set in the stable/stein upper-constraints: > > >> > > >> > https://opendev.org/openstack/requirements/src/branch/stable/stein/upper-constraints.txt#L146 > > >> > > >> To get that raised, whatever necessary bugfixes that are required in > > >> oslo.messaging would need to be backported per-cycle until > stable/stein (as in, > > >> if it was in current master, it would need to be backported and > merged to > > >> stable/train first, then stable/stein), and once merged a stable > release would > > >> need to be proposed for that branch's version of the library. > > >> > > >> Once that stable release is done, that will propose the update to the > upper > > >> constraint for the given branch. > > >> > > >>> I would be glad if someone investigated why it happens(/ed) and > > >>> audited whether other OpenStack projects don't need updating as well > > >>> to avoid running on old deps when new are awaiting for months. :-) > > >>> Please note this might apply to other branches as well. > > >>> > > >>> PS: for some reason oslo.messaging Stein release notes ( > > >>> https://docs.openstack.org/releasenotes/oslo.messaging/stein.html ) > > >>> are stuck at 9.5.0 as well, this could be right (I did not inspect > the > > >>> sources) but I am adding this in PS so you have more things to > > >>> correlate if they need be. > > >>> > > >> > > >> Again, as expected. The last stable/stein release was 9.5.0, so that > is correct > > >> that the release notes for stein only show up to that point. 
> > > > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Mon Jan 6 14:32:28 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 6 Jan 2020 15:32:28 +0100 Subject: [largescale-sig] Meeting summary and next actions In-Reply-To: <3c3a6232-9a3b-d240-ab82-c7ac4997f5c0@openstack.org> References: <3c3a6232-9a3b-d240-ab82-c7ac4997f5c0@openstack.org> Message-ID: <06e5f16f-dfa4-8189-da7b-ad2250df8125@openstack.org> Thierry Carrez wrote: > [...] > The next meeting will happen on January 15, at 9:00 UTC on > #openstack-meeting. Oops, some unexpected travel came up and I won't be available to chair the meeting on that date. We can either: 1- keep the meeting, with someone else chairing. I can help with posting the agenda before and the summary after, just need someone to start the meeting and lead it -- any volunteer? 2- move the meeting to January 22, but we may lose Chinese participants to new year preparations... Thoughts?
-- Thierry Carrez (ttx) From haleyb.dev at gmail.com Mon Jan 6 15:15:53 2020 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 6 Jan 2020 10:15:53 -0500 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> Message-ID: On 1/6/20 7:33 AM, Radosław Piliszek wrote: > Folks, this seems to be about C7, not C8, and > "neutron_legacy_iptables" does not apply here. > @Jon - what is the kernel bug you mentioned but never referenced? There was a previous kernel bug in a Centos kernel that broke DNAT, https://bugs.launchpad.net/neutron/+bug/1776778 but don't know if this is the same issue. I would have hoped no one was using that kernel by now, and/or it was blacklisted. -Brian > pon., 6 sty 2020 o 13:13 Jon Masters napisał(a): >> >> I did specifically check for such a conflict tho before proceeding down the path I went :) >> >> -- >> Computer Architect >> >> >>> On Jan 6, 2020, at 03:40, Sean Mooney wrote: >>> >>> On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: >>>> If it's RHEL kernel's bug, then Red Hat would likely want to know >>>> about it (if not knowing already). >>>> I have my kolla deployment on c7.7 and I don't encounter this issue, >>>> though there is a pending kernel update so now I'm worried about >>>> applying it... >>> it sound more like a confilct between legacy iptables and the new nftables based replacement. >>> if you mix the two then it will appear as if the rules are installed but only some of the rules will run. >>> so the container images and the host need to be both configured to use the same versions. >>> >>> that said fi you are using centos images on a centos host they should be providing your usnign centos 7 or centos 8 on >>> both. 
if you try to use centos 7 image on a centos 8 host or centos 8 images on a centos 7 host it would likely have >>> issues due to the fact centos 8 uses a differt iptables implemeantion >>> >>>> >>>> -yoctozepto >>>> >>>> pon., 6 sty 2020 o 03:34 Jon Masters napisał(a): >>>>> >>>>> There’s no bug ID that I’m aware of. But I’ll go look for one or file one. >>>>> >>>>> -- >>>>> Computer Architect >>>>> >>>>> >>>>>> On Jan 5, 2020, at 18:51, Laurent Dumont wrote: >>>>> >>>>>  >>>>> Do you happen to have the bug ID for Centos? >>>>> >>>>> On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: >>>>>> >>>>>> This turns out to a not well documented bug in the CentOS7.7 kernel that causes exactly nat rules not to run as I >>>>>> was seeing. Oh dear god was this nasty as whatever to find and workaround. >>>>>> >>>>>> -- >>>>>> Computer Architect >>>>>> >>>>>> >>>>>>> On Jan 4, 2020, at 10:39, Jon Masters wrote: >>>>>>> >>>>>>> Excuse top posting on my phone. Also, yes, the namespaces are as described. It’s just that the (correct) nat >>>>>>> rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly >>>>>>> attached to the vswitch. >>>>>>> >>>>>>> -- >>>>>>> Computer Architect >>>>>>> >>>>>>> >>>>>>>>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >>>>>>>>> >>>>>>>>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Is this qrouter namespace created with all those rules in container or in the host directly? >>>>>>>>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? >>>>>>>> >>>>>>>> in kolla the l3 agent should be running with net=host so the container should be useing the hosts >>>>>>>> root namespace and it will create network namespaces as needed for the different routers. >>>>>>>> >>>>>>>> the ip table rules should be in the router sub namespaces. 
>>>>>>>> >>>>>>>>> >>>>>>>>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >>>>>>>>>> >>>>>>>>>> Hi there, >>>>>>>>>> >>>>>>>>>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the >>>>>>>>>> iptables >>>>>>>>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or >>>>>>>>>> SNAT >>>>>>>>>> applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING >>>>>>>>>> chains >>>>>>>>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log >>>>>>>>>> entries. It's as >>>>>>>>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is >>>>>>>>>> driving me >>>>>>>>>> crazy :) >>>>>>>>>> >>>>>>>>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >>>>>>>>>> >>>>>>>>>> Jon. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Computer Architect >>>>>>>>> >>>>>>>>> — >>>>>>>>> Slawek Kaplonski >>>>>>>>> Senior software engineer >>>>>>>>> Red Hat >>>>>>>>> >>>>>>>>> >>>> >>>> >>> > From Arkady.Kanevsky at dell.com Mon Jan 6 17:07:17 2020 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Mon, 6 Jan 2020 17:07:17 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> Message-ID: <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM> Zhipeng, Thanks for the quick feedback. Where is the accelerating device running? I am aware of 3 possibilities: servers, storage, switches. In each one of them the device is managed as part of the server, storage box or switch. The core of my message is the separation of device life cycle management in the "box" where it is placed, from the programming of the device as needed per application (VM, container).
Thanks, Arkady From: Zhipeng Huang Sent: Friday, January 3, 2020 7:53 PM To: Kanevsky, Arkady Cc: OpenStack Discuss Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management [EXTERNAL EMAIL] Hi Arkady, Thanks for your interest in the Cyborg project :) I would like to point out that when we initiated the project there were two specific use cases we wanted to cover: accelerators attached locally (via PCIe or another bus type) or remotely (via Ethernet or another fabric type). For the latter, it is clear that its life cycle is independent from the server (like a block device managed by Cinder). For the former however, its life cycle is not dependent on the server for all kinds of accelerators either. For example we already have PCIe based AI accelerator cards or Smart NICs that could be powered on/off while the server stays on the whole time. Therefore it is not a good idea to move all the life cycle management into Ironic, for the above mentioned reasons. Ironic integration is very important for the standalone usage of Cyborg for Kubernetes, Envoy (TLS acceleration) and others alike. Hope this answers your question :) On Sat, Jan 4, 2020 at 5:23 AM > wrote: Fellow Open Stackers, I have been thinking about how to handle SmartNICs, GPUs, and FPGAs across different projects within OpenStack, with Cyborg taking a leading role in it. Cyborg is an important project that addresses accelerator devices that are part of the server and potentially switches and storage. It addresses 3 different use cases and user roles that are all grouped into a single project. 1. An application user needs to program a portion of the device under management, like a GPU or SmartNIC, for that app's usage. Having a common way to do it across different device families and across different vendors is very important. And that has to be done every time a VM that needs the device is deployed. That is tied to VM scheduling. 2. An administrator needs to program the whole device for a specific usage.
That covers the scenario where a device can only support a single tenant or a single use case. That is done once during OpenStack deployment but may need reprogramming to configure the device for a different usage, and may or may not require a reboot of the server. 3. An administrator needs to set up the device for its use, like burning specific FW on it. This is typically done as part of a server life-cycle event. The first 2 cases cover the application life cycle of device usage. The last one covers the device life cycle independently of how it is used. Managing the life cycle of devices is Ironic's responsibility. One cannot and should not manage the lifecycle of server components independently. Managing server devices outside server management violates customer service agreements with server vendors and breaks server support agreements. Nova and Neutron get info about all devices and their capabilities from Ironic, which they use for scheduling. We should avoid creating a new project for every new component of the server and modifying nova and neutron for each new device. (The same will also apply to cinder and manila if smart devices are used in their data/control path on a server.) Finally, we want Cyborg to be usable in a standalone capacity, say for Kubernetes. Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic cover use case 3. Thus, move all device life-cycle code from Cyborg to Ironic. Concentrate Cyborg on fulfilling the first 2 use cases. Simplify integration with Nova and Neutron for using these accelerators by using the existing Ironic mechanism for it. Create idempotent calls for use case 1 so Nova and Neutron can use them as part of VM deployment to ensure that devices are programmed as the VM being scheduled needs. Create idempotent call(s) for use case 2 for TripleO to set up a device for single-accelerator usage of a node. [Propose a similar model for CNI integration.] Let the discussion start!
Thanks, Arkady -- Zhipeng (Howard) Huang Principal Engineer OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jan 6 19:51:24 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 6 Jan 2020 20:51:24 +0100 Subject: [neutron] Bug deputy report - week of 30th December Message-ID: <40343C7D-8C58-4D56-A1E5-D3F72C90D7F1@redhat.com> Hi, I was on bug deputy last week. It was a pretty quiet week with only a few bugs reported. Below is my summary of it. Critical: https://bugs.launchpad.net/neutron/+bug/1858260 - Upstream CI neutron-tempest-plugin-* fails - I marked it as critical as it causes gate failures; I will take a closer look at it next week, Medium: https://bugs.launchpad.net/neutron/+bug/1858086 - qrouter's local link route cannot be restored - confirmed by me on a local env, it would be good if someone from the L3 subteam could take a look at it, Undecided and others: https://bugs.launchpad.net/neutron/+bug/1858377 - probably a bug for openstackclient rather than neutron, but I would like to wait for confirmation from the bug reporter first, https://bugs.launchpad.net/neutron/+bug/1858262 - a duplicate of another bug, https://bugs.launchpad.net/neutron/+bug/1858419 - a docs bug to update config options for large-scale deployments, see http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html for details — Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Mon Jan 6 19:56:36 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 6 Jan 2020 20:56:36 +0100 Subject: [kolla] neutron-l3-agent namespace NAT table not working? 
In-Reply-To: References: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> Message-ID: Hi, > On 6 Jan 2020, at 16:15, Brian Haley wrote: > > On 1/6/20 7:33 AM, Radosław Piliszek wrote: >> Folks, this seems to be about C7, not C8, and >> "neutron_legacy_iptables" does not apply here. >> @Jon - what is the kernel bug you mentioned but never referenced? > > There was a previous kernel bug in a CentOS kernel that broke DNAT, https://bugs.launchpad.net/neutron/+bug/1776778 but I don't know if this is the same issue. I would have hoped no one was using that kernel by now, and/or it was blacklisted. This one also came to my mind when I read about a kernel bug here. But that old bug affected only DNAT on DVR routers IIRC, so IMO it doesn't seem like the same issue. > > -Brian > >> Mon, 6 Jan 2020 at 13:13, Jon Masters wrote: >>> >>> I did specifically check for such a conflict tho before proceeding down the path I went :) >>> >>> -- >>> Computer Architect >>> >>> >>>> On Jan 6, 2020, at 03:40, Sean Mooney wrote: >>>> >>>> On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: >>>>> If it's a RHEL kernel bug, then Red Hat would likely want to know >>>>> about it (if not knowing already). >>>>> I have my kolla deployment on c7.7 and I don't encounter this issue, >>>>> though there is a pending kernel update so now I'm worried about >>>>> applying it... >>>> it sounds more like a conflict between legacy iptables and the new nftables-based replacement. >>>> if you mix the two then it will appear as if the rules are installed but only some of the rules will run. >>>> so the container images and the host both need to be configured to use the same version. >>>> >>>> that said, if you are using centos images on a centos host they should be fine, provided you are using centos 7 or centos 8 on >>>> both. 
if you try to use a centos 7 image on a centos 8 host or centos 8 images on a centos 7 host it would likely have >>>> issues due to the fact that centos 8 uses a different iptables implementation >>>> >>>>> >>>>> -yoctozepto >>>>> >>>>> Mon, 6 Jan 2020 at 03:34, Jon Masters wrote: >>>>>> >>>>>> There's no bug ID that I'm aware of. But I'll go look for one or file one. >>>>>> >>>>>> -- >>>>>> Computer Architect >>>>>> >>>>>> >>>>>>> On Jan 5, 2020, at 18:51, Laurent Dumont wrote: >>>>>> >>>>>> Do you happen to have the bug ID for CentOS? >>>>>> >>>>>> On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: >>>>>>> >>>>>>> This turns out to be a not-well-documented bug in the CentOS 7.7 kernel that causes exactly what I was seeing: nat rules not running. Oh dear god was this nasty to find and work around. >>>>>>> >>>>>>> -- >>>>>>> Computer Architect >>>>>>> >>>>>>> >>>>>>>> On Jan 4, 2020, at 10:39, Jon Masters wrote: >>>>>>>> >>>>>>>> Excuse top posting on my phone. Also, yes, the namespaces are as described. It's just that the (correct) nat >>>>>>>> rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly >>>>>>>> attached to the vswitch. >>>>>>>> >>>>>>>> -- >>>>>>>> Computer Architect >>>>>>>> >>>>>>>> >>>>>>>>>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >>>>>>>>>> >>>>>>>>>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Is this qrouter namespace created with all those rules in the container or in the host directly? >>>>>>>>>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? >>>>>>>>> >>>>>>>>> in kolla the l3 agent should be running with net=host so the container should be using the host's >>>>>>>>> root namespace and it will create network namespaces as needed for the different routers. >>>>>>>>> >>>>>>>>> the iptables rules should be in the router sub-namespaces. 
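The legacy-versus-nftables mismatch Sean describes can be spotted by comparing the iptables backend on the host with the one inside the l3-agent container. A small sketch of that check (the version strings and the container name are illustrative; in practice they come from running `iptables -V` on the host and via `docker exec <l3-agent> iptables -V`):

```python
def backend_of(version_line):
    """Classify the output of `iptables -V` as nf_tables or legacy.

    iptables >= 1.8 prints "(nf_tables)" or "(legacy)"; older builds such
    as CentOS 7's 1.4.21 print no suffix and are legacy-only.
    """
    return "nf_tables" if "nf_tables" in version_line else "legacy"

# Illustrative version strings -- substitute the real command output:
host = "iptables v1.8.4 (nf_tables)"
container = "iptables v1.4.21"

if backend_of(host) != backend_of(container):
    print("MISMATCH: host and container use different iptables backends")
```

If the two backends differ, you get exactly the symptom described above: the rules look installed from both sides, but only some of them actually run.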
>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >>>>>>>>>>> >>>>>>>>>>> Hi there, >>>>>>>>>>> >>>>>>>>>>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the >>>>>>>>>>> iptables >>>>>>>>>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or >>>>>>>>>>> SNAT >>>>>>>>>>> applied. What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING >>>>>>>>>>> chains >>>>>>>>>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log >>>>>>>>>>> entries. It's as >>>>>>>>>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is >>>>>>>>>>> driving me >>>>>>>>>>> crazy :) >>>>>>>>>>> >>>>>>>>>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >>>>>>>>>>> >>>>>>>>>>> Jon. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Computer Architect >>>>>>>>>> >>>>>>>>>> — >>>>>>>>>> Slawek Kaplonski >>>>>>>>>> Senior software engineer >>>>>>>>>> Red Hat >>>>>>>>>> >>>>>>>>>> >>>>> >>>>> >>>> > — Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Mon Jan 6 20:05:53 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 6 Jan 2020 21:05:53 +0100 Subject: [all][neutron][neutron-fwaas] Maintainers needed In-Reply-To: <20191119102615.oq46xojyhoybulna@skaplons-mac> References: <20191119102615.oq46xojyhoybulna@skaplons-mac> Message-ID: Hi, Just as a reminder, we are still looking for maintainers who want to keep neutron-fwaas project alive. As it was written in my previous email, we will mark this project as deprecated. So please reply to this email or contact me directly if You are interested in maintaining this project. 
> On 19 Nov 2019, at 11:26, Slawek Kaplonski wrote: > > Hi, > > Over the past couple of cycles we have noticed that new contributions and > maintenance efforts for neutron-fwaas project were almost non existent. > This impacts patches for bug fixes, new features and reviews. The Neutron > core team is trying to at least keep the CI of this project healthy, but we > don’t have enough knowledge about the details of the neutron-fwaas > code base to review more complex patches. > > During the PTG in Shanghai we discussed that with operators and TC members > during the forum session [1] and later within the Neutron team during the > PTG session [2]. > > During these discussions, with the help of operators and TC members, we reached > the conclusion that we need to have someone responsible for maintaining project. > This doesn’t mean that the maintainer needs to spend full time working on this > project. Rather, we need someone to be the contact person for the project, who > takes care of the project’s CI and review patches. Of course that’s only a > minimal requirement. If the new maintainer works on new features for the > project, it’s even better :) > > If we don’t have any new maintainer(s) before milestone Ussuri-2, which is > Feb 10 - Feb 14 according to [3], we will need to mark neutron-fwaas > as deprecated and in “V” cycle we will propose to move the project > from the Neutron stadium, hosted in the “openstack/“ namespace, to the > unofficial projects hosted in the “x/“ namespace. > > So if You are using this project now, or if You have customers who are > using it, please consider the possibility of maintaining it. Otherwise, > please be aware that it is highly possible that the project will be > deprecated and moved out from the official OpenStack projects. 
> > [1] > https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - > Lines 379-421 > [3] https://releases.openstack.org/ussuri/schedule.html > > -- > Slawek Kaplonski > Senior software engineer > Red Hat — Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Mon Jan 6 20:06:13 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 6 Jan 2020 21:06:13 +0100 Subject: [all][neutron][neutron-vpnaas] Maintainers needed In-Reply-To: <20191119104137.pkra6hehfhdjjhh3@skaplons-mac> References: <20191119104137.pkra6hehfhdjjhh3@skaplons-mac> Message-ID: Hi, Just as a reminder, we are still looking for maintainers who want to keep neutron-vpnaas project alive. As it was written in my previous email, we will mark this project as deprecated. So please reply to this email or contact me directly if You are interested in maintaining this project. > On 19 Nov 2019, at 11:41, Slawek Kaplonski wrote: > > Hi, > > Over the past couple of cycles we have noticed that new contributions and > maintenance efforts for neutron-vpnaas were almost non existent. > This impacts patches for bug fixes, new features and reviews. The Neutron > core team is trying to at least keep the CI of this project healthy, but we > don’t have enough knowledge about the details of the neutron-vpnaas > code base to review more complex patches. > > During the PTG in Shanghai we discussed that with operators and TC members > during the forum session [1] and later within the Neutron team during the > PTG session [2]. > > During these discussions, with the help of operators and TC members, we reached > the conclusion that we need to have someone responsible for maintaining project. > This doesn’t mean that the maintainer needs to spend full time working on this > project. 
Rather, we need someone to be the contact person for the project, who > takes care of the project’s CI and review patches. Of course that’s only a > minimal requirement. If the new maintainer works on new features for the > project, it’s even better :) > > If we don’t have any new maintainer(s) before milestone Ussuri-2, which is > Feb 10 - Feb 14 according to [3], we will need to mark neutron-vpnaas > as deprecated and in “V” cycle we will propose to move the project > from the Neutron stadium, hosted in the “openstack/“ namespace, to the > unofficial projects hosted in the “x/“ namespace. > > So if You are using this project now, or if You have customers who are > using it, please consider the possibility of maintaining it. Otherwise, > please be aware that it is highly possible that the project will be > deprecated and moved out from the official OpenStack projects. > > [1] > https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - > Lines 379-421 > [3] https://releases.openstack.org/ussuri/schedule.html > > -- > Slawek Kaplonski > Senior software engineer > Red Hat — Slawek Kaplonski Senior software engineer Red Hat From neil at tigera.io Mon Jan 6 20:38:11 2020 From: neil at tigera.io (Neil Jerram) Date: Mon, 6 Jan 2020 20:38:11 +0000 Subject: [all] Is there something I can do to get a simple fix done? Message-ID: I'm struggling to say this positively, but... it feels like OpenStack promotes refactoring work that will likely break something, but is very slow when a corresponding fix is needed, even when the fix is trivial. Is there something we could do to get fixes done more quickly when needed? My case in point: my team's networking plugin (networking-calico) does not do "extraroutes", and so was broken by some python-openstackclient change (possibly [1]) that wrongly assumed that. 
I posted a fix [2] that passed CI on 21st October, and asked for it to be reviewed on IRC a couple of days later. Édouard Thuleau posted a similar fix [3] on 4th December, and we agreed that his was better, so I abandoned mine. His fix attracted a +2 on 11th December, but has been sitting like that ever since. It's a fix that I would expect to be simple to review, so I wonder if there's something else we could have done here to get this moving? Or if there is a systemic problem here that deserves discussion? [1] https://opendev.org/openstack/python-openstackclient/commit/c44f26eb7e41c28bb13ef9bd31c8ddda9e638862 [2] https://review.opendev.org/#/c/685312/ [3] https://review.opendev.org/#/c/697240/ Many thanks, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Mon Jan 6 21:29:27 2020 From: openstack at fried.cc (Eric Fried) Date: Mon, 6 Jan 2020 15:29:27 -0600 Subject: [all] Is there something I can do to get a simple fix done? In-Reply-To: References: Message-ID: Neil- > Édouard Thuleau posted a similar fix [3] on 4th > December, and we agreed that his was better, so I abandoned mine. His > fix attracted a +2 on 11th December, but has been sitting like that ever > since. Expecting any fix - even a trivial one - to get merged in less than a month when that month includes most of December is expecting a lot. That said... > It's a fix that I would expect to be simple to review, so I wonder if > there's something else we could have done here to get this moving?  Or > if there is a systemic problem here that deserves discussion? In this specific case I think the issue is a dearth of "core hours" available to the python-openstackclient project. Dean (dtroyer) and Monty (mordred) are the main cores there, and their time is *very* divided. 
Absent some signal of urgency to garner their attention and cause them to prioritize it over other work, a given change has a decent chance of languishing indefinitely, particularly as other more urgent work never ceases to pile up. > I'm struggling to say this positively, but... it feels like OpenStack > promotes refactoring work that will likely break something, but is very > slow when a corresponding fix is needed, even when the fix is trivial. > Is there something we could do to get fixes done more quickly when needed? Presumably this generalization is based on more than just the above fix. Certainly it doesn't apply to all patches in all projects. But to the extent that it is true, it often has the same root cause: shortage of maintainers. This is a message that needs to be taken back to companies that continue to expect OpenStack to be maintained without investing in the human power necessary to create cores. #brokenrecord efried . From juliaashleykreger at gmail.com Mon Jan 6 21:32:57 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 6 Jan 2020 13:32:57 -0800 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM> References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM> Message-ID: Greetings Arkady, I think your message makes a very good case and raises a point that I've been trying to type out for the past hour, but with only different words. 
We have multiple USER-driven interactions with similarly desired, if not exactly the same, end results, where different paths can be taken. We perceive use cases ranging from "As a user, I would like a VM with a configured accelerator" and "I would like any compute resource (VM or baremetal) with a configured accelerator", to "As an administrator, I need to reallocate a baremetal node for this different use, so my user can leverage its accelerator once they know how and are ready to use it", and, as suggested, "As a user, I want baremetal with k8s and configured accelerators." And I suspect this diversity of use patterns is where things begin to become difficult. As such I believe we, in essence, have a question of a support or compatibility matrix that definitely has gaps depending on "how" the "user" wants or needs to achieve their goals. And, I think where this entire discussion _can_ go sideways is... (from what I understand) some of these devices need to be flashed by the application user with firmware on demand to meet the user's needs, which is where lifecycle and support interactions begin to become... conflicted. Further complicating matters are the "Metal to Tenant" use cases, where the user requesting the machine is not an administrator, but has some level of inherent administrative access to all Operating System accessible devices once their OS has booted. Which makes me wonder "What if the cloud administrators WANT to block the tenant's direct ability to write/flash firmware into accelerator/smartnic/etc?" I suspect if cloud administrators want to block such hardware access, vendors will want to support such a capability. Blocking such access inherently forces some actions into hardware management/maintenance workflows, and may ultimately cause some of a support matrix's use cases to be unsupportable, again ultimately depending on what exactly the user is attempting to achieve. 
Going back to the suggestions in the original email: they seem logical to me in terms of the delineation and separation of responsibilities as we present a cohesive solution to the users of our software. Greetings Zhipeng, Is there any documentation at present that details the desired support and use cases? I think this would at least help my understanding, since everything that requires the power to be on would still need to be integrated within workflows for eventual tighter integration. Also, has Cyborg drafted any plans or proposals for integration? -Julia On Mon, Jan 6, 2020 at 9:14 AM wrote: > > Zhipeng, > > Thanks for the quick feedback. > > Where is the accelerating device running? I am aware of 3 possibilities: servers, storage, switches. > > In each one of them the device is managed as part of the server, storage box or switch. > > > > The core of my message is the separation of device life cycle management in the “box” where it is placed, from programming the device as needed per application (VM, container). > > > > Thanks, > Arkady > > > > From: Zhipeng Huang > Sent: Friday, January 3, 2020 7:53 PM > To: Kanevsky, Arkady > Cc: OpenStack Discuss > Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management > > > > [EXTERNAL EMAIL] > > Hi Arkady, > > > > Thanks for your interest in Cyborg project :) I would like to point out that when we initiated the project there are two specific use cases we want to cover: the accelerators attached locally (via PCIe or other bus type) or remotely (via Ethernet or other fabric type). > > > > For the latter one, it is clear that its life cycle is independent from the server (like block device managed by Cinder). For the former one however, its life cycle is not dependent on server for all kinds of accelerators either. For example we already have PCIe based AI accelerator cards or Smart NICs that could be power on/off when the server is on all the time. 
> > > > Therefore it is not a good idea to move all the life cycle management part into Ironic for the above mentioned reasons. Ironic integration is very important for the standalone usage of Cyborg for Kubernetes, Envoy (TLS acceleration) and others alike. > > > > Hope this answers your question :) > > > > On Sat, Jan 4, 2020 at 5:23 AM wrote: > > Fellow Open Stackers, > > I have been thinking on how to handle SmartNICs, GPUs, FPGA handling across different projects within OpenStack with Cyborg taking a leading role in it. > > > > Cyborg is important project and address accelerator devices that are part of the server and potentially switches and storage. > > It is address 3 different use cases and users there are all grouped into single project. > > > > Application user need to program a portion of the device under management, like GPU, or SmartNIC for that app usage. Having a common way to do it across different device families and across different vendor is very important. And that has to be done every time a VM is deploy that need usage of a device. That is tied with VM scheduling. > Administrator need to program the whole device for specific usage. That covers the scenario when device can only support single tenant or single use case. That is done once during OpenStack deployment but may need reprogramming to configure device for different usage. May or may not require reboot of the server. > Administrator need to setup device for its use, like burning specific FW on it. This is typically done as part of server life-cycle event. > > > > The first 2 cases cover application life cycle of device usage. > > The last one covers device life cycle independently how it is used. > > > > Managing life cycle of devices is Ironic responsibility, One cannot and should not manage lifecycle of server components independently. Managing server devices outside server management violates customer service agreements with server vendors and breaks server support agreements. 
> > Nova and Neutron are getting info about all devices and their capabilities from Ironic; that they use for scheduling. We should avoid creating new project for every new component of the server and modify nova and neuron for each new device. (the same will also apply to cinder and manila if smart devices used in its data/control path on a server). > > Finally we want Cyborg to be able to be used in standalone capacity, say for Kubernetes. > > > > Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic would cover use case 3. > > Thus, move all device Life-cycle code from Cyborg to Ironic. > > Concentrate Cyborg of fulfilling the first 2 use cases. > > Simplify integration with Nova and Neutron for using these accelerators to use existing Ironic mechanism for it. > > Create idempotent calls for use case 1 so Nova and Neutron can use it as part of VM deployment to ensure that devices are programmed for VM under scheduling need. > > Create idempotent call(s) for use case 2 for TripleO to setup device for single accelerator usage of a node. > > [Propose similar model for CNI integration.] > > > > Let the discussion start! > > > > Thanks., > Arkady > > > > > -- > > Zhipeng (Howard) Huang > > > > Principle Engineer > > OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C > > From jcm at jonmasters.org Mon Jan 6 22:27:43 2020 From: jcm at jonmasters.org (Jon Masters) Date: Mon, 6 Jan 2020 14:27:43 -0800 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> Message-ID: https://bugs.launchpad.net/kolla/+bug/1858505 On Mon, Jan 6, 2020 at 11:56 AM Slawek Kaplonski wrote: > Hi, > > > On 6 Jan 2020, at 16:15, Brian Haley wrote: > > > > On 1/6/20 7:33 AM, Radosław Piliszek wrote: > >> Folks, this seems to be about C7, not C8, and > >> "neutron_legacy_iptables" does not apply here. 
> >> @Jon - what is the kernel bug you mentioned but never referenced? > > > > There was a previous kernel bug in a Centos kernel that broke DNAT, > https://bugs.launchpad.net/neutron/+bug/1776778 but don't know if this is > the same issue. I would have hoped no one was using that kernel by now, > and/or it was blacklisted. > > This one also came to my mind when I read about kernel bug here. But this > old bug was affecting only DNAT on dvr routers IIRC so IMO it doesn’t seems > like same issue. > > > > > -Brian > > > >> pon., 6 sty 2020 o 13:13 Jon Masters napisał(a): > >>> > >>> I did specifically check for such a conflict tho before proceeding > down the path I went :) > >>> > >>> -- > >>> Computer Architect > >>> > >>> > >>>> On Jan 6, 2020, at 03:40, Sean Mooney wrote: > >>>> > >>>> On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: > >>>>> If it's RHEL kernel's bug, then Red Hat would likely want to know > >>>>> about it (if not knowing already). > >>>>> I have my kolla deployment on c7.7 and I don't encounter this issue, > >>>>> though there is a pending kernel update so now I'm worried about > >>>>> applying it... > >>>> it sound more like a confilct between legacy iptables and the new > nftables based replacement. > >>>> if you mix the two then it will appear as if the rules are installed > but only some of the rules will run. > >>>> so the container images and the host need to be both configured to > use the same versions. > >>>> > >>>> that said fi you are using centos images on a centos host they should > be providing your usnign centos 7 or centos 8 on > >>>> both. if you try to use centos 7 image on a centos 8 host or centos 8 > images on a centos 7 host it would likely have > >>>> issues due to the fact centos 8 uses a differt iptables implemeantion > >>>> > >>>>> > >>>>> -yoctozepto > >>>>> > >>>>> pon., 6 sty 2020 o 03:34 Jon Masters > napisał(a): > >>>>>> > >>>>>> There’s no bug ID that I’m aware of. 
But I’ll go look for one or > file one. > >>>>>> > >>>>>> -- > >>>>>> Computer Architect > >>>>>> > >>>>>> > >>>>>>> On Jan 5, 2020, at 18:51, Laurent Dumont > wrote: > >>>>>> > >>>>>>  > >>>>>> Do you happen to have the bug ID for Centos? > >>>>>> > >>>>>> On Sun, Jan 5, 2020 at 2:11 PM Jon Masters > wrote: > >>>>>>> > >>>>>>> This turns out to a not well documented bug in the CentOS7.7 > kernel that causes exactly nat rules not to run as I > >>>>>>> was seeing. Oh dear god was this nasty as whatever to find and > workaround. > >>>>>>> > >>>>>>> -- > >>>>>>> Computer Architect > >>>>>>> > >>>>>>> > >>>>>>>> On Jan 4, 2020, at 10:39, Jon Masters wrote: > >>>>>>>> > >>>>>>>> Excuse top posting on my phone. Also, yes, the namespaces are as > described. It’s just that the (correct) nat > >>>>>>>> rules for the qrouter netns are never running, in spite of the > two interfaces existing in that ns and correctly > >>>>>>>> attached to the vswitch. > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Computer Architect > >>>>>>>> > >>>>>>>> > >>>>>>>>>> On Jan 4, 2020, at 07:56, Sean Mooney > wrote: > >>>>>>>>>> > >>>>>>>>>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> Is this qrouter namespace created with all those rules in > container or in the host directly? > >>>>>>>>>> Do You have qr-xxx and qg-xxx ports from br-int in this qrouter > namespace? > >>>>>>>>> > >>>>>>>>> in kolla the l3 agent should be running with net=host so the > container should be useing the hosts > >>>>>>>>> root namespace and it will create network namespaces as needed > for the different routers. > >>>>>>>>> > >>>>>>>>> the ip table rules should be in the router sub namespaces. > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>>> On 4 Jan 2020, at 05:44, Jon Masters > wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hi there, > >>>>>>>>>>> > >>>>>>>>>>> I've got a weird problem with the neutron-l3-agent container > on my deployment. 
It comes up, sets up the > >>>>>>>>>>> iptables > >>>>>>>>>>> rules in the qrouter namespace (and I can see these using "ip > netns...") but traffic isn't having DNAT or > >>>>>>>>>>> SNAT > >>>>>>>>>>> applied. What's most strange is that manually adding a LOG > jump target to the iptables nat PRE/POSTROUTING > >>>>>>>>>>> chains > >>>>>>>>>>> (after enabling nf logging sent to the host kernel, confirmed > that works) doesn't result in any log > >>>>>>>>>>> entries. It's as > >>>>>>>>>>> if the nat table isn't being applied at all for any packets > traversing the qrouter namespace. This is > >>>>>>>>>>> driving me > >>>>>>>>>>> crazy :) > >>>>>>>>>>> > >>>>>>>>>>> Anyone got some quick suggestions? (assume I tried the obvious > stuff). > >>>>>>>>>>> > >>>>>>>>>>> Jon. > >>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> Computer Architect > >>>>>>>>>> > >>>>>>>>>> — > >>>>>>>>>> Slawek Kaplonski > >>>>>>>>>> Senior software engineer > >>>>>>>>>> Red Hat > >>>>>>>>>> > >>>>>>>>>> > >>>>> > >>>>> > >>>> > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > -- Computer Architect -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Mon Jan 6 22:48:19 2020 From: openstack at fried.cc (Eric Fried) Date: Mon, 6 Jan 2020 16:48:19 -0600 Subject: [cliff][docs][requirements] new cliff versions causes docs to fail to build In-Reply-To: <20191222175308.juzyu6grndfcf2ez@mthode.org> References: <20191222175308.juzyu6grndfcf2ez@mthode.org> Message-ID: On 12/22/19 11:53 AM, Matthew Thode wrote: > Looks like some things changed in the new version that we depended upon > and are now causing failures. 
> > Exception occurred: > File "/home/zuul/src/opendev.org/openstack/python-openstackclient/.tox/docs/lib/python3.6/site-packages/cliff/sphinxext.py", line 245, in _load_app > if not issubclass(cliff_app_class, app.App): > TypeError: issubclass() arg 1 must be a class > This should have been fixed by [1], which is in cliff since 2.14.0. The python-openstackclient docs target (which IIUC still uses the def in tox.ini?) pulls in requirements.txt which lists cliff!=2.9.0,>=2.8.0 # Apache-2.0 and upper-constraints, which is at 2.16.0. All that seems copacetic to me. I also can't reproduce the failure locally building python-openstackclient docs from scratch. What/where/how were you building when you encountered this? efried [1] https://review.opendev.org/#/c/614218/ From openstack at fried.cc Mon Jan 6 22:59:45 2020 From: openstack at fried.cc (Eric Fried) Date: Mon, 6 Jan 2020 16:59:45 -0600 Subject: [cliff][docs][requirements] new cliff versions causes docs to fail to build In-Reply-To: References: <20191222175308.juzyu6grndfcf2ez@mthode.org> Message-ID: <1f796271-40f9-f93b-17b8-9ed30c91e51a@fried.cc> > cliff!=2.9.0,>=2.8.0 # Apache-2.0 I guess it wouldn't hurt to bump this to >=2.14.0 efried . From andrei.perepiolkin at open-e.com Tue Jan 7 06:28:09 2020 From: andrei.perepiolkin at open-e.com (Andrei Perapiolkin) Date: Tue, 7 Jan 2020 08:28:09 +0200 Subject: [kolla] Quick start: ansible deploy failure In-Reply-To: References: <88226158-d15c-7f23-e692-aa461f6d8549@open-e.com> Message-ID: Hi Radosław, Thanks for answering me. Yes I was deploying "Stein". And Yes, after setting openstack_release to Train error disappeared. Many thanks again Radosław. Andrei Perepiolkin On 1/6/20 11:08 AM, Radosław Piliszek wrote: > Hi Andrei, > > I see you use kolla-ansible for Train, yet it looks as if you are > deploying Stein there. > Could you confirm that? > If you prefer to deploy Stein, please use the Stein branch of > kolla-ansible or analogically the 8.* releases from PyPI. 
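Returning to the cliff/docs failure in the thread above: the quoted traceback ends in `TypeError: issubclass() arg 1 must be a class`, which is what Python raises whenever the first argument is an instance (or any other non-class object) rather than a class. A minimal standalone reproduction, unrelated to cliff itself (the `App`/`load_app` names are illustrative):

```python
class App:
    pass

def load_app():
    return App()   # returns an *instance*, not the class

obj = load_app()

try:
    issubclass(obj, App)        # first argument must be a class
except TypeError as exc:
    print(exc)                  # issubclass() arg 1 must be a class

# The usual fix is to compare the object's type instead:
assert issubclass(type(obj), App)
```

This matches Eric's suggestion above: depend on a cliff release at or past the referenced fix (>= 2.14.0), where the sphinx extension no longer ends up handing a non-class to `issubclass()`.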
> Otherwise try deploying Train. > > -yoctozepto > > pon., 6 sty 2020 o 05:58 Andrei Perapiolkin > napisał(a): >> Hello, >> >> >> Im following quick start guide on deploying Kolla ansible and getting failure on deploy stage: >> >> https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html >> >> kolla-ansible -i ./multinode deploy >> >> TASK [mariadb : Creating haproxy mysql user] ******************************************************************************************************************************** >> >> fatal: [control01]: FAILED! => {"changed": false, "msg": "Can not parse the inner module output: localhost | SUCCESS => {\n \"changed\": false, \n \"user\": \"haproxy\"\n}\n"} >> >> >> I deploy to Centos7 with latest updates. >> >> >> [user at master ~]$ pip list >> DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. 
More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support >> Package Version >> -------------------------------- ---------- >> ansible 2.9.1 >> Babel 2.8.0 >> backports.ssl-match-hostname 3.7.0.1 >> certifi 2019.11.28 >> cffi 1.13.2 >> chardet 3.0.4 >> configobj 4.7.2 >> cryptography 2.8 >> debtcollector 1.22.0 >> decorator 3.4.0 >> docker 4.1.0 >> enum34 1.1.6 >> funcsigs 1.0.2 >> httplib2 0.9.2 >> idna 2.8 >> iniparse 0.4 >> ipaddress 1.0.23 >> IPy 0.75 >> iso8601 0.1.12 >> Jinja2 2.10.3 >> jmespath 0.9.4 >> kitchen 1.1.1 >> kolla-ansible 9.0.0 >> MarkupSafe 1.1.1 >> monotonic 1.5 >> netaddr 0.7.19 >> netifaces 0.10.9 >> oslo.config 6.12.0 >> oslo.i18n 3.25.0 >> oslo.utils 3.42.1 >> paramiko 2.1.1 >> pbr 5.4.4 >> perf 0.1 >> pip 19.3.1 >> ply 3.4 >> policycoreutils-default-encoding 0.1 >> pyasn1 0.1.9 >> pycparser 2.19 >> pycurl 7.19.0 >> pygobject 3.22.0 >> pygpgme 0.3 >> pyliblzma 0.5.3 >> pyparsing 2.4.6 >> python-linux-procfs 0.4.9 >> pytz 2019.3 >> pyudev 0.15 >> pyxattr 0.5.1 >> PyYAML 5.2 >> requests 2.22.0 >> rfc3986 1.3.2 >> schedutils 0.4 >> seobject 0.1 >> sepolicy 1.1 >> setuptools 44.0.0 >> six 1.13.0 >> slip 0.4.0 >> slip.dbus 0.4.0 >> stevedore 1.31.0 >> urlgrabber 3.10 >> urllib3 1.25.7 >> websocket-client 0.57.0 >> wrapt 1.11.2 >> yum-metadata-parser 1.1.4 >> >> >> and it looks like Im not alone with such issue: https://q.cnblogs.com/q/125213/ >> >> >> Thanks for your attention, >> >> Andrei Perepiolkin From jiaopengju at cmss.chinamobile.com Tue Jan 7 07:43:47 2020 From: jiaopengju at cmss.chinamobile.com (jiaopengju) Date: Tue, 07 Jan 2020 15:43:47 +0800 Subject: [largescale-sig] Meeting summary and next actions In-Reply-To: <06e5f16f-dfa4-8189-da7b-ad2250df8125@openstack.org> References: <3c3a6232-9a3b-d240-ab82-c7ac4997f5c0@openstack.org> <06e5f16f-dfa4-8189-da7b-ad2250df8125@openstack.org> Message-ID: 2- move the meeting to January 22, but we may lose Chinese 
participants to new year preparations... Thank you ttx. The second option is OK for me, I will be online on January 22. -- Pengju Jiao(jiaopengju) On 2020/1/6 at 10:32 PM, "Thierry Carrez" wrote: Thierry Carrez wrote: > [...] > The next meeting will happen on January 15, at 9:00 UTC on > #openstack-meeting. Oops, some unexpected travel came up and I won't be available to chair the meeting on that date. We can either: 1- keep the meeting, with someone else chairing. I can help with posting the agenda before and the summary after, just need someone to start the meeting and lead it -- any volunteer? 2- move the meeting to January 22, but we may lose Chinese participants to new year preparations... Thoughts? -- Thierry Carrez (ttx) From radoslaw.piliszek at gmail.com Tue Jan 7 07:51:10 2020 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Tue, 7 Jan 2020 08:51:10 +0100 Subject: [kolla] neutron-l3-agent namespace NAT table not working? In-Reply-To: References: <9a331abcc2d5eaf119dc1c1903c3405024ce84a8.camel@redhat.com> Message-ID: Thanks, Jon, though it's still too general to deduce anything more. Please see my comments on the bug. -yoctozepto On Mon, 6 Jan 2020 at 23:27, Jon Masters wrote: > > https://bugs.launchpad.net/kolla/+bug/1858505 > > On Mon, Jan 6, 2020 at 11:56 AM Slawek Kaplonski wrote: >> >> Hi, >> >> > On 6 Jan 2020, at 16:15, Brian Haley wrote: >> > >> > On 1/6/20 7:33 AM, Radosław Piliszek wrote: >> >> Folks, this seems to be about C7, not C8, and >> >> "neutron_legacy_iptables" does not apply here. >> >> @Jon - what is the kernel bug you mentioned but never referenced? >> > >> > There was a previous kernel bug in a CentOS kernel that broke DNAT, https://bugs.launchpad.net/neutron/+bug/1776778 but don't know if this is the same issue. I would have hoped no one was using that kernel by now, and/or it was blacklisted. >> >> This one also came to my mind when I read about the kernel bug here.
But this old bug was affecting only DNAT on DVR routers IIRC, so IMO it doesn't seem like the same issue. >> >> > >> > -Brian >> > >> >> On Mon, 6 Jan 2020 at 13:13, Jon Masters wrote: >> >>> >> >>> I did specifically check for such a conflict tho before proceeding down the path I went :) >> >>> >> >>> -- >> >>> Computer Architect >> >>> >> >>> >> >>>> On Jan 6, 2020, at 03:40, Sean Mooney wrote: >> >>>> >> >>>> On Mon, 2020-01-06 at 10:11 +0100, Radosław Piliszek wrote: >> >>>>> If it's RHEL kernel's bug, then Red Hat would likely want to know >> >>>>> about it (if not knowing already). >> >>>>> I have my kolla deployment on c7.7 and I don't encounter this issue, >> >>>>> though there is a pending kernel update so now I'm worried about >> >>>>> applying it... >> >>>> it sounds more like a conflict between legacy iptables and the new nftables-based replacement. >> >>>> if you mix the two then it will appear as if the rules are installed but only some of the rules will run. >> >>>> so the container images and the host both need to be configured to use the same version. >> >>>> >> >>>> that said, if you are using centos images on a centos host they should be fine, providing you are using centos 7 or centos 8 on >> >>>> both. if you try to use a centos 7 image on a centos 8 host or centos 8 images on a centos 7 host it would likely have >> >>>> issues due to the fact centos 8 uses a different iptables implementation >> >>>> >> >>>>> >> >>>>> -yoctozepto >> >>>>> >> >>>>> On Mon, 6 Jan 2020 at 03:34, Jon Masters wrote: >> >>>>>> >> >>>>>> There's no bug ID that I'm aware of. But I'll go look for one or file one. >> >>>>>> >> >>>>>> -- >> >>>>>> Computer Architect >> >>>>>> >> >>>>>> >> >>>>>>> On Jan 5, 2020, at 18:51, Laurent Dumont wrote: >> >>>>>> >> >>>>>> Do you happen to have the bug ID for Centos?
>> >>>>>> >> >>>>>> On Sun, Jan 5, 2020 at 2:11 PM Jon Masters wrote: >> >>>>>>> >> >>>>>>> This turns out to be a not-well-documented bug in the CentOS 7.7 kernel that causes exactly this: nat rules not running, as I >> >>>>>>> was seeing. Oh dear god was this nasty as whatever to find and work around. >> >>>>>>> >> >>>>>>> -- >> >>>>>>> Computer Architect >> >>>>>>> >> >>>>>>> >> >>>>>>>> On Jan 4, 2020, at 10:39, Jon Masters wrote: >> >>>>>>>> >> >>>>>>>> Excuse top posting on my phone. Also, yes, the namespaces are as described. It's just that the (correct) nat >> >>>>>>>> rules for the qrouter netns are never running, in spite of the two interfaces existing in that ns and correctly >> >>>>>>>> attached to the vswitch. >> >>>>>>>> >> >>>>>>>> -- >> >>>>>>>> Computer Architect >> >>>>>>>> >> >>>>>>>> >> >>>>>>>>>> On Jan 4, 2020, at 07:56, Sean Mooney wrote: >> >>>>>>>>>> >> >>>>>>>>>> On Sat, 2020-01-04 at 10:46 +0100, Slawek Kaplonski wrote: >> >>>>>>>>>> Hi, >> >>>>>>>>>> >> >>>>>>>>>> Is this qrouter namespace created with all those rules in the container or in the host directly? >> >>>>>>>>>> Do you have qr-xxx and qg-xxx ports from br-int in this qrouter namespace? >> >>>>>>>>> >> >>>>>>>>> in kolla the l3 agent should be running with net=host so the container should be using the host's >> >>>>>>>>> root namespace and it will create network namespaces as needed for the different routers. >> >>>>>>>>> >> >>>>>>>>> the iptables rules should be in the router sub-namespaces. >> >>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>>>> On 4 Jan 2020, at 05:44, Jon Masters wrote: >> >>>>>>>>>>> >> >>>>>>>>>>> Hi there, >> >>>>>>>>>>> >> >>>>>>>>>>> I've got a weird problem with the neutron-l3-agent container on my deployment. It comes up, sets up the >> >>>>>>>>>>> iptables >> >>>>>>>>>>> rules in the qrouter namespace (and I can see these using "ip netns...") but traffic isn't having DNAT or >> >>>>>>>>>>> SNAT >> >>>>>>>>>>> applied.
What's most strange is that manually adding a LOG jump target to the iptables nat PRE/POSTROUTING >> >>>>>>>>>>> chains >> >>>>>>>>>>> (after enabling nf logging sent to the host kernel, confirmed that works) doesn't result in any log >> >>>>>>>>>>> entries. It's as >> >>>>>>>>>>> if the nat table isn't being applied at all for any packets traversing the qrouter namespace. This is >> >>>>>>>>>>> driving me >> >>>>>>>>>>> crazy :) >> >>>>>>>>>>> >> >>>>>>>>>>> Anyone got some quick suggestions? (assume I tried the obvious stuff). >> >>>>>>>>>>> >> >>>>>>>>>>> Jon. >> >>>>>>>>>>> >> >>>>>>>>>>> -- >> >>>>>>>>>>> Computer Architect >> >>>>>>>>>> >> >>>>>>>>>> — >> >>>>>>>>>> Slawek Kaplonski >> >>>>>>>>>> Senior software engineer >> >>>>>>>>>> Red Hat >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>> >> >>>>> >> >>>> >> > >> >> — >> Slawek Kaplonski >> Senior software engineer >> Red Hat >> > > > -- > Computer Architect From martialmichel at datamachines.io Tue Jan 7 02:38:32 2020 From: martialmichel at datamachines.io (Martial Michel) Date: Mon, 6 Jan 2020 21:38:32 -0500 Subject: [Scientific] Scientific SIG meeting January 7th 2100 UTC Message-ID: The first meeting of the new year at 2100 UTC on Tuesday, January 7th. Mostly an Any Other Business meeting to get its participants back from the holidays, as reflected by the agenda :) https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_7th_2020 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sshnaidm at redhat.com Tue Jan 7 11:20:04 2020 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Tue, 7 Jan 2020 13:20:04 +0200 Subject: [all][tripleo][openstack-ansible] Openstack Ansible modules - next steps In-Reply-To: References: Message-ID: Hi, the last meeting was pretty short with only 2 participants due to the holidays, so I think we can discuss the same agenda this week. A reminder that we agreed to move the ansible modules after 13 January. - what is the best strategy for freezing current modules in Ansible? Because a few patches were merged just recently [1], it seems like "freezing" doesn't really work. - python versions support in modules - keeping history when moving modules and other topics [2] Please add your questions to the "Open discussion" section if there are any. Thanks [1] https://github.com/ansible/ansible/commits/devel/lib/ansible/modules/cloud/openstack [2] https://etherpad.openstack.org/p/openstack-ansible-modules On Fri, Dec 13, 2019 at 12:00 AM Sagi Shnaidman wrote: > Hi, all > short minutes from the meeting today about moving of Openstack Ansible > modules to Openstack. > > 1. Because of some level of uncertainty and different opinions, the > details of treatment of old modules will be under discussion in ML. I'll > send a mail about this topic. > 2. We agreed to have modules under "openstack." namespace and named > "cloud". So regular modules will be named like "openstack.cloud.os_server" > for example. > 3. We agreed to keep Ansible modules as thin as possible, putting the > logic into SDK. > 4. Also we will keep compatibility with as many Ansible versions as > possible. > 5. We agreed to have manual releases of Ansible modules as often as we > need, similarly to how it's done with SDK.
> > Logs: > http://eavesdrop.openstack.org/meetings/api_sig/2019/api_sig.2019-12-12-16.00.log.html > Minutes: > http://eavesdrop.openstack.org/meetings/api_sig/2019/api_sig.2019-12-12-16.00.html > Etherpad: https://etherpad.openstack.org/p/openstack-ansible-modules > > Next time: Thursday 19 Dec 2019 4.00 PM UTC. > > Thanks > > On Fri, Dec 6, 2019 at 12:03 AM Sagi Shnaidman > wrote: > >> Hi, all >> short minutes from the meeting today about Openstack Ansible modules. >> >> 1. Ansible 2.10 is going to move all modules to collections, so Openstack >> modules should find a new home in Openstack repos. >> 2. Namespace for openstack modules will be named "openstack.". What is >> coming after the dot is still under discussion. >> 3. Current modules will be migrated to collections in "openstack." as is >> with their names and will be still available for playbooks (via >> symlinking). It will avoid breaking people that use in their playbooks os_* >> modules now. >> 4. Old modules will be frozen after migrations and all development work >> will go in the new modules which will live aside. >> 5. Critical bugfixes to 2.9 versions will be done via Ansible GitHub repo >> as usual and synced manually to "openstack." collection. It must be a very >> exceptional case. >> 6. Migrations are set for mid of January 2020 approximately. >> 7. Modules should stay compatible with last Ansible and collections API >> changed. >> 8. Because current old modules are licensed with GPL and license of >> Openstack is Apache2, we need to figure out if we can either relicense them >> or develop new ones with different license or to continue to work on new >> ones with GPL in SIG repo. Agreed to ask on legal-discuss ML. 
>> >> Long minutes: >> http://eavesdrop.openstack.org/meetings/api_sig/2019/api_sig.2019-12-05-16.00.html >> Logs: >> http://eavesdrop.openstack.org/meetings/api_sig/2019/api_sig.2019-12-05-16.00.log.html >> >> Etherpad: https://etherpad.openstack.org/p/openstack-ansible-modules >> Next time Thursday 12 Dec 2019 4.00 PM UTC. >> >> Thanks >> >> On Tue, Dec 3, 2019 at 8:18 PM Sagi Shnaidman >> wrote: >> >>> Hi, all >>> In the meeting today we agreed to meet every Thursday starting *this >>> week* at 4.00 PM UTC on #openstack-sdks channel on Freenode. We'll >>> discuss everything related to Openstack Ansible modules. >>> Agenda and topics are in the etherpad: >>> https://etherpad.openstack.org/p/openstack-ansible-modules >>> (I've created a new one, because we don't limit to Ironic modules only, >>> it's about all of them in general) >>> >>> Short minutes from meeting today: >>> Organizational: >>> 1. We meet every Thursday from this week at 4.00 PM UTC on >>> #openstack-sdks >>> 2. Interested parties for now are: Ironic, Tripleo, Openstack-Ansible, >>> Kolla-ansible, OpenstackSDK teams. Feel free to join and add yourself in >>> the etherpad. [1] >>> 3. We'll track our work in Storyboard for ansible-collections-openstack >>> (in progress) >>> 4. Openstack Ansible modules will live as collections under Ansible SIG >>> in repo openstack/ansible-collections-openstack [2] because there are >>> issues with different licensing: GPLv3 for Ansible in upstream and >>> Openstack license (Apache2). >>> 5. Ansible upstream Openstack modules will be merge-frozen when we'll >>> have our collections fully working and will be deprecated from Ansible at >>> some point in the future. >>> 6. Openstack Ansible collections will be published to Galaxy. >>> 7. There is a list of people that can be pinged for reviews in >>> ansible-collections-openstack project, feel free to join there [1] >>> >>> Technical: >>> 1. We use openstacksdk instead of [project]client modules. >>> 2. 
We will rename modules to be more like os_[service_type] named, >>> examples are in Ironic modules etherpad [3] >>> >>> Logs from meeting today you can find here: >>> http://eavesdrop.openstack.org/meetings/ansible_sig/2019/ansible_sig.2019-12-03-15.01.log.html >>> Please feel free to participate and add topics to agenda. [1] >>> >>> [1] https://etherpad.openstack.org/p/openstack-ansible-modules >>> [2] https://review.opendev.org/#/c/684740/ >>> [3] https://etherpad.openstack.org/p/ironic-ansible-modules >>> >>> Thanks >>> >>> On Wed, Nov 27, 2019 at 7:57 PM Sagi Shnaidman >>> wrote: >>> >>>> Hi, all >>>> >>>> in the light of finding the new home place for openstack related >>>> ansible modules [1] I'd like to discuss the best strategy to create Ironic >>>> ansible modules. Existing Ironic modules in Ansible repo don't cover even >>>> half of Ironic functionality, don't fit current needs and definitely >>>> require an additional work. There are a few topics that require attention >>>> and better be solved before modules are written to save additional work. We >>>> prepared an etherpad [2] with all these questions and if you have ideas or >>>> suggestions on how it should look you're welcome to update it. >>>> We'd like to decide the final place for them, name conventions (the >>>> most complex one!), what they should look like and how better to implement. >>>> Anybody interested in Ansible and baremetal management in Openstack, >>>> you're more than welcome to contribute. >>>> >>>> Thanks >>>> >>>> [1] https://review.opendev.org/#/c/684740/ >>>> [2] https://etherpad.openstack.org/p/ironic-ansible-modules >>>> >>>> -- >>>> Best regards >>>> Sagi Shnaidman >>>> >>> >>> >>> -- >>> Best regards >>> Sagi Shnaidman >>> >> >> >> -- >> Best regards >> Sagi Shnaidman >> > > > -- > Best regards > Sagi Shnaidman > -- Best regards Sagi Shnaidman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jean-philippe at evrard.me Tue Jan 7 12:26:10 2020 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Tue, 07 Jan 2020 13:26:10 +0100 Subject: [tc] Expediting patch to unblock governance's CI Message-ID: <58bd7e162eb81f2bc86792a5642e0a1e09d68d99.camel@evrard.me> Hello everyone, Our governance repo's testing is currently failing, and I have patches to fix it (short term [1] and long term). Because it has been in there for a while (and got recently updated), I will now merge the short-term fix, even if we don't have enough votes (nor time). It will unblock many patches and prevent useless rechecks, at the expense of being out of policy. I will gladly take the blame, should there be any. Regards, JP [1] https://review.opendev.org/#/c/700422/ From jichenjc at cn.ibm.com Tue Jan 7 13:02:56 2020 From: jichenjc at cn.ibm.com (Chen CH Ji) Date: Tue, 7 Jan 2020 13:02:56 +0000 Subject: IBM z/VM CI is planning to migrate new environment In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From kotobi at dkrz.de Tue Jan 7 15:14:34 2020 From: kotobi at dkrz.de (Amjad Kotobi) Date: Tue, 7 Jan 2020 16:14:34 +0100 Subject: [neutron][rabbitmq] Neutron-server service shows deprecated "AMQPDeprecationWarning" Message-ID: <4D3B074F-09F2-48BE-BD61-5D34CBFE509E@dkrz.de> Hi, Today we are losing the neutron connection, especially during instance creation, and "systemctl status neutron-server" shows the message below (repeated): /usr/lib/python2.7/site-packages/amqp/connection.py:304: AMQPDeprecationWarning: The .transport attribute on the connection was accessed before the connection was established. This is supported for now, but will be deprecated in amqp 2.2.0. Since amqp 2.0 you have to explicitly call Connection.connect() before using the connection.
W_FORCE_CONNECT.format(attr=attr))) OpenStack release which we are running is “Pike”. Is there any way to remedy this? Thanks Amjad -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Tue Jan 7 15:15:21 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 7 Jan 2020 09:15:21 -0600 Subject: [cliff][docs][requirements] new cliff versions causes docs to fail to build In-Reply-To: References: <20191222175308.juzyu6grndfcf2ez@mthode.org> Message-ID: <20200107151521.GA349057@sm-workstation> On Mon, Jan 06, 2020 at 04:48:19PM -0600, Eric Fried wrote: > On 12/22/19 11:53 AM, Matthew Thode wrote: > > Looks like some things changed in the new version that we depended upon > > and are now causing failures. > > > > Exception occurred: > > File "/home/zuul/src/opendev.org/openstack/python-openstackclient/.tox/docs/lib/python3.6/site-packages/cliff/sphinxext.py", line 245, in _load_app > > if not issubclass(cliff_app_class, app.App): > > TypeError: issubclass() arg 1 must be a class > > > > This should have been fixed by [1], which is in cliff since 2.14.0. The > python-openstackclient docs target (which IIUC still uses the def in > tox.ini?) pulls in requirements.txt which lists > > cliff!=2.9.0,>=2.8.0 # Apache-2.0 > > and upper-constraints, which is at 2.16.0. All that seems copacetic to > me. I also can't reproduce the failure locally building > python-openstackclient docs from scratch. > > What/where/how were you building when you encountered this? > > efried > > [1] https://review.opendev.org/#/c/614218/ Part of this could be that cliff is still capped since the newest release has some issues that have yet to be addressed. The upper constraint can't be raised until they are (which also likely means blacklisting this version), but I haven't seen any activity there yet. So the fix that was supposed to handle this doesn't appear to have actually done so. 
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011741.html From sean.mcginnis at gmx.com Tue Jan 7 15:22:37 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 7 Jan 2020 09:22:37 -0600 Subject: [cliff][docs][requirements] new cliff versions causes docs to fail to build In-Reply-To: <20200107151521.GA349057@sm-workstation> References: <20191222175308.juzyu6grndfcf2ez@mthode.org> <20200107151521.GA349057@sm-workstation> Message-ID: <20200107152237.GA349707@sm-workstation> > > > TypeError: issubclass() arg 1 must be a class > > > > > > > This should have been fixed by [1], which is in cliff since 2.14.0. The > > python-openstackclient docs target (which IIUC still uses the def in > > tox.ini?) pulls in requirements.txt which lists > > > > cliff!=2.9.0,>=2.8.0 # Apache-2.0 > > > > and upper-constraints, which is at 2.16.0. All that seems copacetic to > > me. I also can't reproduce the failure locally building > > python-openstackclient docs from scratch. > > > > What/where/how were you building when you encountered this? > > > > efried > > > > [1] https://review.opendev.org/#/c/614218/ > > Part of this could be that cliff is still capped since the newest release has > some issues that have yet to be addressed. The upper constraint can't be raised > until they are (which also likely means blacklisting this version), but I > haven't seen any activity there yet. > > So the fix that was supposed to handle this doesn't appear to have actually > done so. > Or rather, it had fixed it, but somehow in the latest 2.17.0 release, the one unrelated change included somehow broke it again. 
https://github.com/openstack/cliff/compare/2.16.0...2.17.0 From sfinucan at redhat.com Tue Jan 7 16:44:37 2020 From: sfinucan at redhat.com (Stephen Finucane) Date: Tue, 07 Jan 2020 16:44:37 +0000 Subject: [cliff][docs][requirements] new cliff versions causes docs to fail to build In-Reply-To: <20200107152237.GA349707@sm-workstation> References: <20191222175308.juzyu6grndfcf2ez@mthode.org> <20200107151521.GA349057@sm-workstation> <20200107152237.GA349707@sm-workstation> Message-ID: On Tue, 2020-01-07 at 09:22 -0600, Sean McGinnis wrote: > > > > TypeError: issubclass() arg 1 must be a class > > > > > > > > > > This should have been fixed by [1], which is in cliff since 2.14.0. The > > > python-openstackclient docs target (which IIUC still uses the def in > > > tox.ini?) pulls in requirements.txt which lists > > > > > > cliff!=2.9.0,>=2.8.0 # Apache-2.0 > > > > > > and upper-constraints, which is at 2.16.0. All that seems copacetic to > > > me. I also can't reproduce the failure locally building > > > python-openstackclient docs from scratch. > > > > > > What/where/how were you building when you encountered this? > > > > > > efried > > > > > > [1] https://review.opendev.org/#/c/614218/ > > > > Part of this could be that cliff is still capped since the newest release has > > some issues that have yet to be addressed. The upper constraint can't be raised > > until they are (which also likely means blacklisting this version), but I > > haven't seen any activity there yet. > > > > So the fix that was supposed to handle this doesn't appear to have actually > > done so. > > > > Or rather, it had fixed it, but somehow in the latest 2.17.0 release, the one > unrelated change included somehow broke it again. > > https://github.com/openstack/cliff/compare/2.16.0...2.17.0 Commit 8bcd068e876ddd48ae61c1803449d666f5e28ba0, a.k.a. cliff 2.17.0 is not the commit you (should have been) looking for. 
As noted at [1], we appear to have tagged a commit from the review branch instead of one from master, which means 2.17.0 is based on code from master shortly after 2.11.0 (!) was released. I suggest we blacklist 2.17.0 and issue a new 2.17.1 or 2.18.0 release post-haste. That's not all though. For some daft reason, the 'python-rsdclient' project has imported argparse's 'HelpFormatter' from 'cliff._argparse' instead of 'argparse'. They need to stop doing this because commit 584352dcd008d58c433136539b22a6ae9d6c45cc of cliff means this will no longer work. Just import argparse directly. Stephen [1] https://review.opendev.org/#/c/698485/1/deliverables/ussuri/cliff.yaml@13 From gmann at ghanshyammann.com Tue Jan 7 17:16:19 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 07 Jan 2020 11:16:19 -0600 Subject: [qa][infra][stable] Stable branches gate status: tempest-full-* jobs failing for stable/ocata|pike|queens Message-ID: <16f8101d6ea.be1780a3214520.3007727257147254758@ghanshyammann.com> Hello Everyone, tempest-full-* jobs are failing on stable/queens, stable/pike, and stable/ocata (legacy-tempest-dsvm-neutron-full-ocata) [1]. Please hold any recheck till the fix is merged. whoami-rajat reported the tempest-full-queens-py3 job failure and later, while debugging, we found that the same is failing for pike and ocata (the job name there is legacy-tempest-dsvm-neutron-full-ocata). The failure is due to "Timeout on connecting the vnc console url" because there is no 'n-cauth' service running, which is required for these stable branches. In Ussuri that service has been removed from nova. 'n-cauth' has been removed from ENABLED_SERVICES recently in - https://review.opendev.org/#/c/700217/ which affected only stable branches till queens. stable/rocky|stein are working because we have moved the service enablement from devstack-gate's test matrix to the devstack base job [2]. Patch [2] was not backported to stable/queens and stable/pike, and I am not sure why.
We have two ways to fix the stable branches gate: 1. Re-enable n-cauth in devstack-gate. Hopefully the other removed services create no problem. pros: easy to fix, fixes all three stable branches. patch - https://review.opendev.org/#/c/701404/ 2. Backport 546765 [2] to stable/queens and stable/pike. pros: this removes the dependency on the test matrix, which is the overall goal of removing the d-g dependency. cons: It cannot be backported to stable/ocata as there are no zuulv3 base jobs there. This is already EM; does anyone still care about this? I think for fixing the gate (Tempest master and stable/queens|pike|ocata), we can go with option 1 and later backport the devstack migration. [1] - http://zuul.openstack.org/builds?job_name=tempest-full-queens-py3 - http://zuul.openstack.org/builds?job_name=tempest-full-pike - http://zuul.openstack.org/builds?job_name=legacy-tempest-dsvm-neutron-full-ocata - reported bug - https://bugs.launchpad.net/devstack/+bug/1858666 [2] https://review.opendev.org/#/c/546765/ -gmann From rosmaita.fossdev at gmail.com Tue Jan 7 19:11:40 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 7 Jan 2020 14:11:40 -0500 Subject: [cinder] first meeting of 2020 tomorrow (8 January) Message-ID: <9fc10836-8805-daca-10f9-a615a152949f@gmail.com> Just wanted to send a quick reminder that the first Cinder team meeting of 2020 will be held tomorrow (8 January) on the usual day (Wednesday) at the usual time (1400 UTC) and in the usual place (#openstack-meeting-4). https://etherpad.openstack.org/p/cinder-ussuri-meetings See you there! brian From aj at suse.com Tue Jan 7 19:50:51 2020 From: aj at suse.com (Andreas Jaeger) Date: Tue, 7 Jan 2020 20:50:51 +0100 Subject: [infra] Retire x/dox Message-ID: <30c7199c-fc6c-7f02-d764-4e51ee9a9cfd@suse.com> The x/dox repo is unused, let's retire it. I'll put up changes with topic "retire-dox", Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr.
5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB From ahmed.zaky.abdallah at gmail.com Tue Jan 7 20:20:37 2020 From: ahmed.zaky.abdallah at gmail.com (Ahmed ZAKY) Date: Tue, 7 Jan 2020 21:20:37 +0100 Subject: VM boot volume disappears from compute's multipath Daemon when NetApp controller is placed offline Message-ID: I have a setup where each VM gets assigned two vDisks: one encrypted boot volume and another storage volume. The storage used is NetApp (tripleo-netapp), with two controllers on the NetApp side working in active/active mode. My test case goes as follows: - I stop one of the active controllers. - I stop one of my VMs using OpenStack server stop. - I then start my VM one more time using OpenStack server start. - The VM fails to start. Here are my findings; I hope someone can help explain the behaviour seen below: My VM: vel1bgw01-MCM2, running on compute overcloud-sriovperformancecompute-3.localdomain [root at overcloud-controller-0 (vel1asbc01) cbis-admin]# openstack server show vel1bgw01-MCM2
+--------------------------------------+------------------------------------------------------------------------------------------------------------+
| Field | Value |
+--------------------------------------+------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | zone1 |
| OS-EXT-SRV-ATTR:host | overcloud-sriovperformancecompute-3.localdomain |
| OS-EXT-SRV-ATTR:hypervisor_hostname | overcloud-sriovperformancecompute-3.localdomain |
| OS-EXT-SRV-ATTR:instance_name | instance-00000e93 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2019-12-18T15:49:37.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses |
SBC01_MGW01_TIPC=192.168.48.22; SBC01_MGW01_DATAPATH_MATE=192.168.16.11; SBC01_MGW01_DATAPATH=192.168.32.8 |
| config_drive | True |
| created | 2019-12-18T15:49:16Z |
| flavor | SBC_MCM (asbc_mcm) |
| hostId | 7886df0f7a3d4e131304a8eb860e6a704c5fda2a7ed751b544ff2bf5 |
| id | 5c70a984-89a9-44ce-876d-9e2e568eb819 |
| image | |
| key_name | CBAM-b5fd59a066e8450ca9f104a69da5a043-Keypair |
| name | vel1bgw01-MCM2 |
| os-extended-volumes:volumes_attached | [{u'id': u'717e5744-4786-42dc-9e3e-3c5e6994c482'}, {u'id': u'd6cf0cf9-36d1-4b62-86b4-faa4a6642166'}] |
| progress | 0 |
| project_id | 41777c6f1e7b4f8d8fd76b5e0f67e5e8 |
| properties | |
| security_groups | [{u'name': u'vel1bgw01-TIPC-Security-Group'}] |
| status | ACTIVE |
| updated | 2020-01-07T17:18:32Z |
| user_id | be13deba85794016a00fec9d18c5d7cf |
+--------------------------------------+------------------------------------------------------------------------------------------------------------+
*It is mapped to the following vDisks (seen using virsh list on compute-3)*:
- dm-uuid-mpath-3600a098000d9818b0000185c5dfa0714 → Boot Volume
- dm-uuid-mpath-3600a098000d9818b000018565dfa069e → Storage volume
717e5744-4786-42dc-9e3e-3c5e6994c482
d6cf0cf9-36d1-4b62-86b4-faa4a6642166
Name:              crypt-dm-uuid-mpath-3600a098000d9818b0000185c5dfa0714
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 5
Number of targets: 1
UUID: CRYPT-LUKS1-769cc20bc5af469c8c9075a2a6fc4aa0-crypt-dm-uuid-mpath-*3600a098000d9818b0000185c5dfa0714*

Name:              *mpathpy*
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      32
Major, minor:      253, *4*
Number of targets: 1
UUID: mpath-*3600a098000d9818b0000185c5dfa0714*

Name:              crypt-dm-uuid-mpath-3600a098000d9818b000018565dfa069e
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: CRYPT-LUKS1-4015c585a0df4074821ca312c4caacca-crypt-dm-uuid-mpath-*3600a098000d9818b000018565dfa069e*

Name:              *mpathpz*
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      28
Major, minor:      253, *6*
Number of targets: 1
UUID: mpath-*3600a098000d9818b000018565dfa069e*

This means the boot volume is represented by dm-4 while the storage volume is represented by dm-6. Dumping the multipath daemon on the controller shows that at a steady running state both DMs are accounted for (see below).
multipathd> show maps
name    sysfs uuid
*mpathpy dm-4 3600a098000d9818b0000185c5dfa0714*
*mpathpz dm-6 3600a098000d9818b000018565dfa069e*
mpathqi dm-12 3600a098000d9818b000018df5dfafd40
mpathqj dm-13 3600a098000d9818b000018de5dfafd10
mpathpw dm-0 3600a098000d9818b000018425dfa059f
mpathpx dm-1 3600a098000d9818b0000184c5dfa05fc
mpathqk dm-16 3600a098000d9818b000018eb5dfafe80
mpathql dm-17 3600a098000d9818b000018e95dfafe26
mpathqh dm-9 3600a098000d9818b000018c65dfafa91

These vDisks are mapped to the following multipaths:

multipathd> show topology
mpathpy (3600a098000d9818b0000185c5dfa0714) dm-4 NETAPP ,INF-01-00
size=21G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=14 status=active
| |- 30:0:0:82 sdm 8:192 active ready running
| `- 32:0:0:82 sdk 8:160 active ready running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 33:0:0:82 sdn 8:208 failed faulty running
  `- 31:0:0:82 sdl 8:176 failed faulty running
mpathpz (3600a098000d9818b000018565dfa069e) dm-6 NETAPP ,INF-01-00
size=10G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=14 status=active
| |- 30:0:0:229 sdr 65:16 active ready running
| `- 32:0:0:229 sdp 8:240 active ready running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 31:0:0:229 sdo 8:224 failed faulty running
  `- 33:0:0:229 sdq 65:0
failed faulty running

Now it starts getting very interesting: if I shut down controller-A from the NetApp side, dm-4 disappears, but dm-6 keeps running, with the active path now detected via controller-B while the standby path through controller-A is displayed as failed.

multipathd> show topology
mpathpz (3600a098000d9818b000018565dfa069e) dm-6 NETAPP ,INF-01-00
size=10G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- 30:0:0:229 sdr 65:16 failed faulty running
| `- 32:0:0:229 sdp 8:240 failed faulty running
`-+- policy='service-time 0' prio=11 status=active
  |- 31:0:0:229 sdo 8:224 active ready running
  `- 33:0:0:229 sdq 65:0 active ready running

multipathd> show maps
name    sysfs uuid
*mpathpz dm-6 3600a098000d9818b000018565dfa069e*
mpathqi dm-12 3600a098000d9818b000018df5dfafd40
mpathqj dm-13 3600a098000d9818b000018de5dfafd10
mpathpw dm-0 3600a098000d9818b000018425dfa059f
mpathpx dm-1 3600a098000d9818b0000184c5dfa05fc
mpathqk dm-16 3600a098000d9818b000018eb5dfafe80
mpathql dm-17 3600a098000d9818b000018e95dfafe26
mpathqg dm-8 3600a098000d9818b000018c75dfafac0
mpathqh dm-9 3600a098000d9818b000018c65dfafa91

If I restore controller-A into service from the NetApp side and instead fail only the paths to controller-A from multipathd, everything works fine: dm-4 is still present and the VM can be put into service.
multipathd> fail path sdk
ok
multipathd> fail path sdm
ok

mpathpy (3600a098000d9818b0000185c5dfa0714) dm-4 NETAPP ,INF-01-00
size=21G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- 32:0:0:82 sdk 8:160 failed faulty running
| `- 30:0:0:82 sdm 8:192 failed faulty running
`-+- policy='service-time 0' prio=9 status=active
  |- 31:0:0:82 sdl 8:176 active ready running
  `- 33:0:0:82 sdn 8:208 active ready running

multipathd> reinstate path sdk
ok
multipathd> reinstate path sdm
ok

mpathpy (3600a098000d9818b0000185c5dfa0714) dm-4 NETAPP ,INF-01-00
size=21G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=14 status=active
| |- 32:0:0:82 sdk 8:160 active ready running
| `- 30:0:0:82 sdm 8:192 active ready running
`-+- policy='service-time 0' prio=9 status=enabled
  |- 31:0:0:82 sdl 8:176 active ready running
  `- 33:0:0:82 sdn 8:208 active ready running

In the working case it is observed that the storage volume disappears (which seems normal); the instance also vanishes completely from the virsh list, and no trace of it can be found at the KVM level if we run ps -def | grep fd | grep . However, the boot volume is always present in the multipathd records when we stop the VM under normal conditions without stopping the NetApp controller.

Any ideas?

Kind regards,
Ahmed
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From openstack at nemebean.com Tue Jan 7 21:59:57 2020 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 7 Jan 2020 15:59:57 -0600 Subject: [neutron][rabbitmq][oslo] Neutron-server service shows deprecated "AMQPDeprecationWarning" In-Reply-To: <4D3B074F-09F2-48BE-BD61-5D34CBFE509E@dkrz.de> References: <4D3B074F-09F2-48BE-BD61-5D34CBFE509E@dkrz.de> Message-ID: <294c93b5-0ddc-284b-34a1-ffce654ba047@nemebean.com> On 1/7/20 9:14 AM, Amjad Kotobi wrote: > Hi, > > Today we are facing losing connection of neutron especially during > instance creation or so as “systemctl status neutron-server” shows below > message > > be deprecated in amqp 2.2.0. > Since amqp 2.0 you have to explicitly call Connection.connect() > before using the connection. > W_FORCE_CONNECT.format(attr=attr))) > /usr/lib/python2.7/site-packages/amqp/connection.py:304: > AMQPDeprecationWarning: The .transport attribute on the connection was > accessed before > the connection was established.  This is supported for now, but will > be deprecated in amqp 2.2.0. > Since amqp 2.0 you have to explicitly call Connection.connect() > before using the connection. > W_FORCE_CONNECT.format(attr=attr))) It looks like this is a red herring, but it should be fixed in the current oslo.messaging pike release. See [0] and the related bug. 0: https://review.opendev.org/#/c/605324/ > > OpenStack release which we are running is “Pike”. > > Is there any way to remedy this? I don't think this should be a fatal problem in and of itself so I suspect it's masking something else. However, I would recommend updating to the latest pike release of oslo.messaging where the deprecated feature is not used. If that doesn't fix the problem, please send us whatever errors remain after this one is eliminated. 
> > Thanks
> Amjad

From Arkady.Kanevsky at dell.com Tue Jan 7 23:17:25 2020
From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com)
Date: Tue, 7 Jan 2020 23:17:25 +0000
Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
In-Reply-To: 
References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM>
Message-ID: <582d2544d3d74fe7beef50aaaa35d558@AUSX13MPS308.AMER.DELL.COM>

Excellent points Julia.
It is hard to imagine that any production env of any customer will allow anybody but an administrator to update FW on any device at any time. The security implications are huge.
Cheers,
Arkady

-----Original Message-----
From: Julia Kreger
Sent: Monday, January 6, 2020 3:33 PM
To: Kanevsky, Arkady
Cc: Zhipeng Huang; openstack-discuss
Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management

[EXTERNAL EMAIL]

Greetings Arkady,

I think your message makes a very good case and raises a point that I've been trying to type out for the past hour, but with only different words. We have multiple USER driven interactions with a similarly desired, if not the exact same desired end result where different paths can be taken, as we perceive use cases from "As a user, I would like a VM with a configured accelerator", "I would like any compute resource (VM or Baremetal), with a configured accelerator", to "As an administrator, I need to reallocate a baremetal node for this different use, so my user can leverage its accelerator once they know how and are ready to use it.", and as suggested "I as a user want baremetal with k8s and configured accelerators."

And I suspect this diversity of use patterns is where things begin to become difficult. As such, I believe we in essence have a question of a support or compatibility matrix that definitely has gaps depending on "how" the "user" wants or needs to achieve their goals.
And, I think where this entire discussion _can_ go sideways is... (from what I understand) some of these devices need to be flashed by the application user with firmware on demand to meet the user's needs, which is where lifecycle and support interactions begin to become... conflicted.

Further complicating matters is the "Metal to Tenant" use cases where the user requesting the machine is not an administrator, but has some level of inherent administrative access to all Operating System accessible devices once their OS has booted. Which makes me wonder "What if the cloud administrators WANT to block the tenant's direct ability to write/flash firmware into accelerator/smartnic/etc?" I suspect if cloud administrators want to block such hardware access, vendors will want to support such a capability. Blocking such access inherently forces some actions into hardware management/maintenance workflows, and may ultimately cause some of a support matrix's use cases to be unsupportable, again ultimately depending on what exactly the user is attempting to achieve.

Going back to the suggestions in the original email, they seem logical to me in terms of the delineation and separation of responsibilities as we present a cohesive solution to the users of our software.

Greetings Zhipeng,

Is there any documentation at present that details the desired support and use cases? I think this would at least help my understanding, since everything that requires the power to be on would still need to be integrated with-in workflows for eventual tighter integration.

Also, has Cyborg drafted any plans or proposals for integration?

-Julia

On Mon, Jan 6, 2020 at 9:14 AM wrote:
>
> Zhipeng,
>
> Thanks for quick feedback.
>
> Where is accelerating device is running? I am aware of 3 possibilities: servers, storage, switches.
>
> In each one of them the device is managed as part of server, storage box or switch.
> > > > The core of my message is separation of device life cycle management in the “box” where it is placed, from the programming the device as needed per application (VM, container). > > > > Thanks, > Arkady > > > > From: Zhipeng Huang > Sent: Friday, January 3, 2020 7:53 PM > To: Kanevsky, Arkady > Cc: OpenStack Discuss > Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] > accelerators management > > > > [EXTERNAL EMAIL] > > Hi Arkady, > > > > Thanks for your interest in Cyborg project :) I would like to point out that when we initiated the project there are two specific use cases we want to cover: the accelerators attached locally (via PCIe or other bus type) or remotely (via Ethernet or other fabric type). > > > > For the latter one, it is clear that its life cycle is independent from the server (like block device managed by Cinder). For the former one however, its life cycle is not dependent on server for all kinds of accelerators either. For example we already have PCIe based AI accelerator cards or Smart NICs that could be power on/off when the server is on all the time. > > > > Therefore it is not a good idea to move all the life cycle management part into Ironic for the above mentioned reasons. Ironic integration is very important for the standalone usage of Cyborg for Kubernetes, Envoy (TLS acceleration) and others alike. > > > > Hope this answers your question :) > > > > On Sat, Jan 4, 2020 at 5:23 AM wrote: > > Fellow Open Stackers, > > I have been thinking on how to handle SmartNICs, GPUs, FPGA handling across different projects within OpenStack with Cyborg taking a leading role in it. > > > > Cyborg is important project and address accelerator devices that are part of the server and potentially switches and storage. > > It is address 3 different use cases and users there are all grouped into single project. > > > > Application user need to program a portion of the device under management, like GPU, or SmartNIC for that app usage. 
Having a common way to do it across different device families and across different vendor is very important. And that has to be done every time a VM is deploy that need usage of a device. That is tied with VM scheduling. > Administrator need to program the whole device for specific usage. That covers the scenario when device can only support single tenant or single use case. That is done once during OpenStack deployment but may need reprogramming to configure device for different usage. May or may not require reboot of the server. > Administrator need to setup device for its use, like burning specific FW on it. This is typically done as part of server life-cycle event. > > > > The first 2 cases cover application life cycle of device usage. > > The last one covers device life cycle independently how it is used. > > > > Managing life cycle of devices is Ironic responsibility, One cannot and should not manage lifecycle of server components independently. Managing server devices outside server management violates customer service agreements with server vendors and breaks server support agreements. > > Nova and Neutron are getting info about all devices and their capabilities from Ironic; that they use for scheduling. We should avoid creating new project for every new component of the server and modify nova and neuron for each new device. (the same will also apply to cinder and manila if smart devices used in its data/control path on a server). > > Finally we want Cyborg to be able to be used in standalone capacity, say for Kubernetes. > > > > Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic would cover use case 3. > > Thus, move all device Life-cycle code from Cyborg to Ironic. > > Concentrate Cyborg of fulfilling the first 2 use cases. > > Simplify integration with Nova and Neutron for using these accelerators to use existing Ironic mechanism for it. 
> > Create idempotent calls for use case 1 so Nova and Neutron can use it as part of VM deployment to ensure that devices are programmed for VM under scheduling need. > > Create idempotent call(s) for use case 2 for TripleO to setup device for single accelerator usage of a node. > > [Propose similar model for CNI integration.] > > > > Let the discussion start! > > > > Thanks., > Arkady > > > > > -- > > Zhipeng (Howard) Huang > > > > Principle Engineer > > OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open > Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C > > From fungi at yuggoth.org Tue Jan 7 23:51:39 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 Jan 2020 23:51:39 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <582d2544d3d74fe7beef50aaaa35d558@AUSX13MPS308.AMER.DELL.COM> References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM> <582d2544d3d74fe7beef50aaaa35d558@AUSX13MPS308.AMER.DELL.COM> Message-ID: <20200107235139.2l5iw2fumgsfoz5u@yuggoth.org> On 2020-01-07 23:17:25 +0000 (+0000), Arkady.Kanevsky at dell.com wrote: > It is hard to image that any production env of any customer will > allow anybody but administrator to update FW on any device at any > time. The security implication are huge. [...] I thought this was precisely the point of exposing FPGA hardware into server instances. Or do you not count programming those as "updating firmware?" -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: not available
URL: 

From aj at suse.com Wed Jan 8 08:35:36 2020
From: aj at suse.com (Andreas Jaeger)
Date: Wed, 8 Jan 2020 09:35:36 +0100
Subject: [infra] Retire openstack/js-openstack-lib repository
Message-ID: 

The js-openstack-lib repository is orphaned and has not seen any real merges or contributions since February 2017, I propose to retire it.

I'll send retirement changes using topic retire-js-openstack-lib,

Andreas
--
Andreas Jaeger aj at suse.com Twitter: jaegerandi
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
(HRB 36809, AG Nürnberg) GF: Felix Imendörffer
GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB

From radoslaw.piliszek at gmail.com Wed Jan 8 09:21:06 2020
From: radoslaw.piliszek at gmail.com (Radosław Piliszek)
Date: Wed, 8 Jan 2020 10:21:06 +0100
Subject: [infra] Retire openstack/js-openstack-lib repository
In-Reply-To: 
References: 
Message-ID: 

Are there any alternatives?
> I would be glad to pick this up because I planned some integrations
> like this on my own.

If you want to pick this up, best discuss with Clark as Infra PTL. We can keep it if there is real interest,

Andreas

> -yoctozepto
>
> Wed, 8 Jan 2020 at 09:48 Andreas Jaeger wrote:
>>
>> The js-openstack-lib repository is orphaned and has not seen any real
>> merges or contributions since February 2017, I propose to retire it.
>>
>> I'll send retirement changes using topic retire-js-openstack-lib,
>>
>> Andreas
>> --
>> Andreas Jaeger aj at suse.com Twitter: jaegerandi
>> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
>> (HRB 36809, AG Nürnberg) GF: Felix Imendörffer
>> GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB
>>
--
Andreas Jaeger aj at suse.com Twitter: jaegerandi
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
(HRB 36809, AG Nürnberg) GF: Felix Imendörffer
GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB

From radoslaw.piliszek at gmail.com Wed Jan 8 09:32:37 2020
From: radoslaw.piliszek at gmail.com (Radosław Piliszek)
Date: Wed, 8 Jan 2020 10:32:37 +0100
Subject: [infra] Retire openstack/js-openstack-lib repository
In-Reply-To: 
References: 
Message-ID: 

Thanks, Andreas. Will do.

I thought it might also be wise to preserve this since there are posts now and then that horizon is reaching its limit, and a JS lib might be beneficial for any possible replacement (as it can run from the browser). Though I have no idea what the state of this library is.

OTOH, a quick Google search reveals that the alternatives do not seem better at first glance. The only promising one was https://github.com/pkgcloud/pkgcloud but it is not OS-centric and therefore has different goals.

-yoctozepto

Wed, 8 Jan 2020 at 10:26 Andreas Jaeger wrote:
>
> On 08/01/2020 10.21, Radosław Piliszek wrote:
> > Are there any alternatives?
> > I would be glad to pick this up because I planned some integrations
> > like this on my own.
>
> If you want to pick this up, best discuss with Clark as Infra PTL. We
> can keep it if there is real interest,
>
> Andreas
>
> > -yoctozepto
> >
> > Wed, 8 Jan 2020 at 09:48 Andreas Jaeger wrote:
> >>
> >> The js-openstack-lib repository is orphaned and has not seen any real
> >> merges or contributions since February 2017, I propose to retire it.
> >>
> >> I'll send retirement changes using topic retire-js-openstack-lib,
> >>
> >> Andreas
> >> --
> >> Andreas Jaeger aj at suse.com Twitter: jaegerandi
> >> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
> >> (HRB 36809, AG Nürnberg) GF: Felix Imendörffer
> >> GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB
> >>
>
> --
> Andreas Jaeger aj at suse.com Twitter: jaegerandi
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
> (HRB 36809, AG Nürnberg) GF: Felix Imendörffer
> GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB

From ileixe at gmail.com Wed Jan 8 10:08:59 2020
From: ileixe at gmail.com (양유석)
Date: Wed, 8 Jan 2020 19:08:59 +0900
Subject: [neutron][ironic] dynamic routing protocol status for routed network
Message-ID: 

Hi,

For an Ironic flat provider network, I found it is hard to scale manually (across many racks), so I am trying to use a routed network for the purpose.

One unclear thing about routed networks is how to handle segment connectivity. From the reference (https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html), there is future work on a dynamic routing protocol, but I could not find any hints about that functionality.

How do those of you using routed networks handle the segments' connectivity? Is there any project to advertise subnet info via BGP?

Thanks in advance.
From skatsaounis at admin.grnet.gr Wed Jan 8 11:25:08 2020
From: skatsaounis at admin.grnet.gr (Stamatis Katsaounis)
Date: Wed, 8 Jan 2020 11:25:08 +0000
Subject: [charms][watcher] OpenStack Watcher Charm
Message-ID: <159661b1-7edf-e55d-c7b9-cf3b97bffffb@admin.grnet.gr>

Hi all,

The purpose of this email is to let you know that we released an unofficial charm for OpenStack Watcher [1]. This charm gave us the opportunity to deploy OpenStack Watcher to our charmed OpenStack deployment. After seeing value in it, we decided to publish it through the GRNET GitHub organization account for several reasons. First of all, we would love to get feedback on it, as it is our first try at creating an OpenStack reactive charm. Secondly, we would be glad to see other OpenStack operators deploy Watcher and share knowledge with us on the project and possible use cases. Finally, it would be ideal to come up with an official OpenStack Watcher charm repository under the charmers umbrella. By doing this, another OpenStack project is going to be available not only for the Train version but for any future version of OpenStack. Most importantly, the CI tests are going to ensure that the code is not broken and persuade other operators to use it.

Before closing my email, I would like to give some insight into the architecture of the code base and the deployment process. To begin with, charm-watcher is based on other reactive OpenStack charms. During its development, the code bases of Barbican, Designate, Octavia and other charms were consulted. Furthermore, the structure is the same as any official OpenStack charm, of course without functional tests, which is something we cannot provide. Speaking about the deployment process, apart from having a basic charmed OpenStack deployment, the operator has to change two tiny configuration options on Nova cloud controller and Cinder. As explained in the Watcher configuration guide, special care has to be taken with Oslo notifications for Nova and Cinder [2].
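For readers unfamiliar with those settings, here is a minimal sketch of what such extra Oslo notification configuration typically looks like. The section and option names below are standard Nova/oslo.messaging options; the exact values Watcher expects (in particular the extra notification topic) are assumptions for illustration, so check the Watcher configuration guide [2] for the authoritative settings:

```ini
# Hypothetical nova.conf excerpt (a similar change applies to cinder.conf).

# Emit notifications on instance state changes so Watcher can keep
# its cluster data model up to date.
[notifications]
notify_on_state_change = vm_and_task_state

# Publish notifications over the message bus; the additional
# "watcher_notifications" topic shown here is an assumed example.
[oslo_messaging_notifications]
driver = messagingv2
topics = notifications,watcher_notifications
```

The patches referenced below are what make it possible to set this kind of extra Oslo configuration through the charms instead of editing the files by hand.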
In order to achieve that in charmed OpenStack, some issues were met and solved with the following patches [3], [4], [5], [6]. With these patches, the operator can set the extra Oslo configuration, and this is the only extra configuration that needs to take place. Finally, with [7] the Keystone charm can accept a relation with the Watcher charm instead of ignoring it.

To be able to deploy the GRNET Watcher charm on Train, patches [3], [4], [5] and [7] have to be back-ported to the stable/19.10 branch, but that will require the approval of the charmers team. Please let me know if such an option is available, and in that case I am going to open the relevant patches. Furthermore, if you think that it could be a good option to create a spec and then introduce an official Watcher charm, I would love to help on that.

I wish you all a happy new year and I am looking forward to your response and possible feedback.

PS. If we could have an Ubuntu package for watcher-dashboard [8] like octavia-dashboard [9], we would release a charm for it as well.

Best regards,
Stamatis Katsaounis

[1] https://github.com/grnet/charm-watcher
[2] https://docs.openstack.org/watcher/latest/configuration/configuring.html#configure-nova-notifications
[3] https://review.opendev.org/#/c/699079/
[4] https://review.opendev.org/#/c/699081/
[5] https://review.opendev.org/#/c/699657/
[6] https://github.com/juju/charm-helpers/pull/405
[7] https://review.opendev.org/#/c/699082/
[8] https://github.com/openstack/watcher-dashboard
[9] https://launchpad.net/ubuntu/+source/octavia-dashboard

--
Stamatis Katsaounis
DevOps Engineer
t: (+30) 210 7471130 (ext. 483)
f: (+30) 210 7474490
GRNET | Networking Research and Education
www.grnet.gr | 7, Kifisias Av., 115 23, Athens
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From moreira.belmiro.email.lists at gmail.com Wed Jan 8 12:20:06 2020
From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira)
Date: Wed, 8 Jan 2020 13:20:06 +0100
Subject: [largescale-sig] Meeting summary and next actions
In-Reply-To: <06e5f16f-dfa4-8189-da7b-ad2250df8125@openstack.org>
References: <3c3a6232-9a3b-d240-ab82-c7ac4997f5c0@openstack.org> <06e5f16f-dfa4-8189-da7b-ad2250df8125@openstack.org>
Message-ID: 

Hi Thierry, all,
I'm OK with both dates.
If you agree to keep the meeting on January 15 I can chair it.

cheers,
Belmiro

On Mon, Jan 6, 2020 at 3:41 PM Thierry Carrez wrote:
> Thierry Carrez wrote:
> > [...]
> > The next meeting will happen on January 15, at 9:00 UTC on
> > #openstack-meeting.
>
> Oops, some unexpected travel came up and I won't be available to chair
> the meeting on that date. We can either:
>
> 1- keep the meeting, with someone else chairing. I can help with posting
> the agenda before and the summary after, just need someone to start the
> meeting and lead it -- any volunteer?
> 2- move the meeting to January 22, but we may lose Chinese participants
> to new year preparations...
>
> Thoughts?
>
> --
> Thierry Carrez (ttx)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From m.carpen at cineca.it Wed Jan 8 13:01:35 2020
From: m.carpen at cineca.it (mcarpene)
Date: Wed, 8 Jan 2020 14:01:35 +0100
Subject: OIDC/OAuth2 token introspection in Keystone
Message-ID: 

Hi all,

my question is: could OpenStack Keystone support OIDC/OAuth2 token introspection/validation? I mean, for example, executing a swift command via the CLI, adding an OIDC bearer token as a parameter to the swift command. In this case Keystone should validate the OIDC token against an external IdP (using the introspection endpoint/protocol for OIDC).

Is this currently supported, or would it eventually be supported in the near future?

thanks

Michele
--
Michele Carpené
SuperComputing Applications and Innovation Department
CINECA - via Magnanelli, 6/3, 40033 Casalecchio di Reno (Bologna) - ITALY
Tel: +39 051 6171730 Fax: +39 051 6132198
Skype: mcarpene
http://www.hpc.cineca.it/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knikolla at bu.edu Wed Jan 8 15:28:14 2020
From: knikolla at bu.edu (Nikolla, Kristi)
Date: Wed, 8 Jan 2020 15:28:14 +0000
Subject: OIDC/OAuth2 token introspection in Keystone
In-Reply-To: 
References: 
Message-ID: <338A6D25-9DBF-492D-A94C-14E4A311FBE7@bu.edu>

Hi Michele,

We just approved a feature request for that [0]; however, it was merged to backlog, meaning there is no specific timeline for it being implemented yet.

With the current implementation, you can use OAuth 2.0 Access Tokens with Keystone; however, the token introspection endpoint will be used, therefore only the claims contained in the access token will be returned. I am assuming your question is with regard to the userinfo endpoint and OIDC claims, which we do not currently support.

[0].
https://review.opendev.org/#/c/373983/

On Jan 8, 2020, at 8:01 AM, mcarpene > wrote:

Hi all,
my question is: could OS Keystone support OIDC/OAuth2 token introspection/validation. I mean for example executing a swift command via CLI adding a OIDC token bearer as a parameter to the swift command. In this case Keystone should validate the OIDC token towards and external IdP (using introspection endpoint/protocol for oidc). Is this currently supported, or eventually would be done in the near future?
thanks
Michele
--
Michele Carpené
SuperComputing Applications and Innovation Department
CINECA - via Magnanelli, 6/3, 40033 Casalecchio di Reno (Bologna) - ITALY
Tel: +39 051 6171730 Fax: +39 051 6132198
Skype: mcarpene
http://www.hpc.cineca.it/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knikolla at bu.edu Wed Jan 8 15:43:31 2020
From: knikolla at bu.edu (Nikolla, Kristi)
Date: Wed, 8 Jan 2020 15:43:31 +0000
Subject: OIDC/OAuth2 token introspection in Keystone
In-Reply-To: <099190af-4c18-ce7a-3cb4-6e2ee033a07c@cineca.it>
References: <338A6D25-9DBF-492D-A94C-14E4A311FBE7@bu.edu> <099190af-4c18-ce7a-3cb4-6e2ee033a07c@cineca.it>
Message-ID: <3AFA4803-A34B-4ACE-96CC-63C2A4186922@bu.edu>

There is a patch to improve the documentation for using the CLI with OIDC, but it hasn't merged yet. See here https://review.opendev.org/#/c/693838

Keystoneauth has plugins in place for authenticating with the OIDC IdP in multiple ways, including using an access token; see here https://github.com/openstack/keystoneauth/blob/master/keystoneauth1/identity/v3/oidc.py

Best,
Kristi

On Jan 8, 2020, at 10:31 AM, mcarpene > wrote:

Many thanks Nikolla,
I was able to federate using OIDC IdP via the dashboard.
I meant the problem is authenticating via CLI providing a OIDC token via command line, but maybe you already answered to my request.
BR, Michele On 08/01/20 16:28, wrote: Hi Michele, We just approved a feature request for that [0], however it was merged to backlog, meaning no specific timeline for it being implemented yet. With the current implementation, you can use OAuth 2.0 Access Tokens with Keystone, however the token introspection endpoint will be used, therefore only the claims contained in the access token will be returned. I am assuming your question is with regards to the userinfo endpoint and OIDC claims, which we do not currently support. [0]. https://review.opendev.org/#/c/373983/ On Jan 8, 2020, at 8:01 AM, mcarpene > wrote: Hi all, my question is: could OS Keystone support OIDC/OAuth2 token introspection/validation. I mean for example executing a swift command via CLI adding a OIDC token bearer as a parameter to the swift command. In this case Keystone should validate the OIDC token towards and external IdP (using introspection endpoint/protocol for oidc). Is this currently supported, or eventually would be done in the near future? thanks Michele -- Michele Carpené SuperComputing Applications and Innovation Department CINECA - via Magnanelli, 6/3, 40033 Casalecchio di Reno (Bologna) - ITALY Tel: +39 051 6171730 Fax: +39 051 6132198 Skype: mcarpene http://www.hpc.cineca.it/ -- Michele Carpené SuperComputing Applications and Innovation Department CINECA - via Magnanelli, 6/3, 40033 Casalecchio di Reno (Bologna) - ITALY Tel: +39 051 6171730 Fax: +39 051 6132198 Skype: mcarpene http://www.hpc.cineca.it/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shubjero at gmail.com Wed Jan 8 16:25:35 2020 From: shubjero at gmail.com (shubjero) Date: Wed, 8 Jan 2020 11:25:35 -0500 Subject: Compute node NIC bonding for increased instance throughput Message-ID: Good day, I have a question for the OpenStack community, hopefully someone can help me out here. 
Goal ------------------ Provision an NFS instance capable of providing 20Gbps of network throughput to be used by multiple other instances within the same project/network. Background ------------------ We run an OpenStack Stein cluster on Ubuntu 18.04. Our Neutron architecture is using openvswitch and GRE. Our compute nodes have two 10G NIC's and are configured in a layer3+4 LACP to the Top of Rack switch. Observations ------------------ Successfully see 20Gbps of traffic balanced across both slaves in the bond when performing iperf3 tests at the *baremetal/os/ubuntu* layer with two other compute nodes as iperf3 clients. Problem ------------------ We are unable to achieve 20Gbps at the instance level. We have tried multiple iperf3 connections from multiple other instances on different compute nodes and we are only able to reach 10Gbps and notice that traffic is not utilizing both slaves in the bond. One slave gets all of the traffic while the other slave sits basically idle. I have some configuration output here: http://paste.openstack.org/show/QdQq76q6VI1XN5tLW0xH/ Any help would be appreciated! Jared Baker Cloud Architect, Ontario Institute for Cancer Research From Arkady.Kanevsky at dell.com Wed Jan 8 16:31:49 2020 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Wed, 8 Jan 2020 16:31:49 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <20200107235139.2l5iw2fumgsfoz5u@yuggoth.org> References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM> <582d2544d3d74fe7beef50aaaa35d558@AUSX13MPS308.AMER.DELL.COM> <20200107235139.2l5iw2fumgsfoz5u@yuggoth.org> Message-ID: <2706c21c3f7d4203a8a20342f8f6a68c@AUSX13MPS308.AMER.DELL.COM> Jeremy, Correct. programming devices and "updating firmware" I count as separate activities. Similar to CPU or GPU. 
-----Original Message----- From: Jeremy Stanley Sent: Tuesday, January 7, 2020 5:52 PM To: openstack-discuss at lists.openstack.org Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management On 2020-01-07 23:17:25 +0000 (+0000), Arkady.Kanevsky at dell.com wrote: > It is hard to imagine that any production env of any customer will allow > anybody but administrator to update FW on any device at any time. The > security implications are huge. [...] I thought this was precisely the point of exposing FPGA hardware into server instances. Or do you not count programming those as "updating firmware?" -- Jeremy Stanley From sinan at turka.nl Wed Jan 8 16:43:06 2020 From: sinan at turka.nl (Sinan Polat) Date: Wed, 8 Jan 2020 17:43:06 +0100 Subject: Compute node NIC bonding for increased instance throughput In-Reply-To: References: Message-ID: Hi Jared, A single stream will utilize just 1 link. Have you tried with multiple streams using different sources? What do you mean with layer3+4? Do you mean the xmit hash policy? Sinan > On 8 Jan 2020 at 17:25, shubjero wrote the following: > > Good day, > > I have a question for the OpenStack community, hopefully someone can > help me out here. > > Goal > ------------------ > Provision an NFS instance capable of providing 20Gbps of network > throughput to be used by multiple other instances within the same > project/network. > > Background > ------------------ > We run an OpenStack Stein cluster on Ubuntu 18.04. Our Neutron > architecture is using openvswitch and GRE. Our compute nodes have two > 10G NIC's and are configured in a layer3+4 LACP to the Top of Rack > switch. > > Observations > ------------------ > Successfully see 20Gbps of traffic balanced across both slaves in the > bond when performing iperf3 tests at the *baremetal/os/ubuntu* layer > with two other compute nodes as iperf3 clients. > > Problem > ------------------ > We are unable to achieve 20Gbps at the instance level.
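Sinan's first point — one stream, one link — is a direct consequence of how LACP transmit hashing works: the bond's xmit_hash_policy (layer3+4 here) maps each flow's (src IP, dst IP, src port, dst port) tuple to a single slave, so any one TCP connection is pinned to one 10G link. A toy model of that selection; the hash below is illustrative, not the kernel's actual algorithm:

```python
# Toy model of layer3+4 slave selection in a 2-slave bond.
import random
import zlib

SLAVES = 2  # two 10G NICs

def pick_slave(src_ip, dst_ip, src_port, dst_port):
    """One flow always hashes to the same slave (illustrative hash)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % SLAVES

# A single iperf3 stream is one flow: every packet picks the same slave.
flow = ("10.0.0.1", "10.0.0.2", 54321, 5201)
assert len({pick_slave(*flow) for _ in range(1000)}) == 1

# Many flows with different source ports (iperf3 -P, or several clients)
# spread across both slaves.
random.seed(0)
flows = [("10.0.0.1", "10.0.0.2", random.randint(1024, 65535), 5201)
         for _ in range(64)]
print(sorted({pick_slave(*f) for f in flows}))
```

One caveat worth checking in this setup: the thread mentions GRE tenant networking. GRE frames expose no inner TCP/UDP ports on the wire, so a layer3+4 policy typically falls back to hashing only the outer IP pair — meaning all tunnelled traffic between the same two hypervisors can land on one slave no matter how many instance flows are inside. That would be consistent with the one-idle-slave behavior observed.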
We have tried > multiple iperf3 connections from multiple other instances on different > compute nodes and we are only able to reach 10Gbps and notice that > traffic is not utilizing both slaves in the bond. One slave gets all > of the traffic while the other slave sits basically idle. > > I have some configuration output here: > http://paste.openstack.org/show/QdQq76q6VI1XN5tLW0xH/ > > Any help would be appreciated! > > Jared Baker > Cloud Architect, Ontario Institute for Cancer Research > From tpb at dyncloud.net Wed Jan 8 16:54:12 2020 From: tpb at dyncloud.net (Tom Barron) Date: Wed, 8 Jan 2020 11:54:12 -0500 Subject: [Manila] First meeting of 2020 Message-ID: <20200108165412.jtzxx425wfzq6um7@barron.net> Hey Zorillas! Just a reminder of our first meeting in 2020, 9 January at 1500 UTC on Freenode #openstack-meeting-alt. Feel free to update the agenda [1]. -- Tom [1] https://wiki.openstack.org/wiki/Manila/Meetings#Next_meeting From pierre at stackhpc.com Wed Jan 8 17:31:10 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Wed, 8 Jan 2020 18:31:10 +0100 Subject: [scientific][www] Unable to download OpenStack for Scientific Research book Message-ID: Hello, I tried to download the book at https://www.openstack.org/science/ but the link doesn't work. Could this please be fixed? I looked on openstack.org for a contact address, but couldn't find one. Please let me know if there is a specific address I should use next time. Thanks, Pierre From cboylan at sapwetik.org Wed Jan 8 17:35:04 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 08 Jan 2020 09:35:04 -0800 Subject: [all][infra] Compressed job log artifacts not loading in browser Message-ID: <0e443056-2eb3-4bd2-9d25-6a1b55d214ea@www.fastmail.com> Over the holidays the infra team noticed that some of our release artifacts were in the wrong file format. 
On further investigation we discovered the reason for this was that some swift implementations were inflating compressed tarballs when retrieved by clients not setting accept-encoding: gzip. We then ended up with tar files when we expected .tar.gz format files. This behavior seems to be controlled by the content-encoding we set on object upload. If we tell swift that the object is a gzip'd file on upload then the webservers helpfully inflate them when a client retrieves them. In order to fix our release artifacts we have updated our swift upload tooling to stop setting content-type on gzip files. This forces the swift implementation to return the files in the same format they are uploaded when retrieved. A side effect of this is that any gzip'd files (like testr_results.html.gz) are no longer automatically decompressed for you when you retrieve them. We have fixes up to handle common occurrences of this at https://review.opendev.org/#/c/701578/ and https://review.opendev.org/701282 (note the second case is already handled on OpenDev's Zuul). If you run into other files you expect to be browsable but get compressed instead, the fix is to stop compressing the files explicitly in the job. We will still upload files in compressed form to swift for efficiency but we should operate on these files as if they were uncompressed. Then for files that should be compressed (like .tar.gz files) we can pass them through as is, avoiding any format change problems. Clark From mordred at inaugust.com Wed Jan 8 17:39:18 2020 From: mordred at inaugust.com (Monty Taylor) Date: Wed, 8 Jan 2020 12:39:18 -0500 Subject: [infra] Retire openstack/js-openstack-lib repository In-Reply-To: References: Message-ID: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> > On Jan 8, 2020, at 4:32 AM, Radosław Piliszek wrote: > > Thanks, Andreas. Will do.
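The inflation Clark describes can be reproduced locally: if an object is stored as a .tar.gz but served with `Content-Encoding: gzip`, a client that honors the header transparently decompresses the body and ends up saving a bare .tar. A self-contained sketch of that transformation, with the HTTP layer simulated by a plain `gzip.decompress` call:

```python
# Simulate a release artifact: a tiny .tar.gz built in memory.
import gzip
import io
import tarfile

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="README")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

uploaded = gzip.compress(tar_buf.getvalue())  # what lands in swift (.tar.gz)

# A client honoring Content-Encoding: gzip sees the inflated body:
downloaded = gzip.decompress(uploaded)

assert uploaded[:2] == b"\x1f\x8b"    # gzip magic: a real .tar.gz was stored
assert downloaded[:2] != b"\x1f\x8b"  # magic gone: the saved file is a bare .tar
assert downloaded == tar_buf.getvalue()
```

That is exactly the mismatch hit with the release artifacts: the file keeps its .tar.gz name while its contents are already a plain tar.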
> > I thought it might be also wise to preserve this since there are posts > now and then that horizon is reaching its limit and a JS lib might be > beneficial for any possible replacement (as it can run from the > browser) Said this in IRC, but for the mailing list - I’d be happy to accept it into the SDK project as a deliverable if you wanted to take it on. From what I can tell it does process clouds.yaml files - so it might be a nice way for us to verify good cross-language support for that format. (Should probably also add support for things like os-service-types and the well-known api discovery that have come since this library was last worked on) It would be nice to keep it and move it forward if it’s solid and a thing that’s valuable to people. > Though I have no idea what the state of this library is. OTOH, quick > google search reveals that alternatives do not seem better at the > first glance. > The only promising one was https://github.com/pkgcloud/pkgcloud but it > is not OS-centric and has therefore different goals. > > -yoctozepto > > śr., 8 sty 2020 o 10:26 Andreas Jaeger napisał(a): >> >> On 08/01/2020 10.21, Radosław Piliszek wrote: >>> Are there any alternatives? >>> I would be glad to pick this up because I planned some integrations >>> like this on my own. >> >> If you want to pick this up, best discuss with Clark as Infra PTL. We >> can keep it if there is real interest, >> >> Andreas >> >>> -yoctozepto >>> >>> śr., 8 sty 2020 o 09:48 Andreas Jaeger napisał(a): >>>> >>>> The js-openstack-lib repository is orphaned and has not seen any real >>>> merges or contributions since February 2017, I propose to retire it. >>>> >>>> I'll send retirement changes using topic retire-js-openstack-lib, >>>> >>>> Andreas >>>> -- >>>> Andreas Jaeger aj at suse.com Twitter: jaegerandi >>>> SUSE Software Solutions Germany GmbH, Maxfeldstr. 
5, D 90409 Nürnberg >>>> (HRB 36809, AG Nürnberg) GF: Felix Imendörffer >>>> GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB >>>> >> >> >> -- >> Andreas Jaeger aj at suse.com Twitter: jaegerandi >> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg >> (HRB 36809, AG Nürnberg) GF: Felix Imendörffer >> GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB > > From fungi at yuggoth.org Wed Jan 8 18:47:45 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 Jan 2020 18:47:45 +0000 Subject: [scientific][www] Unable to download OpenStack for Scientific Research book In-Reply-To: References: Message-ID: <20200108184745.fs6d4p4udr7deffp@yuggoth.org> On 2020-01-08 18:31:10 +0100 (+0100), Pierre Riteau wrote: > I tried to download the book at https://www.openstack.org/science/ > but the link doesn't work. Could this please be fixed? I've personally reported it to the webmasters for the www.openstack.org site. In the meantime, a bit of searching turns up https://www.openstack.org/assets/science/CrossroadofCloudandHPC.pdf which will redirect to a working copy. As I wrote this, Wes Wilson pointed out to me that there's also a 6x9in "printable" version at https://www.openstack.org/assets/science/CrossroadofCloudandHPC-Print.pdf and preprinted copies for purchase at https://www.amazon.com/dp/1978244703/ if that's more your speed. > I looked on openstack.org for a contact address, but couldn't find > one. Please let me know if there is a specific address I should use > next time. Yes, I know they're working on getting something added. They've generally been relying on E-mails to summitapp at openstack.org or bugs filed at https://bugs.launchpad.net/openstack-org/+filebug but I gather they're creating a support at openstack.org address or something along those lines to mention in page footers on the site soon. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From openstack at fried.cc Wed Jan 8 19:45:37 2020 From: openstack at fried.cc (Eric Fried) Date: Wed, 8 Jan 2020 13:45:37 -0600 Subject: [cliff][docs][requirements] new cliff versions causes docs to fail to build In-Reply-To: References: <20191222175308.juzyu6grndfcf2ez@mthode.org> <20200107151521.GA349057@sm-workstation> <20200107152237.GA349707@sm-workstation> Message-ID: <28c1b684-7597-99cf-42bb-4995e0aa9f54@fried.cc> > I suggest we blacklist 2.17.0 I see you did this here [1] (merged) > and issue > a new 2.17.1 or 2.18.0 release post-haste. and this here [2] (open) > That's not all though. For some daft reason, the 'python-rsdclient' > projects has imported argparses 'HelpFormatter' from 'cliff._argparse' > instead of 'argparse'. They need to stop doing this because commit > 584352dcd008d58c433136539b22a6ae9d6c45cc of cliff means this will no > longer work. Just import argparse directly. I didn't see an open review for this so I made one [3]. I'm not sure how we should deal with python-rsdclient's cliff req, though. Should we blacklist the bad version in accordance with [1], or remove the req entirely since that was the only thing in the project that referenced it? And should that happen in the same patch or separately? 
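The import fix referenced as [3] is presumably a one-liner along these lines, following the advice "Just import argparse directly" — swap the private cliff module for the stdlib one:

```python
# Before (relies on cliff's private module; breaks after cliff's
# argparse refactor):
#     from cliff._argparse import HelpFormatter
# After -- the same class straight from the stdlib:
from argparse import HelpFormatter

formatter = HelpFormatter(prog="rsd")
print(type(formatter).__module__)  # argparse
```

The requirements side of the question would be a specifier such as `cliff!=2.17.0` in the project's requirements file — or dropping the cliff requirement entirely if this import was the only thing referencing it.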
efried [1] https://review.opendev.org/#/c/701406/ [2] https://review.opendev.org/#/c/701405/ [3] https://review.opendev.org/#/c/701599/ From openstack at fried.cc Wed Jan 8 19:53:40 2020 From: openstack at fried.cc (Eric Fried) Date: Wed, 8 Jan 2020 13:53:40 -0600 Subject: [all][infra] Compressed job log artifacts not loading in browser In-Reply-To: <0e443056-2eb3-4bd2-9d25-6a1b55d214ea@www.fastmail.com> References: <0e443056-2eb3-4bd2-9d25-6a1b55d214ea@www.fastmail.com> Message-ID: Since it was not obvious to me, and thus might not be obvious to others, this > If you run into other files you expect to be browsable but get compressed instead, the fix is to stop compressing the files explicitly in the job. means that (many? most? all?) legacy jobs are impacted, and the remedy is to fix the individual jobs as noted, or (preferably) port them to zuulv3. efried . From rosmaita.fossdev at gmail.com Wed Jan 8 20:43:33 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 8 Jan 2020 15:43:33 -0500 Subject: [cinder] ussuri virtual mid-cycle poll Message-ID: As determined at the virtual PTG (which seems like it happened only a few weeks ago), we'll be doing a two-phase virtual mid-cycle, meeting for two hours the week of 20 January (before the spec freeze) and again around the week of 16 March (the Cinder new feature status checkpoint). There's a poll up to determine a suitable time for the first virtual mid-cycle meeting: https://doodle.com/poll/n3tmq8ep43dyi7tv Please fill out the poll as soon as you can. If all the times are horrible for you, please suggest an alternative in a comment on the poll. The poll will close at 23:59 UTC on Saturday 11 January. (I know it's soon, but that way we'll have time to make adjustments if necessary.)
cheers, brian From radoslaw.piliszek at gmail.com Wed Jan 8 21:03:48 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 8 Jan 2020 22:03:48 +0100 Subject: [infra] Retire openstack/js-openstack-lib repository In-Reply-To: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> References: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> Message-ID: While the project is not well-documented (for any potential user), the code looks quite nice (well-structured, test-covered and documented). I checked with nodejs6 (old obsoleted) as this was what functional tests jobs were mentioning and I did not want any surprises. Yet it failed to properly interpret Stein endpoints. First issue is that it requires unversioned keystone url passed to it. Then it started failing on something less obvious and I am too tired today to debug it. :-) Deps are partially deprecated, some have been replaced, some have security issues. Based on first impression I see it fit for keeping as a deliverable but it needs some work to bring it back in shape. It makes sense to go to SDK project, albeit it requires nodejs familiarity in addition to general API/SDK building knowledge. PS: I noticed nodejs 8 is already EOL (this year) and it seems to be the max in infra. I would appreciate any help with getting nodejs 10 and 12 into infra. -yoctozepto śr., 8 sty 2020 o 18:39 Monty Taylor napisał(a): > > > > > On Jan 8, 2020, at 4:32 AM, Radosław Piliszek wrote: > > > > Thanks, Andreas. Will do. > > > > I thought it might be also wise to preserve this since there are posts > > now and then that horizon is reaching its limit and a JS lib might be > > beneficial for any possible replacement (as it can run from the > > browser) > > Said this in IRC, but for the mailing list - I’d be happy to accept it into the SDK project as a deliverable if you wanted to take it on. 
From what I can tell it does process clouds.yaml files - so it might be a nice way for us to verify good cross-language support for that format. (Should probably also add support for things like os-service-types and the well-known api discovery that have come since this library was last worked on) It would be nice to keep it and move it forward if it’s solid and a thing that’s valuable to people. > > > Though I have no idea what the state of this library is. OTOH, quick > > google search reveals that alternatives do not seem better at the > > first glance. > > The only promising one was https://github.com/pkgcloud/pkgcloud but it > > is not OS-centric and has therefore different goals. > > > > -yoctozepto > > > > śr., 8 sty 2020 o 10:26 Andreas Jaeger napisał(a): > >> > >> On 08/01/2020 10.21, Radosław Piliszek wrote: > >>> Are there any alternatives? > >>> I would be glad to pick this up because I planned some integrations > >>> like this on my own. > >> > >> If you want to pick this up, best discuss with Clark as Infra PTL. We > >> can keep it if there is real interest, > >> > >> Andreas > >> > >>> -yoctozepto > >>> > >>> śr., 8 sty 2020 o 09:48 Andreas Jaeger napisał(a): > >>>> > >>>> The js-openstack-lib repository is orphaned and has not seen any real > >>>> merges or contributions since February 2017, I propose to retire it. > >>>> > >>>> I'll send retirement changes using topic retire-js-openstack-lib, > >>>> > >>>> Andreas > >>>> -- > >>>> Andreas Jaeger aj at suse.com Twitter: jaegerandi > >>>> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > >>>> (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > >>>> GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB > >>>> > >> > >> > >> -- > >> Andreas Jaeger aj at suse.com Twitter: jaegerandi > >> SUSE Software Solutions Germany GmbH, Maxfeldstr. 
5, D 90409 Nürnberg > >> (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > >> GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB > > > > > From fungi at yuggoth.org Wed Jan 8 21:12:08 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 Jan 2020 21:12:08 +0000 Subject: [infra] Retire openstack/js-openstack-lib repository In-Reply-To: References: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> Message-ID: <20200108211208.mnvspulwlghdyyz5@yuggoth.org> On 2020-01-08 22:03:48 +0100 (+0100), Radosław Piliszek wrote: [...] > I noticed nodejs 8 is already EOL (this year) and it seems to be > the max in infra. I would appreciate any help with getting nodejs > 10 and 12 into infra. [...] Can you be more specific? Zuul will obviously allow you to install anything you like in a job, so presumably you're finding some defaults hard-coded somewhere we should reevaluate? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ekultails at gmail.com Wed Jan 8 21:49:15 2020 From: ekultails at gmail.com (Luke Short) Date: Wed, 8 Jan 2020 16:49:15 -0500 Subject: [tripleo] Use Podman 1.6 in CI Message-ID: Hey folks, We have been running into a situation where an older version of Podman is used in CI that has consistent failures. It has problems deleting storage associated with a container. A possible workaround (originally created by Damien) can be found here [1]. A few of us been in talks with the Podman team about this problem and have tested with a newer version of it (1.6.4, to be exact) and found that it is no longer an issue. The most ideal situation is to simply use this newer version of Podman instead of adding hacky workarounds that we will soon revert. However, I am unsure about how we would go about doing this upstream. RHEL will soon get an updated Podman version but CentOS always lags behind. 
Even CentOS 8 Stream does not contain the newer version [2]. The question/ask I have is can we ship/use a newer version of Podman in our upstream CI? Or should we continue our efforts on making a workaround? 1. https://review.opendev.org/#/c/698999/ 2. http://mirror.centos.org/centos/8-stream/AppStream/x86_64/os/Packages/ Sincerely, Luke Short -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Wed Jan 8 21:55:57 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 08 Jan 2020 13:55:57 -0800 Subject: [infra] Retire openstack/js-openstack-lib repository In-Reply-To: <20200108211208.mnvspulwlghdyyz5@yuggoth.org> References: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> <20200108211208.mnvspulwlghdyyz5@yuggoth.org> Message-ID: <8679edfb-a26a-4292-9886-3c71cec21f83@www.fastmail.com> On Wed, Jan 8, 2020, at 1:12 PM, Jeremy Stanley wrote: > On 2020-01-08 22:03:48 +0100 (+0100), Radosław Piliszek wrote: > [...] > > I noticed nodejs 8 is already EOL (this year) and it seems to be > > the max in infra. I would appreciate any help with getting nodejs > > 10 and 12 into infra. > [...] > > Can you be more specific? Zuul will obviously allow you to install > anything you like in a job, so presumably you're finding some > defaults hard-coded somewhere we should reevaluate? We even supply a role from zuul-jobs to install nodejs from nodesource for you, https://zuul-ci.org/docs/zuul-jobs/js-roles.html#role-install-nodejs. This can install any nodejs version available from nodesource for the current platform. Clark From aschultz at redhat.com Wed Jan 8 22:18:23 2020 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 8 Jan 2020 15:18:23 -0700 Subject: [tripleo] tripleo-operator-ansible start and request for input Message-ID: [Hello folks, I've begun the basic start of the tripleo-operator-ansible collection work[0]. 
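Whichever direction the packaging question goes, the workaround in [1] only makes sense on hosts still running the affected Podman, so gating it on the installed version is the natural shape for it. A stdlib sketch of that comparison — obtaining the installed version (e.g. by parsing `podman version` output) is left out, and 1.6.4 is taken from the testing described above:

```python
# Version gate for the container-storage-deletion workaround.
FIXED_IN = "1.6.4"  # per the thread: the deletion bug no longer occurs here

def version_tuple(v):
    """'1.6.4' -> (1, 6, 4); pre-release suffixes are not handled."""
    return tuple(int(part) for part in v.split("."))

def needs_workaround(installed):
    return version_tuple(installed) < version_tuple(FIXED_IN)

print(needs_workaround("1.4.2"))  # True  -- old podman, apply the workaround
print(needs_workaround("1.6.4"))  # False -- fixed release, skip it
```

Comparing tuples rather than raw strings matters here: string comparison would order "1.10.0" before "1.6.4".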
At the start of this work, I've chosen the undercloud installation[1] as the first role to use to figure out how we want end users to consume these roles. I wanted to bring up this initial implementation so that we can discuss how folks will include these roles. The initial implementation is a wrapper around the tripleoclient command as run via openstackclient. This means that the 'tripleo-undercloud' role provides implementations for 'openstack undercloud backup', 'openstack undercloud install', and 'openstack undercloud upgrade'. In terms of naming conventions, I'm proposing that we would name the roles "tripleo-" with the last part of the command action being an "action". Examples: "openstack undercloud *" -> role: tripleo-undercloud action: (backup|install|upgrade) "openstack undercloud minion *" -> role: tripleo-undercloud-minion action: (install|upgrade) "openstack overcloud *" -> role: tripleo-overcloud action: (deploy|delete|export) "openstack overcloud node *" -> role: tripleo-overcloud-node action: (import|introspect|provision|unprovision) In terms of end user interface, I've got two proposals out in terms of possible implementations. Tasks from method: The initial commit proposes that we would require the end user to use an include_role/tasks_from call to perform the desired action. For example: - hosts: undercloud gather_facts: true tasks: - name: Install undercloud collections: - tripleo.operator import_role: name: tripleo-undercloud tasks_from: install vars: tripleo_undercloud_debug: true Variable switch method: I've also proposed an alternative implementation[2] that would use include_role but require the end user to set a specific variable to change if the role runs 'install', 'backup' or 'upgrade'.
With this patch the playbook would look something like: - hosts: undercloud gather_facts: true tasks: - name: Install undercloud collections: - tripleo.operator import_role: name: tripleo-undercloud vars: tripleo_undercloud_action: install tripleo_undercloud_debug: true I would like to solicit feedback on which one of these is the preferred integration method when calling these roles. I have two patches up in tripleo-quickstart-extras to show how these calls could be run. The "Tasks from method" can be viewed here[3]. The "Variable switch method" can be viewed here[4]. I can see pros and cons for both methods. My take would be: Tasks from method: Pros: - action is a bit more explicit - dynamic logic left up to the playbook/consumer. - May not have a 'default' action (as main.yml is empty, though it could be implemented). - tasks_from would be a global implementation across all roles rather than having a changing variable name. Cons: - internal task file names must be known by the consumer (though IMHO this is no different than the variable name + values in the other implementation) - role/action inclusions is not dynamic in the role (it can be in the playbook) Variable switch method: Pros: - inclusion of the role by default runs an install - action can be dynamically changed from the calling playbook via an ansible var - structure of the task files is internal to the role and the user of the role need not know the filenames/structure. Cons: - calling playbook is not explicit in that the action can be switched dynamically (e.g. 
intentionally or accidentally because it is dynamic) - implementer must know to configure a variable called `tripleo_undercloud_action` to switch between install/backup/upgrade actions - variable names are likely different depending on the role My personal preference might be to use the "Tasks from method" because it would lend itself to the same implementation across all roles and the dynamic logic is left to the playbook rather than internally in the role. For example, we'd end up with something like: - hosts: undercloud gather_facts: true collections: - tripleo.operator tasks: - name: Install undercloud import_role: name: tripleo-undercloud tasks_from: install vars: tripleo_undercloud_debug: true - name: Upload images import_role: name: tripleo-overcloud-images tasks_from: upload vars: tripleo_overcloud_images_debug: true - name: Import nodes import_role: name: tripleo-overcloud-node tasks_from: import vars: tripleo_overcloud_node_debug: true tripleo_overcloud_node_import_file: instack.json - name: Introspect nodes import_role: name: tripleo-overcloud-node tasks_from: introspect vars: tripleo_overcloud_node_debug: true tripleo_overcloud_node_introspect_all_manageable: True tripleo_overcloud_node_introspect_provide: True - name: Overcloud deploy import_role: name: tripleo-overcloud tasks_from: deploy vars: tripleo_overcloud_debug: true tripleo_overcloud_deploy_environment_files: - /home/stack/params.yaml The same general tasks performed via the "Variable switch method" would look something like: - hosts: undercloud gather_facts: true collections: - tripleo.operator tasks: - name: Install undercloud import_role: name: tripleo-undercloud vars: tripleo_undercloud_action: install tripleo_undercloud_debug: true - name: Upload images import_role: name: tripleo-overcloud-images vars: tripleo_overcloud_images_action: upload tripleo_overcloud_images_debug: true - name: Import nodes import_role: name: tripleo-overcloud-node vars: tripleo_overcloud_node_action: import 
tripleo_overcloud_node_debug: true tripleo_overcloud_node_import_file: instack.json - name: Introspect nodes import_role: name: tripleo-overcloud-node vars: tripleo_overcloud_node_action: introspect tripleo_overcloud_node_debug: true tripleo_overcloud_node_introspect_all_manageable: True tripleo_overcloud_node_introspect_provide: True - name: Overcloud deploy import_role: name: tripleo-overcloud vars: tripleo_overcloud_action: deploy tripleo_overcloud_debug: true tripleo_overcloud_deploy_environment_files: - /home/stack/params.yaml Thoughts? Thanks, -Alex [0] https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible [1] https://review.opendev.org/#/c/699311/ [2] https://review.opendev.org/#/c/701628/ [3] https://review.opendev.org/#/c/701034/ [4] https://review.opendev.org/#/c/701628/ From fungi at yuggoth.org Wed Jan 8 22:25:24 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 Jan 2020 22:25:24 +0000 Subject: [tripleo] Use Podman 1.6 in CI In-Reply-To: References: Message-ID: <20200108222524.jxhxlxzuvxt3mazw@yuggoth.org> On 2020-01-08 16:49:15 -0500 (-0500), Luke Short wrote: > We have been running into a situation where an older version of > Podman is used in CI that has consistent failures. It has problems > deleting storage associated with a container. [...] > The question/ask I have is can we ship/use a newer version of > Podman in our upstream CI? Or should we continue our efforts on > making a workaround? [...] This sounds like a problem users of your software could encounter in production. If so, how does only fixing it in CI jobs help your users? It seems like time might be better spent fixing the problem for everyone. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From aschultz at redhat.com Wed Jan 8 22:35:03 2020 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 8 Jan 2020 15:35:03 -0700 Subject: [tripleo] Use Podman 1.6 in CI In-Reply-To: <20200108222524.jxhxlxzuvxt3mazw@yuggoth.org> References: <20200108222524.jxhxlxzuvxt3mazw@yuggoth.org> Message-ID: On Wed, Jan 8, 2020 at 3:30 PM Jeremy Stanley wrote: > > On 2020-01-08 16:49:15 -0500 (-0500), Luke Short wrote: > > We have been running into a situation where an older version of > > Podman is used in CI that has consistent failures. It has problems > > deleting storage associated with a container. > [...] > > The question/ask I have is can we ship/use a newer version of > > Podman in our upstream CI? Or should we continue our efforts on > > making a workaround? > [...] > > This sounds like a problem users of your software could encounter in > production. If so, how does only fixing it in CI jobs help your > users? It seems like time might be better spent fixing the problem > for everyone. Btw fixing CI implies fixing for everyone. In other words, how do we make it available for everyone (including CI). This is one of those ecosystem things because we (tripleo/openstack) don't necessarily ship it but we do need to use it. I'm uncertain of the centos7/podman 1.6 support and which branches are affected by this? This might be a better question for RDO. Thanks, -Alex > -- > Jeremy Stanley From ekultails at gmail.com Wed Jan 8 22:42:31 2020 From: ekultails at gmail.com (Luke Short) Date: Wed, 8 Jan 2020 17:42:31 -0500 Subject: [tripleo] tripleo-operator-ansible start and request for input In-Reply-To: References: Message-ID: Hey Alex, This is a great starting point! Thanks for sharing. I personally prefer the variables approach. This more-so aligns with the best practices for creating an Ansible role. 
An operator could provide a single variables file that has the booleans set for what they want configured. This also makes it easier to include/import the role once and then have it do multiple actions. For example, tripleo-undercloud can be used for both installation and backup. CI could even consume this to run "all the things" from the role by setting those extra variables. We provide the playbooks and the operators configure the variables for their environment. Long-term, I see TripleO consuming a single or few straight-forward Ansible variables files that define the entire deployment as opposed to the giant monster that Heat templates have become. Those are just my initial thoughts on the matter. I am interested to see what others think as well. Sincerely, Luke Short On Wed, Jan 8, 2020 at 5:20 PM Alex Schultz wrote: > [Hello folks, > > I've begun the basic start of the tripleo-operator-ansible collection > work[0]. At the start of this work, I've chosen the undercloud > installation[1] as the first role to use to figure out how we the end > user's to consume these roles. I wanted to bring up this initial > implementation so that we can discuss how folks will include these > roles. The initial implementation is a wrapper around the > tripleoclient command as run via openstackclient. This means that the > 'tripleo-undercloud' role provides implementations for 'openstack > undercloud backup', 'openstack undercloud install', and 'openstack > undercloud upgrade'. > > In terms of naming conventions, I'm proposing that we would name the > roles "tripleo-" with the last part of the command > action being an "action". 
Examples: > > "openstack undercloud *" -> > role: tripleo-undercloud > action: (backup|install|upgrade) > > "openstack undercloud minion *" -> > role: tripleo-undercloud-minion > action: (install|upgrade) > > "openstack overcloud *" -> > role: tripleo-overcloud > action: (deploy|delete|export) > > "openstack overcloud node *" -> > role: tripleo-overcloud-node > action: (import|introspect|provision|unprovision) > > In terms of end user interface, I've got two proposals out in terms of > possible implementations. > > Tasks from method: > The initial commit propose that we would require the end user to use > an include_role/tasks_from call to perform the desired action. For > example: > > - hosts: undercloud > gather_facts: true > tasks: > - name: Install undercloud > collections: > - tripleo.operator > import_role: > name: tripleo-undercloud > tasks_from: install > vars: > tripleo_undercloud_debug: true > > Variable switch method: > I've also proposed an alternative implementation[2] that would use > include_role but require the end user to set a specific variable to > change if the role runs 'install', 'backup' or 'upgrade'. With this > patch the playbook would look something like: > > - hosts: undercloud > gather_facts: true > tasks: > - name: Install undercloud > collections: > - tripleo.operator > import_role: > name: tripleo-undercloud > vars: > tripleo_undercloud_action: install > tripleo_undercloud_debug: true > > I would like to solicit feedback on which one of these is the > preferred integration method when calling these roles. I have two > patches up in tripleo-quickstart-extras to show how these calls could > be run. The "Tasks from method" can be viewed here[3]. The "Variable > switch method" can be viewed here[4]. I can see pros and cons for > both methods. > > My take would be: > > Tasks from method: > Pros: > - action is a bit more explicit > - dynamic logic left up to the playbook/consumer. 
> - May not have a 'default' action (as main.yml is empty, though it > could be implemented). > - tasks_from would be a global implementation across all roles rather > than having a changing variable name. > > Cons: > - internal task file names must be known by the consumer (though IMHO > this is no different than the variable name + values in the other > implementation) > - role/action inclusions is not dynamic in the role (it can be in the > playbook) > > Variable switch method: > Pros: > - inclusion of the role by default runs an install > - action can be dynamically changed from the calling playbook via an > ansible var > - structure of the task files is internal to the role and the user of > the role need not know the filenames/structure. > > Cons: > - calling playbook is not explicit in that the action can be switched > dynamically (e.g. intentionally or accidentally because it is dynamic) > - implementer must know to configure a variable called > `tripleo_undercloud_action` to switch between install/backup/upgrade > actions > - variable names are likely different depending on the role > > My personal preference might be to use the "Tasks from method" because > it would lend itself to the same implementation across all roles and > the dynamic logic is left to the playbook rather than internally in > the role. 
For example, we'd end up with something like: > > - hosts: undercloud > gather_facts: true > collections: > - tripleo.operator > tasks: > - name: Install undercloud > import_role: > name: tripleo-undercloud > tasks_from: install > vars: > tripleo_undercloud_debug: true > - name: Upload images > import_role: > name: tripleo-overcloud-images > tasks_from: upload > vars: > tripleo_overcloud_images_debug: true > - name: Import nodes > import_role: > name: tripleo-overcloud-node > tasks_from: import > vars: > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_import_file: instack.json > - name: Introspect nodes > import_role: > name: tripleo-overcloud-node > tasks_from: introspect > vars: > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_introspect_all_manageable: True > tripleo_overcloud_node_introspect_provide: True > - name: Overcloud deploy > import_role: > name: tripleo-overcloud > tasks_from: deploy > vars: > tripleo_overcloud_debug: true > tripleo_overcloud_deploy_environment_files: > - /home/stack/params.yaml > > The same general tasks performed via the "Variable switch method" > would look something like: > > - hosts: undercloud > gather_facts: true > collections: > - tripleo.operator > tasks: > - name: Install undercloud > import_role: > name: tripleo-undercloud > vars: > tripleo_undercloud_action: install > tripleo_undercloud_debug: true > - name: Upload images > import_role: > name: tripleo-overcloud-images > vars: > tripleo_overcloud_images_action: upload > tripleo_overcloud_images_debug: true > - name: Import nodes > import_role: > name: tripleo-overcloud-node > vars: > tripleo_overcloud_node_action: import > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_import_file: instack.json > - name: Introspect nodes > import_role: > name: tripleo-overcloud-node > vars: > tripleo_overcloud_node_action: introspect > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_introspect_all_manageable: True > 
tripleo_overcloud_node_introspect_provide: True > - name: Overcloud deploy > import_role: > name: tripleo-overcloud > vars: > tripleo_overcloud_action: deploy > tripleo_overcloud_debug: true > tripleo_overcloud_deploy_environment_files: > - /home/stack/params.yaml > > Thoughts? > > Thanks, > -Alex > > [0] > https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible > [1] https://review.opendev.org/#/c/699311/ > [2] https://review.opendev.org/#/c/701628/ > [3] https://review.opendev.org/#/c/701034/ > [4] https://review.opendev.org/#/c/701628/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Jan 8 22:54:26 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 Jan 2020 22:54:26 +0000 Subject: [tripleo] Use Podman 1.6 in CI In-Reply-To: References: <20200108222524.jxhxlxzuvxt3mazw@yuggoth.org> Message-ID: <20200108225426.7jqu7mquf7ktxkqx@yuggoth.org> On 2020-01-08 15:35:03 -0700 (-0700), Alex Schultz wrote: > On Wed, Jan 8, 2020 at 3:30 PM Jeremy Stanley wrote: > > On 2020-01-08 16:49:15 -0500 (-0500), Luke Short wrote: > > > We have been running into a situation where an older version of > > > Podman is used in CI that has consistent failures. It has problems > > > deleting storage associated with a container. > > [...] > > > The question/ask I have is can we ship/use a newer version of > > > Podman in our upstream CI? Or should we continue our efforts on > > > making a workaround? > > [...] > > > > This sounds like a problem users of your software could encounter in > > production. If so, how does only fixing it in CI jobs help your > > users? It seems like time might be better spent fixing the problem > > for everyone. > > Btw fixing CI implies fixing for everyone. In other words, how do we > make it available for everyone (including CI). This is one of those > ecosystem things because we (tripleo/openstack) don't necessarily ship > it but we do need to use it. 
I'm uncertain of the centos7/podman 1.6 > support and which branches are affected by this? This might be a > better question for RDO. I see, "ship/use a newer version of Podman in our upstream CI" didn't seem to necessarily imply getting a newer version of Podman into RDO/TripleO and the hands of its users. I have a bit of a knee-jerk reaction whenever I see someone talk about "fixing CI" when the underlying problem is in the software being tested and not the CI jobs. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ekultails at gmail.com Wed Jan 8 23:07:17 2020 From: ekultails at gmail.com (Luke Short) Date: Wed, 8 Jan 2020 18:07:17 -0500 Subject: [tripleo] Use Podman 1.6 in CI In-Reply-To: <20200108225426.7jqu7mquf7ktxkqx@yuggoth.org> References: <20200108222524.jxhxlxzuvxt3mazw@yuggoth.org> <20200108225426.7jqu7mquf7ktxkqx@yuggoth.org> Message-ID: Hey folks, Thank you for all of the feedback so far. The goal is definitely to fix this everywhere we can, not just in CI. Sorry for my poor choice of words. I will migrate this discussion over to the RDO community. Sincerely, Luke Short On Wed, Jan 8, 2020 at 5:55 PM Jeremy Stanley wrote: > On 2020-01-08 15:35:03 -0700 (-0700), Alex Schultz wrote: > > On Wed, Jan 8, 2020 at 3:30 PM Jeremy Stanley wrote: > > > On 2020-01-08 16:49:15 -0500 (-0500), Luke Short wrote: > > > > We have been running into a situation where an older version of > > > > Podman is used in CI that has consistent failures. It has problems > > > > deleting storage associated with a container. > > > [...] > > > > The question/ask I have is can we ship/use a newer version of > > > > Podman in our upstream CI? Or should we continue our efforts on > > > > making a workaround? > > > [...] > > > > > > This sounds like a problem users of your software could encounter in > > > production.
If so, how does only fixing it in CI jobs help your > > > users? It seems like time might be better spent fixing the problem > > > for everyone. > > > > Btw fixing CI implies fixing for everyone. In other words, how do we > > make it available for everyone (including CI). This is one of those > > ecosystem things because we (tripleo/openstack) don't necessarily ship > > it but we do need to use it. I'm uncertain of the centos7/podman 1.6 > > support and which branches are affected by this? This might be a > > better question for RDO. > > I see, "ship/use a newer version of Podman in our upstream CI" > didn't seem to necessarily imply getting a newer version of Podman > into RDO/TripleO and the hands of its users. I have a bit of a > knee-jerk reaction whenever I see someone talk about "fixing CI" > when the underlying problem is in the software being tested and not > the CI jobs. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ssbarnea at redhat.com Wed Jan 8 23:16:22 2020 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Wed, 8 Jan 2020 23:16:22 +0000 Subject: [tripleo] Use Podman 1.6 in CI In-Reply-To: <20200108225426.7jqu7mquf7ktxkqx@yuggoth.org> References: <20200108222524.jxhxlxzuvxt3mazw@yuggoth.org> <20200108225426.7jqu7mquf7ktxkqx@yuggoth.org> Message-ID: <84998B48-B820-4931-A35A-31AD98EF8A2A@redhat.com> One of the things I am working on is to add CI jobs on the podman project itself that build rpm packages for all supported systems and test them, the final goal being to test them with openstack. I am not done yet but have made good progress: https://review.rdoproject.org/zuul/buildsets?project=containers%2Flibpod https://github.com/containers/libpod/pull/4815 - current WIP (after merging a few others) Since I started working on this I have found several bugs in podman, so I think the effort will pay off.
Cheers Sorin > On 8 Jan 2020, at 22:54, Jeremy Stanley wrote: > > ewer version of Podman in our upstream CI" > didn't seem to necessarily imply getting a newer version of Podman > into RDO/TripleO and the hands of its users. I have a bit of a > knee-jerk reaction -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Wed Jan 8 23:21:23 2020 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 8 Jan 2020 18:21:23 -0500 Subject: [tripleo] tripleo-operator-ansible start and request for input In-Reply-To: References: Message-ID: On Wed, Jan 8, 2020 at 5:25 PM Alex Schultz wrote: > [Hello folks, > > I've begun the basic start of the tripleo-operator-ansible collection > work[0]. At the start of this work, I've chosen the undercloud > installation[1] as the first role to use to figure out how we the end > user's to consume these roles. I wanted to bring up this initial > implementation so that we can discuss how folks will include these > roles. The initial implementation is a wrapper around the > tripleoclient command as run via openstackclient. This means that the > 'tripleo-undercloud' role provides implementations for 'openstack > undercloud backup', 'openstack undercloud install', and 'openstack > undercloud upgrade'. > > In terms of naming conventions, I'm proposing that we would name the > roles "tripleo-" with the last part of the command > action being an "action". 
Examples: > > "openstack undercloud *" -> > role: tripleo-undercloud > action: (backup|install|upgrade) > > "openstack undercloud minion *" -> > role: tripleo-undercloud-minion > action: (install|upgrade) > > "openstack overcloud *" -> > role: tripleo-overcloud > action: (deploy|delete|export) > > "openstack overcloud node *" -> > role: tripleo-overcloud-node > action: (import|introspect|provision|unprovision) > Another technically valid option could be: "openstack overcloud node *" to role: tripleo-overcloud action: node/import|node/introspect, etc. The role could have tasks/node/import.yml, tasks/node/introspect.yml, etc. It's to me another option to consider so we reduce the number of roles (and therefore LOC involved). > > In terms of end user interface, I've got two proposals out in terms of > possible implementations. > > Tasks from method: > The initial commit propose that we would require the end user to use > an include_role/tasks_from call to perform the desired action. For > example: > > - hosts: undercloud > gather_facts: true > tasks: > - name: Install undercloud > collections: > - tripleo.operator > import_role: > name: tripleo-undercloud > tasks_from: install > vars: > tripleo_undercloud_debug: true > > Variable switch method: > I've also proposed an alternative implementation[2] that would use > include_role but require the end user to set a specific variable to > change if the role runs 'install', 'backup' or 'upgrade'. With this > patch the playbook would look something like: > > - hosts: undercloud > gather_facts: true > tasks: > - name: Install undercloud > collections: > - tripleo.operator > import_role: > name: tripleo-undercloud > vars: > tripleo_undercloud_action: install > tripleo_undercloud_debug: true > > I would like to solicit feedback on which one of these is the > preferred integration method when calling these roles. I have two > patches up in tripleo-quickstart-extras to show how these calls could > be run. 
The "Tasks from method" can be viewed here[3]. The "Variable > switch method" can be viewed here[4]. I can see pros and cons for > both methods. > > My take would be: > > Tasks from method: > Pros: > - action is a bit more explicit > - dynamic logic left up to the playbook/consumer. > - May not have a 'default' action (as main.yml is empty, though it > could be implemented). > - tasks_from would be a global implementation across all roles rather > than having a changing variable name. > Not sure but it might be slightly faster as well, since we directly import what we need. I prefer this proposal as well also because I've already seen this pattern in tripleo-ansible. > > Cons: > - internal task file names must be known by the consumer (though IMHO > this is no different than the variable name + values in the other > implementation) > - role/action inclusions is not dynamic in the role (it can be in the > playbook) > > Variable switch method: > Pros: > - inclusion of the role by default runs an install > - action can be dynamically changed from the calling playbook via an > ansible var > - structure of the task files is internal to the role and the user of > the role need not know the filenames/structure. > > Cons: > - calling playbook is not explicit in that the action can be switched > dynamically (e.g. intentionally or accidentally because it is dynamic) > - implementer must know to configure a variable called > `tripleo_undercloud_action` to switch between install/backup/upgrade > actions > - variable names are likely different depending on the role > > My personal preference might be to use the "Tasks from method" because > it would lend itself to the same implementation across all roles and > the dynamic logic is left to the playbook rather than internally in > the role. 
For example, we'd end up with something like: > > - hosts: undercloud > gather_facts: true > collections: > - tripleo.operator > tasks: > - name: Install undercloud > import_role: > name: tripleo-undercloud > tasks_from: install > vars: > tripleo_undercloud_debug: true > - name: Upload images > import_role: > name: tripleo-overcloud-images > tasks_from: upload > vars: > tripleo_overcloud_images_debug: true > - name: Import nodes > import_role: > name: tripleo-overcloud-node > tasks_from: import > vars: > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_import_file: instack.json > - name: Introspect nodes > import_role: > name: tripleo-overcloud-node > tasks_from: introspect > vars: > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_introspect_all_manageable: True > tripleo_overcloud_node_introspect_provide: True > - name: Overcloud deploy > import_role: > name: tripleo-overcloud > tasks_from: deploy > vars: > tripleo_overcloud_debug: true > tripleo_overcloud_deploy_environment_files: > - /home/stack/params.yaml > > The same general tasks performed via the "Variable switch method" > would look something like: > > - hosts: undercloud > gather_facts: true > collections: > - tripleo.operator > tasks: > - name: Install undercloud > import_role: > name: tripleo-undercloud > vars: > tripleo_undercloud_action: install > tripleo_undercloud_debug: true > - name: Upload images > import_role: > name: tripleo-overcloud-images > vars: > tripleo_overcloud_images_action: upload > tripleo_overcloud_images_debug: true > - name: Import nodes > import_role: > name: tripleo-overcloud-node > vars: > tripleo_overcloud_node_action: import > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_import_file: instack.json > - name: Introspect nodes > import_role: > name: tripleo-overcloud-node > vars: > tripleo_overcloud_node_action: introspect > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_introspect_all_manageable: True > 
tripleo_overcloud_node_introspect_provide: True > - name: Overcloud deploy > import_role: > name: tripleo-overcloud > vars: > tripleo_overcloud_action: deploy > tripleo_overcloud_debug: true > tripleo_overcloud_deploy_environment_files: > - /home/stack/params.yaml > > Thoughts? > > Thanks, > -Alex > > [0] > https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible > [1] https://review.opendev.org/#/c/699311/ > [2] https://review.opendev.org/#/c/701628/ > [3] https://review.opendev.org/#/c/701034/ > [4] https://review.opendev.org/#/c/701628/ > > > Nice work Alex! -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-philippe at evrard.me Wed Jan 8 23:31:40 2020 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Thu, 09 Jan 2020 00:31:40 +0100 Subject: [tc] January meeting agenda Message-ID: <1d35cdc723dbd4d50ab6a933b6a6a2c8a8ee4153.camel@evrard.me> Hello everyone, Our next meeting is happening next week Thursday (the 16th), and the agenda is, as usual, on the wiki! Here is a primer of the agenda for this month: - report on large scale sig -- how does this fly and how/what are the action items. - report on the vision reflection update - report on the analysis of the survey - report on the convo for Telemetry with Catalyst -- where are we now? What are the next steps (Gnocchi fork)? - report on multi-arch SIG - report on infra liaison and static hosting -- check if there is progress - report on stable branch policy work. - report on the oslo metrics project -- has code appeared since last convo? - report on the community goals for U and V, py2 drop, and goal select process schedule. - report on release naming - report on the ideas repo See you there! 
Regards, Jean-Philippe Evrard (evrardjp) From rico.lin.guanyu at gmail.com Thu Jan 9 05:35:57 2020 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Thu, 9 Jan 2020 13:35:57 +0800 Subject: [Multi-Arch SIG] summary and actions from last meeting Message-ID: Hi all, Thanks to all who signed up and helped to form the Multi-Arch SIG. We hosted our two initial meetings this week [1] and both were successful. Anyone who cares about multi-arch (for example, ARM support in the community) is welcome to join our future meetings [2]. Here are some actions from this week's meetings: * Create a StoryBoard for the Multi-Arch SIG (ricolin) * Build the multi-arch SIG repo (ricolin) * Collect ppc64le actions and resources in the community when spotted (mrda) * Help with documentation once the Multi-Arch SIG repo is ready (jeremyfreudberg) * Update governance-sigs with a clearer description of the Multi-Arch SIG (jeremyfreudberg) There are two doc ideas: `oh, on arm64 you need to do XYZ in another way - here is how and why` and `use cases, who, and issues they had`. Both seem like great docs to start with, so I assume they will be among the first goals this SIG works on. We need more people to help, and more CI resources as well. There are a lot of WIP resources and CI jobs in our community, and all of them are now collected in our etherpad [3]. Feel free to update that etherpad and to add your or your organization's name to it. Please join us :) Once we have our StoryBoard ready, we should be able to create tasks on it so everyone can create and track tasks like CI jobs, documentation, etc. Last but not least, we are looking for people to nominate themselves or volunteer for the chair roles (there can be multiple chairs). Some really experienced community members have been nominated, and I will check with them whether they are willing to take the role.
On the other hand, I'm volunteering to apply for one of the SIG chair seats and help build this SIG, but I'm happy to hand it to others if more people sign up for the role :) So let me know if you're interested. [1] http://eavesdrop.openstack.org/meetings/multi_arch/2020/ [2] http://eavesdrop.openstack.org/#Multi-Arch_SIG_Meeting [3] https://etherpad.openstack.org/p/Multi-arch -- May The Force of OpenStack Be With You, *Rico Lin* irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykarel at redhat.com Thu Jan 9 07:04:39 2020 From: ykarel at redhat.com (Yatin Karel) Date: Thu, 9 Jan 2020 12:34:39 +0530 Subject: [tripleo] Use Podman 1.6 in CI In-Reply-To: References: <20200108222524.jxhxlxzuvxt3mazw@yuggoth.org> <20200108225426.7jqu7mquf7ktxkqx@yuggoth.org> Message-ID: Hi Luke Short, On Thu, Jan 9, 2020 at 4:40 AM Luke Short wrote: > > Hey folks, > > Thank you for all of the feedback so far. The goal is definitely to fix this everywhere we can, no just in CI. Sorry for my poor choice of words. I will migrate this discussion over to the RDO community. > So if I understand the problem correctly, podman 1.6.4 is needed to fix some race issues. The corresponding bug https://bugs.launchpad.net/tripleo/+bug/1856324 mainly refers to CentOS7 jobs/users, as most of the upstream work/development is around CentOS7, but considering the efforts around CentOS8 I will try to cover both with respect to RDO. With respect to CentOS8: The plan for master is to move to CentOS8, but CentOS8 is still not completely ready; it's a work in progress. Current status and issues can be found at [1][2]. Regarding the podman version, as soon as jobs/users start consuming CentOS8, whatever podman version ships with it will be available; most likely it will be podman-1.4.2-5 looking at the Stream content [5], which might be updated with future updates/releases.
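As an aside (this sketch is not from the thread, and the playbook is hypothetical): a deployment worried about the storage-deletion race could assert a minimum podman version up front; the 1.6.4 threshold is taken from the bug referenced above.

```yaml
# Hypothetical pre-flight check; fails fast when the installed podman
# predates the fix for the container storage race (bug 1856324).
- hosts: all
  gather_facts: false
  tasks:
    - name: Read the installed podman version
      command: rpm -q --queryformat '%{VERSION}' podman
      register: podman_rpm
      changed_when: false

    - name: Require podman >= 1.6.4
      assert:
        that:
          - podman_rpm.stdout is version('1.6.4', '>=')
        fail_msg: "podman {{ podman_rpm.stdout }} is affected by the storage-deletion race"
```

Failing early like this is cheaper than debugging intermittent container-removal errors mid-deploy.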
I guess a similar race issue might be hitting Train as well, so with respect to Train, there is also a plan to add CentOS8 support in addition to CentOS7, as a follow-up/parallel effort to the master work. Now with respect to CentOS7: The current podman version we have in RDO is 1.5.1-3 for both train and master. There was an attempt [3] in the past from @Emilien Macchi to update podman to 1.6.1 in RDO, but there were some issues running on CentOS7 and we didn't hear much from the container team on how to move forward. We can attempt it again to see if > 1.6.1 works, which mostly depends on the container team's plans for podman and CentOS7. In RDO we use the builds done by the container team, and the last successful build on CBS is 1.6.2 [4]. [1] https://lists.rdoproject.org/pipermail/dev/2020-January/009230.html [2] https://trello.com/c/fv3u22df/709-centos8-move-to-centos8 [3] https://review.rdoproject.org/r/#/c/23449/ [4] https://cbs.centos.org/koji/packageinfo?packageID=6853 [5] http://mirror.centos.org/centos/8-stream/AppStream/x86_64/os/Packages/ > Sincerely, > Luke Short > > On Wed, Jan 8, 2020 at 5:55 PM Jeremy Stanley wrote: >> >> On 2020-01-08 15:35:03 -0700 (-0700), Alex Schultz wrote: >> > On Wed, Jan 8, 2020 at 3:30 PM Jeremy Stanley wrote: >> > > On 2020-01-08 16:49:15 -0500 (-0500), Luke Short wrote: >> > > > We have been running into a situation where an older version of >> > > > Podman is used in CI that has consistent failures. It has problems >> > > > deleting storage associated with a container. >> > > [...] >> > > > The question/ask I have is can we ship/use a newer version of >> > > > Podman in our upstream CI? Or should we continue our efforts on >> > > > making a workaround? >> > > [...] >> > > >> > > This sounds like a problem users of your software could encounter in >> > > production. If so, how does only fixing it in CI jobs help your >> > > users? It seems like time might be better spent fixing the problem >> > > for everyone.
>> > >> > Btw fixing CI implies fixing for everyone. In other words, how do we >> > make it available for everyone (including CI). This is one of those >> > ecosystem things because we (tripleo/openstack) don't necessarily ship >> > it but we do need to use it. I'm uncertain of the centos7/podman 1.6 >> > support and which branches are affected by this? This might be a >> > better question for RDO. >> >> I see, "ship/use a newer version of Podman in our upstream CI" >> didn't seem to necessarily imply getting a newer version of Podman >> into RDO/TripleO and the hands of its users. I have a bit of a >> knee-jerk reaction whenever I see someone talk about "fixing CI" >> when the underlying problem is in the software being tested and not >> the CI jobs. >> -- >> Jeremy Stanley Thanks and Regards Yatin Karel From radoslaw.piliszek at gmail.com Thu Jan 9 07:58:32 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 9 Jan 2020 08:58:32 +0100 Subject: [infra] Retire openstack/js-openstack-lib repository In-Reply-To: <8679edfb-a26a-4292-9886-3c71cec21f83@www.fastmail.com> References: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> <20200108211208.mnvspulwlghdyyz5@yuggoth.org> <8679edfb-a26a-4292-9886-3c71cec21f83@www.fastmail.com> Message-ID: Best infra team around, you go to sleep and the problem is solved. :-) Thanks for the link. I was meaning these templates: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/project-templates.yaml which reference nodejs up to 8. I see zuul is already using the same jobs referenced in those templates but with node 10 so it presumably works which is great indeed: https://opendev.org/zuul/zuul/src/branch/master/.zuul.yaml#L212 The most nodejs-scary part is included in infra docs: https://docs.openstack.org/infra/manual/creators.html#central-config-exceptions which reference nodejs4 (exorcists required immediately). 
-yoctozepto śr., 8 sty 2020 o 23:03 Clark Boylan napisał(a): > > On Wed, Jan 8, 2020, at 1:12 PM, Jeremy Stanley wrote: > > On 2020-01-08 22:03:48 +0100 (+0100), Radosław Piliszek wrote: > > [...] > > > I noticed nodejs 8 is already EOL (this year) and it seems to be > > > the max in infra. I would appreciate any help with getting nodejs > > > 10 and 12 into infra. > > [...] > > > > Can you be more specific? Zuul will obviously allow you to install > > anything you like in a job, so presumably you're finding some > > defaults hard-coded somewhere we should reevaluate? > > We even supply a role from zuul-jobs to install nodejs from nodesource for you, https://zuul-ci.org/docs/zuul-jobs/js-roles.html#role-install-nodejs. This can install any nodejs version available from nodesource for the current platform. > > Clark > From pierre at stackhpc.com Thu Jan 9 09:27:43 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 9 Jan 2020 10:27:43 +0100 Subject: [scientific][www] Unable to download OpenStack for Scientific Research book In-Reply-To: <20200108184745.fs6d4p4udr7deffp@yuggoth.org> References: <20200108184745.fs6d4p4udr7deffp@yuggoth.org> Message-ID: On Wed, 8 Jan 2020 at 19:56, Jeremy Stanley wrote: > > On 2020-01-08 18:31:10 +0100 (+0100), Pierre Riteau wrote: > > I tried to download the book at https://www.openstack.org/science/ > > but the link doesn't work. Could this please be fixed? > > I've personally reported it to the webmasters for the > www.openstack.org site. In the meantime, a bit of searching turns up > https://www.openstack.org/assets/science/CrossroadofCloudandHPC.pdf > which will redirect to a working copy. As I wrote this, Wes Wilson > pointed out to me that there's also a 6x9in "printable" version at > https://www.openstack.org/assets/science/CrossroadofCloudandHPC-Print.pdf > and preprinted copies for purchase at > https://www.amazon.com/dp/1978244703/ if that's more your speed. 
> > > I looked on openstack.org for a contact address, but couldn't find > > one. Please let me know if there is a specific address I should use > > next time. > > Yes, I know they're working on getting something added. They've > generally been relying on E-mails to summitapp at openstack.org or bugs > filed at https://bugs.launchpad.net/openstack-org/+filebug but I > gather they're creating a support at openstack.org address or something > along those lines to mention in page footers on the site soon. > -- > Jeremy Stanley Hi Jeremy, Thanks a lot for reporting it. I did not realize there was a Launchpad project for openstack.org, I should try it next time. Pierre Riteau (priteau) From aj at suse.com Thu Jan 9 08:25:18 2020 From: aj at suse.com (Andreas Jaeger) Date: Thu, 9 Jan 2020 09:25:18 +0100 Subject: [infra] Retire openstack/js-openstack-lib repository In-Reply-To: References: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> <20200108211208.mnvspulwlghdyyz5@yuggoth.org> <8679edfb-a26a-4292-9886-3c71cec21f83@www.fastmail.com> Message-ID: <07467b2a-7e5e-6ba1-8481-27c87f58d318@suse.com> On 09/01/2020 08.58, Radosław Piliszek wrote: > Best infra team around, you go to sleep and the problem is solved. :-) > Thanks for the link. > > I was meaning these templates: > https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/project-templates.yaml > which reference nodejs up to 8. New templates for nodejs 10 or 11 are welcome ;) > I see zuul is already using the same jobs referenced in those > templates but with node 10 so it presumably works which is great > indeed: > https://opendev.org/zuul/zuul/src/branch/master/.zuul.yaml#L212 > > The most nodejs-scary part is included in infra docs: > https://docs.openstack.org/infra/manual/creators.html#central-config-exceptions > which reference nodejs4 (exorcists required immediately). 
It is meant to reference the publish-to-npm nodejs jobs,
Andreas
--
Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB

From sshnaidm at redhat.com Thu Jan 9 10:20:02 2020 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Thu, 9 Jan 2020 12:20:02 +0200 Subject: [tripleo] tripleo-operator-ansible start and request for input In-Reply-To: References: Message-ID:

Thanks for bringing this up, Alex

I was thinking whether we could use a third option - to have small "single responsibility" roles for every action. For example:
tripleo-undercloud-install
tripleo-undercloud-backup
tripleo-undercloud-upgrade

And then no one needs to dig into roles to check what actions are supported, but just "ls roles/". Also these roles usually have nothing in common but the name, and since they are quite isolated, I think it's better to have them defined separately. As for cons, I can count: more roles, and there might be some level of duplication in variables. For pros, it's a more readable playbook with clear actions:

- hosts: undercloud
  gather_facts: true
  collections:
    - tripleo.operator
  vars:
    tripleo_undercloud_debug: true
  tasks:

    - name: Install undercloud
      import_role:
        name: undercloud-install

    - name: Upgrade undercloud
      import_role:
        name: undercloud-upgrade

Thanks

On Thu, Jan 9, 2020 at 12:22 AM Alex Schultz wrote:
> Hello folks,
>
> I've begun the basic start of the tripleo-operator-ansible collection work[0]. At the start of this work, I've chosen the undercloud installation[1] as the first role to use to figure out how we want end users to consume these roles. I wanted to bring up this initial implementation so that we can discuss how folks will include these roles. The initial implementation is a wrapper around the tripleoclient command as run via openstackclient.
This means that the > 'tripleo-undercloud' role provides implementations for 'openstack > undercloud backup', 'openstack undercloud install', and 'openstack > undercloud upgrade'. > > In terms of naming conventions, I'm proposing that we would name the > roles "tripleo-" with the last part of the command > action being an "action". Examples: > > "openstack undercloud *" -> > role: tripleo-undercloud > action: (backup|install|upgrade) > > "openstack undercloud minion *" -> > role: tripleo-undercloud-minion > action: (install|upgrade) > > "openstack overcloud *" -> > role: tripleo-overcloud > action: (deploy|delete|export) > > "openstack overcloud node *" -> > role: tripleo-overcloud-node > action: (import|introspect|provision|unprovision) > > In terms of end user interface, I've got two proposals out in terms of > possible implementations. > > Tasks from method: > The initial commit propose that we would require the end user to use > an include_role/tasks_from call to perform the desired action. For > example: > > - hosts: undercloud > gather_facts: true > tasks: > - name: Install undercloud > collections: > - tripleo.operator > import_role: > name: tripleo-undercloud > tasks_from: install > vars: > tripleo_undercloud_debug: true > > Variable switch method: > I've also proposed an alternative implementation[2] that would use > include_role but require the end user to set a specific variable to > change if the role runs 'install', 'backup' or 'upgrade'. With this > patch the playbook would look something like: > > - hosts: undercloud > gather_facts: true > tasks: > - name: Install undercloud > collections: > - tripleo.operator > import_role: > name: tripleo-undercloud > vars: > tripleo_undercloud_action: install > tripleo_undercloud_debug: true > > I would like to solicit feedback on which one of these is the > preferred integration method when calling these roles. I have two > patches up in tripleo-quickstart-extras to show how these calls could > be run. 
The "Tasks from method" can be viewed here[3]. The "Variable > switch method" can be viewed here[4]. I can see pros and cons for > both methods. > > My take would be: > > Tasks from method: > Pros: > - action is a bit more explicit > - dynamic logic left up to the playbook/consumer. > - May not have a 'default' action (as main.yml is empty, though it > could be implemented). > - tasks_from would be a global implementation across all roles rather > than having a changing variable name. > > Cons: > - internal task file names must be known by the consumer (though IMHO > this is no different than the variable name + values in the other > implementation) > - role/action inclusions is not dynamic in the role (it can be in the > playbook) > > Variable switch method: > Pros: > - inclusion of the role by default runs an install > - action can be dynamically changed from the calling playbook via an > ansible var > - structure of the task files is internal to the role and the user of > the role need not know the filenames/structure. > > Cons: > - calling playbook is not explicit in that the action can be switched > dynamically (e.g. intentionally or accidentally because it is dynamic) > - implementer must know to configure a variable called > `tripleo_undercloud_action` to switch between install/backup/upgrade > actions > - variable names are likely different depending on the role > > My personal preference might be to use the "Tasks from method" because > it would lend itself to the same implementation across all roles and > the dynamic logic is left to the playbook rather than internally in > the role. 
For example, we'd end up with something like: > > - hosts: undercloud > gather_facts: true > collections: > - tripleo.operator > tasks: > - name: Install undercloud > import_role: > name: tripleo-undercloud > tasks_from: install > vars: > tripleo_undercloud_debug: true > - name: Upload images > import_role: > name: tripleo-overcloud-images > tasks_from: upload > vars: > tripleo_overcloud_images_debug: true > - name: Import nodes > import_role: > name: tripleo-overcloud-node > tasks_from: import > vars: > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_import_file: instack.json > - name: Introspect nodes > import_role: > name: tripleo-overcloud-node > tasks_from: introspect > vars: > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_introspect_all_manageable: True > tripleo_overcloud_node_introspect_provide: True > - name: Overcloud deploy > import_role: > name: tripleo-overcloud > tasks_from: deploy > vars: > tripleo_overcloud_debug: true > tripleo_overcloud_deploy_environment_files: > - /home/stack/params.yaml > > The same general tasks performed via the "Variable switch method" > would look something like: > > - hosts: undercloud > gather_facts: true > collections: > - tripleo.operator > tasks: > - name: Install undercloud > import_role: > name: tripleo-undercloud > vars: > tripleo_undercloud_action: install > tripleo_undercloud_debug: true > - name: Upload images > import_role: > name: tripleo-overcloud-images > vars: > tripleo_overcloud_images_action: upload > tripleo_overcloud_images_debug: true > - name: Import nodes > import_role: > name: tripleo-overcloud-node > vars: > tripleo_overcloud_node_action: import > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_import_file: instack.json > - name: Introspect nodes > import_role: > name: tripleo-overcloud-node > vars: > tripleo_overcloud_node_action: introspect > tripleo_overcloud_node_debug: true > tripleo_overcloud_node_introspect_all_manageable: True > 
tripleo_overcloud_node_introspect_provide: True > - name: Overcloud deploy > import_role: > name: tripleo-overcloud > vars: > tripleo_overcloud_action: deploy > tripleo_overcloud_debug: true > tripleo_overcloud_deploy_environment_files: > - /home/stack/params.yaml > > Thoughts? > > Thanks, > -Alex > > [0] > https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible > [1] https://review.opendev.org/#/c/699311/ > [2] https://review.opendev.org/#/c/701628/ > [3] https://review.opendev.org/#/c/701034/ > [4] https://review.opendev.org/#/c/701628/ > > > -- Best regards Sagi Shnaidman -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Thu Jan 9 10:31:32 2020 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 9 Jan 2020 11:31:32 +0100 Subject: [largescale-sig] Meeting summary and next actions In-Reply-To: References: <3c3a6232-9a3b-d240-ab82-c7ac4997f5c0@openstack.org> <06e5f16f-dfa4-8189-da7b-ad2250df8125@openstack.org> Message-ID: <3bc9f82f-a856-c02a-86a0-0e927397acf8@openstack.org> Belmiro Moreira wrote: > Hi Thierry, all, > I'm OK with both dates. > > If you agree to keep the meeting on January 15 I can chair it. Then I propose we keep the date as planned, will be less confusing. Thanks Belmiro for the offer of chairing it. I'll be preparing and sending the agenda ahead of the meeting, and pick up the summary afterwards. Regards, -- Thierry Carrez (ttx) From bdobreli at redhat.com Thu Jan 9 11:38:04 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Thu, 9 Jan 2020 12:38:04 +0100 Subject: [tripleo] tripleo-operator-ansible start and request for input In-Reply-To: References: Message-ID: <65f971d5-a4ec-893c-65e8-9fddb9c2407f@redhat.com> On 09.01.2020 11:20, Sagi Shnaidman wrote: > Thanks for bringing this up, Alex > > I was thinking if we can the third option - to have small "single > responsibility" roles for every action. 
For example: > tripleo-undercloud-install > tripleo-undercloud-backup > tripleo-undercloud-upgrade > > And then no one needs to dig into roles to check what actions are > supported, but just "ls roles/". Also these roles usually have nothing > in common but name, and if they are quite isolated, I think it's better > to have them defined separately. +1 A role should do one thing and do it good (c) from somewhere > From cons I can count: more roles and might be some level of > duplication in variables. > For pros it's more readable playbook and clear actions: > > - hosts: undercloud >   gather_facts: true >   collections: >     - tripleo.operator >   vars: >     tripleo_undercloud_debug: true >   tasks: > >     - name: Install undercloud >       import_role: >         name: undercloud-install > >     - name: Upgrade undercloud >       import_role: >         name: undercloud-upgrade > > Thanks > > On Thu, Jan 9, 2020 at 12:22 AM Alex Schultz > wrote: > > [Hello folks, > > I've begun the basic start of the tripleo-operator-ansible collection > work[0].  At the start of this work, I've chosen the undercloud > installation[1] as the first role to use to figure out how we the end > user's to consume these roles.  I wanted to bring up this initial > implementation so that we can discuss how folks will include these > roles.  The initial implementation is a wrapper around the > tripleoclient command as run via openstackclient.  This means that the > 'tripleo-undercloud' role provides implementations for 'openstack > undercloud backup', 'openstack undercloud install', and 'openstack > undercloud upgrade'. > > In terms of naming conventions, I'm proposing that we would name the > roles "tripleo-" with the last part of the command > action being an "action". 
Examples: > > "openstack undercloud *" -> > role: tripleo-undercloud > action: (backup|install|upgrade) > > "openstack undercloud minion *" -> > role: tripleo-undercloud-minion > action: (install|upgrade) > > "openstack overcloud *" -> > role: tripleo-overcloud > action: (deploy|delete|export) > > "openstack overcloud node *" -> > role: tripleo-overcloud-node > action: (import|introspect|provision|unprovision) > > In terms of end user interface, I've got two proposals out in terms of > possible implementations. > > Tasks from method: > The initial commit propose that we would require the end user to use > an include_role/tasks_from call to perform the desired action.  For > example: > >     - hosts: undercloud >       gather_facts: true >       tasks: >         - name: Install undercloud >           collections: >             - tripleo.operator >           import_role: >             name: tripleo-undercloud >             tasks_from: install >           vars: >             tripleo_undercloud_debug: true > > Variable switch method: > I've also proposed an alternative implementation[2] that would use > include_role but require the end user to set a specific variable to > change if the role runs 'install', 'backup' or 'upgrade'. With this > patch the playbook would look something like: > >     - hosts: undercloud >       gather_facts: true >       tasks: >         - name: Install undercloud >           collections: >             - tripleo.operator >           import_role: >             name: tripleo-undercloud >           vars: >             tripleo_undercloud_action: install >             tripleo_undercloud_debug: true > > I would like to solicit feedback on which one of these is the > preferred integration method when calling these roles. I have two > patches up in tripleo-quickstart-extras to show how these calls could > be run. The "Tasks from method" can be viewed here[3]. The "Variable > switch method" can be viewed here[4].  
I can see pros and cons for > both methods. > > My take would be: > > Tasks from method: > Pros: >  - action is a bit more explicit >  - dynamic logic left up to the playbook/consumer. >  - May not have a 'default' action (as main.yml is empty, though it > could be implemented). >  - tasks_from would be a global implementation across all roles rather > than having a changing variable name. > > Cons: >  - internal task file names must be known by the consumer (though IMHO > this is no different than the variable name + values in the other > implementation) >  - role/action inclusions is not dynamic in the role (it can be in > the playbook) > > Variable switch method: > Pros: >  - inclusion of the role by default runs an install >  - action can be dynamically changed from the calling playbook via an > ansible var >  - structure of the task files is internal to the role and the user of > the role need not know the filenames/structure. > > Cons: >  - calling playbook is not explicit in that the action can be switched > dynamically (e.g. intentionally or accidentally because it is dynamic) >  - implementer must know to configure a variable called > `tripleo_undercloud_action` to switch between install/backup/upgrade > actions >  - variable names are likely different depending on the role > > My personal preference might be to use the "Tasks from method" because > it would lend itself to the same implementation across all roles and > the dynamic logic is left to the playbook rather than internally in > the role. 
For example, we'd end up with something like: > >     - hosts: undercloud >       gather_facts: true >       collections: >         - tripleo.operator >       tasks: >         - name: Install undercloud >           import_role: >             name: tripleo-undercloud >             tasks_from: install >           vars: >             tripleo_undercloud_debug: true >         - name: Upload images >           import_role: >             name: tripleo-overcloud-images >             tasks_from: upload >           vars: >             tripleo_overcloud_images_debug: true >         - name: Import nodes >           import_role: >             name: tripleo-overcloud-node >             tasks_from: import >           vars: >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_import_file: instack.json >         - name: Introspect nodes >           import_role: >             name: tripleo-overcloud-node >             tasks_from: introspect >           vars: >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_introspect_all_manageable: True >             tripleo_overcloud_node_introspect_provide: True >         - name: Overcloud deploy >           import_role: >             name: tripleo-overcloud >             tasks_from: deploy >           vars: >             tripleo_overcloud_debug: true >             tripleo_overcloud_deploy_environment_files: >               - /home/stack/params.yaml > > The same general tasks performed via the "Variable switch method" > would look something like: > >     - hosts: undercloud >       gather_facts: true >       collections: >         - tripleo.operator >       tasks: >         - name: Install undercloud >           import_role: >             name: tripleo-undercloud >           vars: >             tripleo_undercloud_action: install >             tripleo_undercloud_debug: true >         - name: Upload images >           import_role: >             name: tripleo-overcloud-images >    
       vars: >             tripleo_overcloud_images_action: upload >             tripleo_overcloud_images_debug: true >         - name: Import nodes >           import_role: >             name: tripleo-overcloud-node >           vars: >             tripleo_overcloud_node_action: import >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_import_file: instack.json >         - name: Introspect nodes >           import_role: >             name: tripleo-overcloud-node >           vars: >             tripleo_overcloud_node_action: introspect >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_introspect_all_manageable: True >             tripleo_overcloud_node_introspect_provide: True >         - name: Overcloud deploy >           import_role: >             name: tripleo-overcloud >           vars: >             tripleo_overcloud_action: deploy >             tripleo_overcloud_debug: true >             tripleo_overcloud_deploy_environment_files: >               - /home/stack/params.yaml > > Thoughts? > > Thanks, > -Alex > > [0] > https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible > [1] https://review.opendev.org/#/c/699311/ > [2] https://review.opendev.org/#/c/701628/ > [3] https://review.opendev.org/#/c/701034/ > [4] https://review.opendev.org/#/c/701628/ > > > > > -- > Best regards > Sagi Shnaidman -- Best regards, Bogdan Dobrelya, Irc #bogdando From kotobi at dkrz.de Thu Jan 9 11:57:11 2020 From: kotobi at dkrz.de (Amjad Kotobi) Date: Thu, 9 Jan 2020 12:57:11 +0100 Subject: [neutron][rabbitmq][oslo] Neutron-server service shows deprecated "AMQPDeprecationWarning" In-Reply-To: <294c93b5-0ddc-284b-34a1-ffce654ba047@nemebean.com> References: <4D3B074F-09F2-48BE-BD61-5D34CBFE509E@dkrz.de> <294c93b5-0ddc-284b-34a1-ffce654ba047@nemebean.com> Message-ID: <274FDC2A-837B-45CC-BFBF-8C09A182550A@dkrz.de> Hi Ben, > On 7. 
Jan 2020, at 22:59, Ben Nemec wrote: > > > > On 1/7/20 9:14 AM, Amjad Kotobi wrote: >> Hi, >> Today we are facing losing connection of neutron especially during instance creation or so as “systemctl status neutron-server” shows below message >> be deprecated in amqp 2.2.0. >> Since amqp 2.0 you have to explicitly call Connection.connect() >> before using the connection. >> W_FORCE_CONNECT.format(attr=attr))) >> /usr/lib/python2.7/site-packages/amqp/connection.py:304: AMQPDeprecationWarning: The .transport attribute on the connection was accessed before >> the connection was established. This is supported for now, but will >> be deprecated in amqp 2.2.0. >> Since amqp 2.0 you have to explicitly call Connection.connect() >> before using the connection. >> W_FORCE_CONNECT.format(attr=attr))) > > It looks like this is a red herring, but it should be fixed in the current oslo.messaging pike release. See [0] and the related bug. > > 0: https://review.opendev.org/#/c/605324/ > >> OpenStack release which we are running is “Pike”. >> Is there any way to remedy this? > > I don't think this should be a fatal problem in and of itself so I suspect it's masking something else. However, I would recommend updating to the latest pike release of oslo.messaging where the deprecated feature is not used. If that doesn't fix the problem, please send us whatever errors remain after this one is eliminated. I checked it out, we are having the latest pike Oslo.messaging and it still showing the same upper messages. Any ideas? 
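
One quick sanity check (an illustrative sketch, not an official troubleshooting tool) is to confirm which copies of oslo_messaging / amqp the service's interpreter actually imports; a second, stale install elsewhere on sys.path can keep the old warning alive even after upgrading. Run it with the same Python that runs neutron-server:

```python
"""Show where the messaging libraries would be imported from.

Sketch only: the module names below are the usual top-level imports,
but whether they are present depends on your node.
"""
from importlib import util


def module_origin(name):
    """Return the path a top-level module would be imported from, or None."""
    spec = util.find_spec(name)
    return spec.origin if spec else None


if __name__ == "__main__":
    # A path outside the expected site-packages (e.g. under /usr/local)
    # can indicate a second, stale copy shadowing the upgraded one.
    for name in ("oslo_messaging", "amqp", "kombu"):
        print(name, "->", module_origin(name))
```

If both paths look right, comparing `pip list` output between controllers can rule out a node that missed the upgrade.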
> >> Thanks >> Amjad From cjeanner at redhat.com Thu Jan 9 12:02:43 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Thu, 9 Jan 2020 13:02:43 +0100 Subject: [tripleo] tripleo-operator-ansible start and request for input In-Reply-To: References: Message-ID: <5d0ab9db-0858-54df-1ddd-52b37294c5c5@redhat.com> On 1/9/20 11:20 AM, Sagi Shnaidman wrote: > Thanks for bringing this up, Alex > > I was thinking if we can the third option - to have small "single > responsibility" roles for every action. For example: > tripleo-undercloud-install > tripleo-undercloud-backup > tripleo-undercloud-upgrade I would prefer that solution, since it allows to keep code stupid simple, without block|when|other switches that can make maintenance complicated. Doing so will probably make the unit testing easier as well (thinking of molecule here, mainly). > > And then no one needs to dig into roles to check what actions are > supported, but just "ls roles/". Also these roles usually have nothing > in common but name, and if they are quite isolated, I think it's better > to have them defined separately. > From cons I can count: more roles and might be some level of duplication > in variables. We would probably need some common params|variables in order to avoid duplication... The var part might be a source of headache in order to avoid as much as possible duplications. > For pros it's more readable playbook and clear actions: > > - hosts: undercloud >   gather_facts: true >   collections: >     - tripleo.operator >   vars: >     tripleo_undercloud_debug: true >   tasks: > >     - name: Install undercloud >       import_role: >         name: undercloud-install > >     - name: Upgrade undercloud >       import_role: >         name: undercloud-upgrade > > Thanks > > On Thu, Jan 9, 2020 at 12:22 AM Alex Schultz > wrote: > > [Hello folks, > > I've begun the basic start of the tripleo-operator-ansible collection > work[0].  
At the start of this work, I've chosen the undercloud > installation[1] as the first role to use to figure out how we the end > user's to consume these roles.  I wanted to bring up this initial > implementation so that we can discuss how folks will include these > roles.  The initial implementation is a wrapper around the > tripleoclient command as run via openstackclient.  This means that the > 'tripleo-undercloud' role provides implementations for 'openstack > undercloud backup', 'openstack undercloud install', and 'openstack > undercloud upgrade'. > > In terms of naming conventions, I'm proposing that we would name the > roles "tripleo-" with the last part of the command > action being an "action". Examples: > > "openstack undercloud *" -> > role: tripleo-undercloud > action: (backup|install|upgrade) > > "openstack undercloud minion *" -> > role: tripleo-undercloud-minion > action: (install|upgrade) > > "openstack overcloud *" -> > role: tripleo-overcloud > action: (deploy|delete|export) > > "openstack overcloud node *" -> > role: tripleo-overcloud-node > action: (import|introspect|provision|unprovision) > > In terms of end user interface, I've got two proposals out in terms of > possible implementations. > > Tasks from method: > The initial commit propose that we would require the end user to use > an include_role/tasks_from call to perform the desired action.  For > example: > >     - hosts: undercloud >       gather_facts: true >       tasks: >         - name: Install undercloud >           collections: >             - tripleo.operator >           import_role: >             name: tripleo-undercloud >             tasks_from: install >           vars: >             tripleo_undercloud_debug: true > > Variable switch method: > I've also proposed an alternative implementation[2] that would use > include_role but require the end user to set a specific variable to > change if the role runs 'install', 'backup' or 'upgrade'. 
With this > patch the playbook would look something like: > >     - hosts: undercloud >       gather_facts: true >       tasks: >         - name: Install undercloud >           collections: >             - tripleo.operator >           import_role: >             name: tripleo-undercloud >           vars: >             tripleo_undercloud_action: install >             tripleo_undercloud_debug: true > > I would like to solicit feedback on which one of these is the > preferred integration method when calling these roles. I have two > patches up in tripleo-quickstart-extras to show how these calls could > be run. The "Tasks from method" can be viewed here[3]. The "Variable > switch method" can be viewed here[4].  I can see pros and cons for > both methods. > > My take would be: > > Tasks from method: > Pros: >  - action is a bit more explicit >  - dynamic logic left up to the playbook/consumer. >  - May not have a 'default' action (as main.yml is empty, though it > could be implemented). >  - tasks_from would be a global implementation across all roles rather > than having a changing variable name. > > Cons: >  - internal task file names must be known by the consumer (though IMHO > this is no different than the variable name + values in the other > implementation) >  - role/action inclusions is not dynamic in the role (it can be in > the playbook) > > Variable switch method: > Pros: >  - inclusion of the role by default runs an install >  - action can be dynamically changed from the calling playbook via an > ansible var >  - structure of the task files is internal to the role and the user of > the role need not know the filenames/structure. > > Cons: >  - calling playbook is not explicit in that the action can be switched > dynamically (e.g. 
intentionally or accidentally because it is dynamic) >  - implementer must know to configure a variable called > `tripleo_undercloud_action` to switch between install/backup/upgrade > actions >  - variable names are likely different depending on the role > > My personal preference might be to use the "Tasks from method" because > it would lend itself to the same implementation across all roles and > the dynamic logic is left to the playbook rather than internally in > the role. For example, we'd end up with something like: > >     - hosts: undercloud >       gather_facts: true >       collections: >         - tripleo.operator >       tasks: >         - name: Install undercloud >           import_role: >             name: tripleo-undercloud >             tasks_from: install >           vars: >             tripleo_undercloud_debug: true >         - name: Upload images >           import_role: >             name: tripleo-overcloud-images >             tasks_from: upload >           vars: >             tripleo_overcloud_images_debug: true >         - name: Import nodes >           import_role: >             name: tripleo-overcloud-node >             tasks_from: import >           vars: >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_import_file: instack.json >         - name: Introspect nodes >           import_role: >             name: tripleo-overcloud-node >             tasks_from: introspect >           vars: >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_introspect_all_manageable: True >             tripleo_overcloud_node_introspect_provide: True >         - name: Overcloud deploy >           import_role: >             name: tripleo-overcloud >             tasks_from: deploy >           vars: >             tripleo_overcloud_debug: true >             tripleo_overcloud_deploy_environment_files: >               - /home/stack/params.yaml > > The same general tasks performed via the "Variable 
switch method" > would look something like: > >     - hosts: undercloud >       gather_facts: true >       collections: >         - tripleo.operator >       tasks: >         - name: Install undercloud >           import_role: >             name: tripleo-undercloud >           vars: >             tripleo_undercloud_action: install >             tripleo_undercloud_debug: true >         - name: Upload images >           import_role: >             name: tripleo-overcloud-images >           vars: >             tripleo_overcloud_images_action: upload >             tripleo_overcloud_images_debug: true >         - name: Import nodes >           import_role: >             name: tripleo-overcloud-node >           vars: >             tripleo_overcloud_node_action: import >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_import_file: instack.json >         - name: Introspect nodes >           import_role: >             name: tripleo-overcloud-node >           vars: >             tripleo_overcloud_node_action: introspect >             tripleo_overcloud_node_debug: true >             tripleo_overcloud_node_introspect_all_manageable: True >             tripleo_overcloud_node_introspect_provide: True >         - name: Overcloud deploy >           import_role: >             name: tripleo-overcloud >           vars: >             tripleo_overcloud_action: deploy >             tripleo_overcloud_debug: true >             tripleo_overcloud_deploy_environment_files: >               - /home/stack/params.yaml > > Thoughts? 
> > Thanks, > -Alex > > [0] > https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible > [1] https://review.opendev.org/#/c/699311/ > [2] https://review.opendev.org/#/c/701628/ > [3] https://review.opendev.org/#/c/701034/ > [4] https://review.opendev.org/#/c/701628/ > > > > > -- > Best regards > Sagi Shnaidman -- Cédric Jeanneret (He/Him/His) Software Engineer - OpenStack Platform Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From radoslaw.piliszek at gmail.com Thu Jan 9 12:55:32 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 9 Jan 2020 13:55:32 +0100 Subject: [infra] Retire openstack/js-openstack-lib repository In-Reply-To: <07467b2a-7e5e-6ba1-8481-27c87f58d318@suse.com> References: <2AA78965-B496-41D9-A41B-DF75694A3EB9@inaugust.com> <20200108211208.mnvspulwlghdyyz5@yuggoth.org> <8679edfb-a26a-4292-9886-3c71cec21f83@www.fastmail.com> <07467b2a-7e5e-6ba1-8481-27c87f58d318@suse.com> Message-ID: Aye, will do at some point. So the lib looks like user friendliness was not one of its goals. I had a typo in credentials and instead of throwing unauth (or similar) at me, it instead threw the whole response object (which node gladly printed out as "Object" because why not). Error handling to improve. OTOH, it checks code coverage and has both kinds of tests (unit, functional). As for more good news, I managed to run functional tests locally against Stein with only two failures: Failed: Current devstack glance version (2.7) is not supported. Failed: Current devstack keystone version (3.12) is not supported. which are quite expected (no idea why these are tested as functional, these are more like sanity checks for debugging if functionals actually fail IMHO). 
Real functional tests passed and it really does what it says on the box (which is very little but still). The CI functional tests jobs in Zuul are part of the legacy dsvm (devstack vm?) thingy. nodejs4 fails because of repos being long gone, nodejs6 actually installs nodejs8 but fails on npm being not installed. I see all Zuul config is external now. I would prefer it all in lib's repo. I presume it would work if I added zuul.d there, right? Still, need to drop the failing functional jobs to merge anything new. I did my research as promised, please let me know how we would like to (/should) proceed now. -yoctozepto czw., 9 sty 2020 o 10:43 Andreas Jaeger napisał(a): > > On 09/01/2020 08.58, Radosław Piliszek wrote: > > Best infra team around, you go to sleep and the problem is solved. :-) > > Thanks for the link. > > > > I was meaning these templates: > > https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/project-templates.yaml > > which reference nodejs up to 8. > > New templates for nodejs 10 or 11 are welcome ;) > > > I see zuul is already using the same jobs referenced in those > > templates but with node 10 so it presumably works which is great > > indeed: > > https://opendev.org/zuul/zuul/src/branch/master/.zuul.yaml#L212 > > > > The most nodejs-scary part is included in infra docs: > > https://docs.openstack.org/infra/manual/creators.html#central-config-exceptions > > which reference nodejs4 (exorcists required immediately). > > It is meant to reference the publish-to-npm nodejs jobs, > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 
5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB

From james.slagle at gmail.com Thu Jan 9 12:57:26 2020 From: james.slagle at gmail.com (James Slagle) Date: Thu, 9 Jan 2020 07:57:26 -0500 Subject: [tripleo] tripleo-operator-ansible start and request for input In-Reply-To: References: Message-ID:

On Thu, Jan 9, 2020 at 5:25 AM Sagi Shnaidman wrote: > > Thanks for bringing this up, Alex > > I was thinking if we can use the third option - to have small "single responsibility" roles for every action. For example: > tripleo-undercloud-install > tripleo-undercloud-backup > tripleo-undercloud-upgrade

Good idea, and I tend to agree as well. If we really wanted a single undercloud role at some point, then we could always go back to the original idea and have a tripleo-undercloud role that just included these other, more fine-grained roles. But, for now, I like the idea of smaller, focused roles. -- James Slagle

From C-Ramakrishna.Bhupathi at charter.com Thu Jan 9 13:45:45 2020 From: C-Ramakrishna.Bhupathi at charter.com (Bhupathi, Ramakrishna) Date: Thu, 9 Jan 2020 13:45:45 +0000 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken Message-ID: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com>

Folks, I am building a Kubernetes cluster (OpenStack Train) using the fedora atomic-29 image. The nodes come up fine (I have a simple setup with 1 master and 1 node), but the cluster creation times out, and when I access the cloud-init logs I see this error. Wondering what I am missing, as this used to work before. I wonder if this is image-related.

[ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found.
Searched through list: ['eni', 'sysconfig', 'netplan']

Essentially the stack creation fails in "kube_cluster_deploy". Can somebody help me debug this? Any help is appreciated.

--RamaK

E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited.

From aschultz at redhat.com Thu Jan 9 16:00:54 2020 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 9 Jan 2020 09:00:54 -0700 Subject: [tripleo] tripleo-operator-ansible start and request for input In-Reply-To: References: Message-ID:

On Thu, Jan 9, 2020 at 3:20 AM Sagi Shnaidman wrote: > > Thanks for bringing this up, Alex > > I was thinking if we can use the third option - to have small "single responsibility" roles for every action. For example: > tripleo-undercloud-install > tripleo-undercloud-backup > tripleo-undercloud-upgrade >

OK, it seems like this is the generally preferred structure. We'll go with this and I'll update my patches to reflect it. One issue with this is the extra duplication in files, but that might be minor.

> And then no one needs to dig into roles to check what actions are supported, but can just "ls roles/". Also these roles usually have nothing in common but the name, and if they are quite isolated, I think it's better to have them defined separately. > As cons I can count: more roles and maybe some level of duplication in variables.
> For pros: a more readable playbook and clear actions: > > - hosts: undercloud > gather_facts: true > collections: > - tripleo.operator > vars: > tripleo_undercloud_debug: true > tasks: > > - name: Install undercloud > import_role: > name: undercloud-install > > - name: Upgrade undercloud > import_role: > name: undercloud-upgrade > > Thanks >

On Thu, Jan 9, 2020 at 12:22 AM Alex Schultz wrote: >> >> Hello folks, >> >> I've begun the basic start of the tripleo-operator-ansible collection >> work[0]. At the start of this work, I've chosen the undercloud >> installation[1] as the first role to use to figure out how we want end >> users to consume these roles. I wanted to bring up this initial >> implementation so that we can discuss how folks will include these >> roles. The initial implementation is a wrapper around the >> tripleoclient command as run via openstackclient. This means that the >> 'tripleo-undercloud' role provides implementations for 'openstack >> undercloud backup', 'openstack undercloud install', and 'openstack >> undercloud upgrade'. >> >> In terms of naming conventions, I'm proposing that we would name the >> roles "tripleo-" with the last part of the command >> action being an "action". Examples: >> >> "openstack undercloud *" -> >> role: tripleo-undercloud >> action: (backup|install|upgrade) >> >> "openstack undercloud minion *" -> >> role: tripleo-undercloud-minion >> action: (install|upgrade) >> >> "openstack overcloud *" -> >> role: tripleo-overcloud >> action: (deploy|delete|export) >> >> "openstack overcloud node *" -> >> role: tripleo-overcloud-node >> action: (import|introspect|provision|unprovision) >> >> In terms of the end-user interface, I've got two proposals out in terms of >> possible implementations. >> >> Tasks from method: >> The initial commit proposes that we would require the end user to use >> an include_role/tasks_from call to perform the desired action.
For >> example: >> >> - hosts: undercloud >> gather_facts: true >> tasks: >> - name: Install undercloud >> collections: >> - tripleo.operator >> import_role: >> name: tripleo-undercloud >> tasks_from: install >> vars: >> tripleo_undercloud_debug: true >> >> Variable switch method: >> I've also proposed an alternative implementation[2] that would use >> include_role but require the end user to set a specific variable to >> change if the role runs 'install', 'backup' or 'upgrade'. With this >> patch the playbook would look something like: >> >> - hosts: undercloud >> gather_facts: true >> tasks: >> - name: Install undercloud >> collections: >> - tripleo.operator >> import_role: >> name: tripleo-undercloud >> vars: >> tripleo_undercloud_action: install >> tripleo_undercloud_debug: true >> >> I would like to solicit feedback on which one of these is the >> preferred integration method when calling these roles. I have two >> patches up in tripleo-quickstart-extras to show how these calls could >> be run. The "Tasks from method" can be viewed here[3]. The "Variable >> switch method" can be viewed here[4]. I can see pros and cons for >> both methods. >> >> My take would be: >> >> Tasks from method: >> Pros: >> - action is a bit more explicit >> - dynamic logic left up to the playbook/consumer. >> - May not have a 'default' action (as main.yml is empty, though it >> could be implemented). >> - tasks_from would be a global implementation across all roles rather >> than having a changing variable name. 
>> >> Cons: >> - internal task file names must be known by the consumer (though IMHO >> this is no different than the variable name + values in the other >> implementation) >> - role/action inclusions is not dynamic in the role (it can be in the playbook) >> >> Variable switch method: >> Pros: >> - inclusion of the role by default runs an install >> - action can be dynamically changed from the calling playbook via an >> ansible var >> - structure of the task files is internal to the role and the user of >> the role need not know the filenames/structure. >> >> Cons: >> - calling playbook is not explicit in that the action can be switched >> dynamically (e.g. intentionally or accidentally because it is dynamic) >> - implementer must know to configure a variable called >> `tripleo_undercloud_action` to switch between install/backup/upgrade >> actions >> - variable names are likely different depending on the role >> >> My personal preference might be to use the "Tasks from method" because >> it would lend itself to the same implementation across all roles and >> the dynamic logic is left to the playbook rather than internally in >> the role. 
For example, we'd end up with something like: >> >> - hosts: undercloud >> gather_facts: true >> collections: >> - tripleo.operator >> tasks: >> - name: Install undercloud >> import_role: >> name: tripleo-undercloud >> tasks_from: install >> vars: >> tripleo_undercloud_debug: true >> - name: Upload images >> import_role: >> name: tripleo-overcloud-images >> tasks_from: upload >> vars: >> tripleo_overcloud_images_debug: true >> - name: Import nodes >> import_role: >> name: tripleo-overcloud-node >> tasks_from: import >> vars: >> tripleo_overcloud_node_debug: true >> tripleo_overcloud_node_import_file: instack.json >> - name: Introspect nodes >> import_role: >> name: tripleo-overcloud-node >> tasks_from: introspect >> vars: >> tripleo_overcloud_node_debug: true >> tripleo_overcloud_node_introspect_all_manageable: True >> tripleo_overcloud_node_introspect_provide: True >> - name: Overcloud deploy >> import_role: >> name: tripleo-overcloud >> tasks_from: deploy >> vars: >> tripleo_overcloud_debug: true >> tripleo_overcloud_deploy_environment_files: >> - /home/stack/params.yaml >> >> The same general tasks performed via the "Variable switch method" >> would look something like: >> >> - hosts: undercloud >> gather_facts: true >> collections: >> - tripleo.operator >> tasks: >> - name: Install undercloud >> import_role: >> name: tripleo-undercloud >> vars: >> tripleo_undercloud_action: install >> tripleo_undercloud_debug: true >> - name: Upload images >> import_role: >> name: tripleo-overcloud-images >> vars: >> tripleo_overcloud_images_action: upload >> tripleo_overcloud_images_debug: true >> - name: Import nodes >> import_role: >> name: tripleo-overcloud-node >> vars: >> tripleo_overcloud_node_action: import >> tripleo_overcloud_node_debug: true >> tripleo_overcloud_node_import_file: instack.json >> - name: Introspect nodes >> import_role: >> name: tripleo-overcloud-node >> vars: >> tripleo_overcloud_node_action: introspect >> tripleo_overcloud_node_debug: true >> 
tripleo_overcloud_node_introspect_all_manageable: True >> tripleo_overcloud_node_introspect_provide: True >> - name: Overcloud deploy >> import_role: >> name: tripleo-overcloud >> vars: >> tripleo_overcloud_action: deploy >> tripleo_overcloud_debug: true >> tripleo_overcloud_deploy_environment_files: >> - /home/stack/params.yaml >> >> Thoughts? >> >> Thanks, >> -Alex >> >> [0] https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible >> [1] https://review.opendev.org/#/c/699311/ >> [2] https://review.opendev.org/#/c/701628/ >> [3] https://review.opendev.org/#/c/701034/ >> [4] https://review.opendev.org/#/c/701628/ >> >> > > > -- > Best regards > Sagi Shnaidman From juliaashleykreger at gmail.com Thu Jan 9 16:52:15 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Thu, 9 Jan 2020 08:52:15 -0800 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <2706c21c3f7d4203a8a20342f8f6a68c@AUSX13MPS308.AMER.DELL.COM> References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM> <582d2544d3d74fe7beef50aaaa35d558@AUSX13MPS308.AMER.DELL.COM> <20200107235139.2l5iw2fumgsfoz5u@yuggoth.org> <2706c21c3f7d4203a8a20342f8f6a68c@AUSX13MPS308.AMER.DELL.COM> Message-ID: On Wed, Jan 8, 2020 at 8:38 AM wrote: > > Jeremy, > Correct. > programming devices and "updating firmware" I count as separate activities. > Similar to CPU or GPU. > Which makes me really wonder, where is that line between the activities? I guess the worry, from a security standpoint, is persistent bytecode. I guess I just don't have a good enough understanding of all the facets in this area to have a sense for that. 
:/ > -----Original Message----- > From: Jeremy Stanley > Sent: Tuesday, January 7, 2020 5:52 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management > > On 2020-01-07 23:17:25 +0000 (+0000), Arkady.Kanevsky at dell.com wrote: > > It is hard to imagine that any production env of any customer will allow > > anybody but an administrator to update FW on any device at any time. The > > security implications are huge. > [...] > > I thought this was precisely the point of exposing FPGA hardware into server instances. Or do you not count programming those as "updating firmware?" > -- > Jeremy Stanley >

From sean.mcginnis at gmx.com Thu Jan 9 17:01:15 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 9 Jan 2020 11:01:15 -0600 Subject: [release] making releases fast again (was: decentralising release approvals) In-Reply-To: References: Message-ID: <20200109170115.GA453843@sm-workstation>

On Fri, Dec 20, 2019 at 11:04:36AM +0100, Thierry Carrez wrote: > Mark Goddard wrote: > > [...] > > As kolla PTL and ironic release liaison I've proposed a number of > > release patches recently. Generally the release team is good at churning > > through these, but sometimes patches can hang around for a while. > > Usually a ping on IRC will get things moving again within a day or so > > (thanks in particular to Sean who has been very responsive). > > I agree we've seen an increase in processing delay lately, and I'd like to > correct that. There are generally three things that would cause a > perceptible delay in release processing... > > 1- wait for two release managers +2 > > This is something we put in place some time ago, as we had a lot of new > members and thought that would be a good way to onboard them. Lately it > created delays as a lot of those were not as active. > > 2- stable releases > > Two subcases in there...
Either the deliverable is under stable policy and > there are *significant* delays there as we have to pause to give a chance to > stable-maint-core people to voice an opinion. Or the deliverable is not > under stable policy, but we do a manual check on the changes, as a way to > educate the requester on semver. > > 3- waiting for PTL/release liaison to approve > > That can take a long time, but the release management team is not really at > fault there. >

Coming back to hopefully wrap this up...

We discussed this in today's release team meeting and decided to make some changes to hopefully make things a little smoother. We will now use the following guidelines for reviewing and approving release requests:

For releases in the current development cycle (including, for some time, the previous cycle for the release-trailing deliverables) we will only require a single reviewer. If everything looks good and there are no concerns, we will +2 and approve the release request without waiting for a second. If the reviewer has any doubts or hesitation, they can decide to wait for a second reviewer, but this should be a much less common situation.

For stable releases, we will require two +2s. We will not, however, wait for a designated day for stable team review. If we can get one, all the better, but the normal release team should be aware of the stable rules and check them for any stable release request. Keeping the requirement for two reviewers should help make sure nothing is overlooked with stable policy.

We do still want a PTL/liaison +1 to approve, so we will continue to wait for that. Thierry is working on some job automation to make checking for that a little easier, so hopefully that will help make that process as smooth as possible.

If there are any other questions or concerns, please do let us know.
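The guidelines above boil down to a small decision rule. As a toy illustration only (the function and its names are invented here and are not part of any release tooling):

```python
def can_approve(release_type: str, plus_twos: int, has_liaison_ack: bool) -> bool:
    """Toy sketch of the review guidelines described above.

    - development-cycle releases: one release-manager +2 is enough
    - stable releases: two +2s are required
    - in all cases, the PTL/release-liaison +1 is awaited first
    """
    if not has_liaison_ack:
        return False
    required = 2 if release_type == "stable" else 1
    return plus_twos >= required


# A stable release with a single +2 still waits for a second reviewer.
assert can_approve("stable", 1, True) is False
assert can_approve("stable", 2, True) is True
# A development-cycle release needs only one +2 plus the liaison ack.
assert can_approve("development", 1, True) is True
assert can_approve("development", 2, False) is False
```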
Sean

From radoslaw.piliszek at gmail.com Thu Jan 9 17:08:53 2020 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Thu, 9 Jan 2020 18:08:53 +0100 Subject: [release] making releases fast again (was: decentralising release approvals) In-Reply-To: <20200109170115.GA453843@sm-workstation> References: <20200109170115.GA453843@sm-workstation> Message-ID:

Hi Sean, just to verify my interpretation: this means e.g. [1] is now good to go? 2 release team members, 1 liaison and 1 PTL extra (for a stable release).

[1] https://review.opendev.org/701080

-yoctozepto

On Thu, 9 Jan 2020 at 18:03, Sean McGinnis wrote: > > On Fri, Dec 20, 2019 at 11:04:36AM +0100, Thierry Carrez wrote: > > Mark Goddard wrote: > > > [...] > > > As kolla PTL and ironic release liaison I've proposed a number of > > > release patches recently. Generally the release team is good at churning > > > through these, but sometimes patches can hang around for a while. > > > Usually a ping on IRC will get things moving again within a day or so > > > (thanks in particular to Sean who has been very responsive). > > > > I agree we've seen an increase in processing delay lately, and I'd like to > > correct that. There are generally three things that would cause a > > perceptible delay in release processing... > > > > 1- wait for two release managers +2 > > > > This is something we put in place some time ago, as we had a lot of new > > members and thought that would be a good way to onboard them. Lately it > > created delays as a lot of those were not as active. > > > > 2- stable releases > > > > Two subcases in there... Either the deliverable is under stable policy and > > there are *significant* delays there as we have to pause to give a chance to > > stable-maint-core people to voice an opinion. Or the deliverable is not > > under stable policy, but we do a manual check on the changes, as a way to > > educate the requester on semver.
> > > > 3- waiting for PTL/release liaison to approve > > > > That can take a long time, but the release management team is not really at > > fault there. > > > > Coming back to hopefully wrap this up... > > We discussed this in today's release team meeting and decided to make some > changes to hopefully make things a little smoother. We will now use the > following guidelines for reviewing and approving release requests: > > For releases in the current development (including some time for the previous > cycle for the release-trailing deliverables) we will only require a single > reviewer. If everything looks good and there are no concerns, we will +2 and > approve the release request without waiting for a second. If the reviewer has > any doubts or hesitation, they can decide to wait for a second reviewer, but > this should be a much less common situation. > > For stable releases, we will require two +2s. We will not, however, wait for a > designated day for stable team review. If we can get one, all the better, but > the normal release team should be aware of stable rules and look for them for > any stable release request. Keeping the requirement for two reviewers should > help make sure nothing is overlooked with stable policy. > > We do still want PTL/liaison +1 to approve, so we will continue to wait for > that. Thierry is working on some job automation to make checking for that a > little easier, so hopefully that will help make that process as smooth as > possible. > > If there are any other questions or concerns, please do let us know.
> > Sean

From sean.mcginnis at gmx.com Thu Jan 9 17:17:47 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 9 Jan 2020 11:17:47 -0600 Subject: [release] making releases fast again (was: decentralising release approvals) In-Reply-To: References: <20200109170115.GA453843@sm-workstation> Message-ID: <20200109171747.GB453843@sm-workstation>

On Thu, Jan 09, 2020 at 06:08:53PM +0100, Radosław Piliszek wrote: > Hi Sean, > > just to verify my interpretation. > This means e.g. [1] is now good to go? > 2 release team members, 1 liaison and 1 PTL extra (for stable release). > > [1] https://review.opendev.org/701080 > > -yoctozepto >

Correct. I will take one quick look again, and if all looks good get that one going.

Sean

From radoslaw.piliszek at gmail.com Thu Jan 9 18:10:30 2020 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Thu, 9 Jan 2020 19:10:30 +0100 Subject: [release] making releases fast again (was: decentralising release approvals) In-Reply-To: <20200109171747.GB453843@sm-workstation> References: <20200109170115.GA453843@sm-workstation> <20200109171747.GB453843@sm-workstation> Message-ID:

Thanks, Sean.

-yoctozepto

On Thu, 9 Jan 2020 at 18:17, Sean McGinnis wrote: > > On Thu, Jan 09, 2020 at 06:08:53PM +0100, Radosław Piliszek wrote: > > Hi Sean, > > > > just to verify my interpretation. > > This means e.g. [1] is now good to go? > > 2 release team members, 1 liaison and 1 PTL extra (for stable release). > > > > [1] https://review.opendev.org/701080 > > > > -yoctozepto > > > > Correct. I will take one quick look again, and if all looks good get that one > going.
> > Sean

From gmann at ghanshyammann.com Thu Jan 9 18:38:26 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 09 Jan 2020 12:38:26 -0600 Subject: [qa][infra][stable] Stable branches gate status: tempest-full-* jobs failing for stable/ocata|pike|queens In-Reply-To: <16f8101d6ea.be1780a3214520.3007727257147254758@ghanshyammann.com> References: <16f8101d6ea.be1780a3214520.3007727257147254758@ghanshyammann.com> Message-ID: <16f8b99bf5c.d67ee841304059.4438464382583793057@ghanshyammann.com>

---- On Tue, 07 Jan 2020 11:16:19 -0600 Ghanshyam Mann wrote ---- > Hello Everyone, > > tempest-full-* jobs are failing on stable/queens, stable/pike, and stable/ocata (legacy-tempest-dsvm-neutron-full-ocata) [1]. > Please hold any recheck until the fix is merged. > > whoami-rajat reported the tempest-full-queens-py3 job failure, and later while debugging we found that the same is failing > for pike and ocata (the job name there is legacy-tempest-dsvm-neutron-full-ocata). > > The failure is due to "Timeout on connecting the vnc console url", because there is no 'n-cauth' service running, which is required > for these stable branches. In Ussuri that service has been removed from nova. > > 'n-cauth' has been removed from ENABLED_SERVICES recently in - https://review.opendev.org/#/c/700217/ which affected only > stable branches up to queens. stable/rocky|stein are working because we have moved the service-enablement settings from devstack-gate's > test matrix to the devstack base job[2]. Patch [2] was not backported to stable/queens and stable/pike, and I am not sure why. > > We have two ways to fix the stable branches gate: > 1. re-enable n-cauth in devstack-gate. Hopefully the other removed services cause no problems. > pros: easy to fix, fixes all three stable branches. > patch- https://review.opendev.org/#/c/701404/

This is merged now; we can recheck.

-gmann

> > 2. Backport 546765[2] to stable/queens and stable/pike.
> pros: this removes the dependency from the test matrix, which is the overall goal of removing the d-g dependency. > cons: It cannot be backported to stable/ocata as there are no zuulv3 base jobs there. This is already EM; does anyone still care about this? > > I think for fixing the gate (Tempest master and stable/queens|pike|ocata), we can go with option 1 and later > we backport the devstack migration. > > [1] > - http://zuul.openstack.org/builds?job_name=tempest-full-queens-py3 > - http://zuul.openstack.org/builds?job_name=tempest-full-pike > - http://zuul.openstack.org/builds?job_name=legacy-tempest-dsvm-neutron-full-ocata > - reported bug - https://bugs.launchpad.net/devstack/+bug/1858666 > > [2] https://review.opendev.org/#/c/546765/ > > -gmann >

From openstack at nemebean.com Thu Jan 9 19:20:34 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 9 Jan 2020 13:20:34 -0600 Subject: [neutron][rabbitmq][oslo] Neutron-server service shows deprecated "AMQPDeprecationWarning" In-Reply-To: <274FDC2A-837B-45CC-BFBF-8C09A182550A@dkrz.de> References: <4D3B074F-09F2-48BE-BD61-5D34CBFE509E@dkrz.de> <294c93b5-0ddc-284b-34a1-ffce654ba047@nemebean.com> <274FDC2A-837B-45CC-BFBF-8C09A182550A@dkrz.de> Message-ID:

On 1/9/20 5:57 AM, Amjad Kotobi wrote: > Hi Ben, > >> On 7. Jan 2020, at 22:59, Ben Nemec wrote: >> >> On 1/7/20 9:14 AM, Amjad Kotobi wrote: >>> Hi, >>> Today we are losing the neutron connection, especially during instance creation, and "systemctl status neutron-server" shows the message below: >>> be deprecated in amqp 2.2.0. >>> Since amqp 2.0 you have to explicitly call Connection.connect() >>> before using the connection. >>> W_FORCE_CONNECT.format(attr=attr))) >>> /usr/lib/python2.7/site-packages/amqp/connection.py:304: AMQPDeprecationWarning: The .transport attribute on the connection was accessed before >>> the connection was established. This is supported for now, but will >>> be deprecated in amqp 2.2.0.
>>> Since amqp 2.0 you have to explicitly call Connection.connect() >>> before using the connection. >>> W_FORCE_CONNECT.format(attr=attr))) >> >> It looks like this is a red herring, but it should be fixed in the current oslo.messaging pike release. See [0] and the related bug. >> >> 0: https://review.opendev.org/#/c/605324/ >> >>> OpenStack release which we are running is "Pike". >>> Is there any way to remedy this? >> >> I don't think this should be a fatal problem in and of itself, so I suspect it's masking something else. However, I would recommend updating to the latest pike release of oslo.messaging where the deprecated feature is not used. If that doesn't fix the problem, please send us whatever errors remain after this one is eliminated. > > I checked it out; we have the latest Pike oslo.messaging and it is still showing the same messages as above. Any ideas?

Hmm, not sure then. Are there any other log messages around that one which might provide more context on where this is happening? I've also copied a couple of our messaging folks in case they have a better idea what might be going on.

>> >>> Thanks >>> Amjad

From sean.mcginnis at gmx.com Thu Jan 9 21:50:14 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 9 Jan 2020 15:50:14 -0600 Subject: [Release-job-failures] release-post job for openstack/releases for ref refs/heads/master failed In-Reply-To: References: Message-ID: <20200109215014.GA472836@sm-workstation>

On Thu, Jan 09, 2020 at 08:32:28PM +0000, zuul at openstack.org wrote: > Build failed. > > - tag-releases https://zuul.opendev.org/t/openstack/build/deb8b8d5504b4689ab6d669eac92f979 : FAILURE in 3m 46s > - publish-tox-docs-static https://zuul.opendev.org/t/openstack/build/None : SKIPPED >

This failure can be safely ignored. It was a side effect of marking some older independent repos as 'abandoned' in https://review.opendev.org/#/c/700013/. These are old repos that are no longer under governance and/or retired.
The failure itself appears to be from one of the retired repos, from before we had specified in the retire procedure that the .gitreview file should be kept around. Since these are no longer active, and there wasn't actually anything to tag and release, there is no need to worry about this failure.

Sean

From feilong at catalyst.net.nz Thu Jan 9 23:11:50 2020 From: feilong at catalyst.net.nz (Feilong Wang) Date: Fri, 10 Jan 2020 12:11:50 +1300 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> Message-ID: <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz>

Hi Bhupathi,

Could you please share your cluster template? And please make sure your Nova/Neutron work.

On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: > > Folks, > > I am building a Kubernetes cluster (OpenStack Train) using the fedora > atomic-29 image. The nodes come up fine (I have a simple 1 master > and 1 node), but the cluster creation times out, and when I access > the cloud-init logs I see this error. Wondering what I am missing as > this used to work before. I wonder if this is image related. > > [ERROR]: Unable to render networking. Network config is likely broken: > No available network renderers found. Searched through list: ['eni', > 'sysconfig', 'netplan'] > > Essentially the stack creation fails in "kube_cluster_deploy" > > Can somebody help me debug this? Any help is appreciated. > > --RamaK

-- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington --------------------------------------------------------------------------

From amotoki at gmail.com Fri Jan 10 06:55:49 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Fri, 10 Jan 2020 15:55:49 +0900 Subject: [infra][stable] python3 is used by default in older stable branches Message-ID:

Hi,

The horizon team recently noticed that python3 is used as the default python interpreter in older stable branches like pike or ocata. For example, the horizon pep8 job in stable/pike and stable/ocata fails [1][2]. We also noticed that some jobs which are expected to run with python2 (using the tox default interpreter as of the release) are now run with python3 [3].

What is the recommended way to cope with this situation? Individual projects can cope with the default interpreter change repo by repo, but this potentially affects all projects with older stable branches. This is the reason I am sending this mail.
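For context, one common way to pin a given tox environment to a specific interpreter regardless of the platform default (a hedged sketch of a possible workaround, not necessarily what the team decided on) is to set `basepython` explicitly in `tox.ini`:

```ini
# Hypothetical tox.ini fragment: pin the affected environments to python2
# on old stable branches so the tox default interpreter no longer matters.
[testenv:pep8]
basepython = python2.7

[testenv:py27]
basepython = python2.7
```

With `basepython` set, tox resolves the environment against that interpreter even when `python` on the node points at python3.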
Best Regards,
Akihiro Motoki (amotoki)

[1] https://zuul.opendev.org/t/openstack/build/daaeaedb0a184e29a03eeaae59157c78 [2] https://zuul.opendev.org/t/openstack/build/525dc7f926684e54be8b565a7bbf7193 [3] https://zuul.opendev.org/t/openstack/build/adbc53b8d1f74dac9cd606f4a796c442/log/tox/py27dj110-0.log#3

From skaplons at redhat.com Fri Jan 10 07:43:03 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 10 Jan 2020 08:43:03 +0100 Subject: [neutron][vpnaas] New neutron-vpnaas maintainer and core reviewer Message-ID:

Hi,

After our last call for volunteers to maintain some of the neutron stadium projects, we have a new neutron-vpnaas maintainer now \o/ Dongcan Ye just stepped up to take care of this project. After discussion with the other neutron-vpnaas core reviewers, I added him to the neutron-vpnaas core team. He has been working on neutron-vpnaas for some time already, and in our opinion he knows this project well enough to be a core reviewer there.

Thank you very much, Dongcan Ye, for helping with the neutron-vpnaas project :)

— Slawek Kaplonski Senior software engineer Red Hat

From info at dantalion.nl Fri Jan 10 09:28:19 2020 From: info at dantalion.nl (info at dantalion.nl) Date: Fri, 10 Jan 2020 10:28:19 +0100 Subject: [aodh][keystone] handling of webhook / alarm authentication Message-ID:

Hello,

I was wondering how a service receiving an aodh webhook could perform authentication? The documentation describes the webhook as a simple POST request, so I was wondering if a keystone token context is available when these requests are received. If not, does anyone have a recommendation on how to perform authentication on received POST requests?

So far I have come up with limiting the functionality of these webhooks, such as rate-limiting and requiring administrators to explicitly enable these webhooks before they work. Hope anyone else could provide further valuable information.
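One generic pattern for authenticating webhook-style POST callbacks (a sketch of the general technique only; aodh does not necessarily support this, and the secret handling here is invented for illustration) is a shared-secret HMAC over the request body, verified by the receiver:

```python
import hashlib
import hmac

# Hypothetical shared secret, exchanged out of band between sender and receiver.
SHARED_SECRET = b"example-secret"


def sign(body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature the sender would attach (e.g. in a header)."""
    return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()


def verify(body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare in constant time to resist timing attacks."""
    return hmac.compare_digest(sign(body), received_signature)


body = b'{"alarm_name": "cpu_high", "current": "alarm"}'
signature = sign(body)
assert verify(body, signature)                 # untampered body passes
assert not verify(b'{"tampered": true}', signature)  # modified body fails
```

This avoids needing a keystone token on the receiving side entirely: only a party holding the shared secret can produce a signature that validates, and `hmac.compare_digest` keeps the comparison constant-time.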
Kind regards, Corne Lukken Watcher core-reviewer From aj at suse.com Fri Jan 10 09:50:51 2020 From: aj at suse.com (Andreas Jaeger) Date: Fri, 10 Jan 2020 10:50:51 +0100 Subject: [infra][stable] python3 is used by default in older stable branches In-Reply-To: References: Message-ID: On 10/01/2020 07.55, Akihiro Motoki wrote: > Hi, > > The horizon team recently noticed that python3 is used as a default > python interpreter in older stable branches like pike or ocata. > For example, horizon pep8 job in stable/pike and stable/ocata fails [1][2]. > We also noticed that some jobs which are expected to run with python2 > (using the tox default interpreter as of the release) are now run with > python3 [3]. Do you know what changed? I don't remember any intended change here, so I'm curious why this happens suddenly. https://zuul.opendev.org/t/openstack/build/0615d1df250144e6a137f0615c25ce66/logs from 27th of December already shows this on rocky https://zuul.opendev.org/t/openstack/build/085c0d9ea5d8466099eef3bb0ffb2213 from the 18th of December uses python 2.7. Andreas > What is the recommended way to cope with this situation? > > Individual projects can cope with the default interpreter change repo by repo, > but this potentially affects all projects with older stable branches. > This is the reason I am sending this mail. > > Best Regards, > Akihiro Motoki (amotoki) > > [1] https://zuul.opendev.org/t/openstack/build/daaeaedb0a184e29a03eeaae59157c78 > [2] https://zuul.opendev.org/t/openstack/build/525dc7f926684e54be8b565a7bbf7193 > [3] https://zuul.opendev.org/t/openstack/build/adbc53b8d1f74dac9cd606f4a796c442/log/tox/py27dj110-0.log#3 > -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 
5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB From ltoscano at redhat.com Fri Jan 10 10:06:52 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Fri, 10 Jan 2020 11:06:52 +0100 Subject: [infra][stable] python3 is used by default in older stable branches In-Reply-To: References: Message-ID: <15221729.TVbWr6RBN8@whitebase.usersys.redhat.com> On Friday, 10 January 2020 10:50:51 CET Andreas Jaeger wrote: > On 10/01/2020 07.55, Akihiro Motoki wrote: > > Hi, > > > > The horizon team recently noticed that python3 is used as a default > > python interpreter in older stable branches like pike or ocata. > > For example, horizon pep8 job in stable/pike and stable/ocata fails > > [1][2]. > > We also noticed that some jobs which are expected to run with python2 > > (using the tox default interpreter as of the release) are now run with > > python3 [3]. > > Do you know what changed? I don't remember any intended change here, so > I'm curious why this happens suddenly. > > https://zuul.opendev.org/t/openstack/build/0615d1df250144e6a137f0615c25ce66/ > logs from 27th of December already shows this on rocky > > https://zuul.opendev.org/t/openstack/build/085c0d9ea5d8466099eef3bb0ffb2213 > from the 18th of December uses python 2.7. > Maybe this is related to http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010957.html http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-11-19-19.05.log.html#l-102 ?
Ciao -- Luigi From aj at suse.com Fri Jan 10 10:13:09 2020 From: aj at suse.com (Andreas Jaeger) Date: Fri, 10 Jan 2020 11:13:09 +0100 Subject: [infra][stable] python3 is used by default in older stable branches In-Reply-To: <15221729.TVbWr6RBN8@whitebase.usersys.redhat.com> References: <15221729.TVbWr6RBN8@whitebase.usersys.redhat.com> Message-ID: On 10/01/2020 11.06, Luigi Toscano wrote: > On Friday, 10 January 2020 10:50:51 CET Andreas Jaeger wrote: >> On 10/01/2020 07.55, Akihiro Motoki wrote: >>> Hi, >>> >>> The horizon team recently noticed that python3 is used as a default >>> python interpreter in older stable branches like pike or ocata. >>> For example, horizon pep8 job in stable/pike and stable/ocata fails >>> [1][2]. >>> We also noticed that some jobs which are expected to run with python2 >>> (using the tox default interpreter as of the release) are now run with >>> python3 [3]. >> >> Do you know what changed? I don't remember any intended change here, so >> I'm curious why this happens suddenly. >> >> https://zuul.opendev.org/t/openstack/build/0615d1df250144e6a137f0615c25ce66/ >> logs from 27th of December already shows this on rocky >> >> https://zuul.opendev.org/t/openstack/build/085c0d9ea5d8466099eef3bb0ffb2213 >> from the 18th of December uses python 2.7. >> > > Maybe this is related to > http://lists.openstack.org/pipermail/openstack-discuss/2019-November/ > 010957.html that one speaks about Bionic, and these failures are on Xenial nodes. The timing seems off - if it worked on the 18th of December and failed sometime later. thanks, Andreas > > http://eavesdrop.openstack.org/meetings/infra/2019/infra. > 2019-11-19-19.05.log.html#l-102 > > ? > > Ciao > -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr.
5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB From anlin.kong at gmail.com Fri Jan 10 10:44:27 2020 From: anlin.kong at gmail.com (Lingxian Kong) Date: Fri, 10 Jan 2020 23:44:27 +1300 Subject: [aodh][keystone] handling of webhook / alarm authentication In-Reply-To: References: Message-ID: Hi Corne, I didn't fully understand your question. Could you please provide the doc you mentioned and, if possible, an example of the aodh alarm you want to create? - Best regards, Lingxian Kong Catalyst Cloud On Fri, Jan 10, 2020 at 10:30 PM info at dantalion.nl wrote: > Hello, > > I was wondering how a service receiving an aodh webhook could perform > authentication? > > The documentation describes the webhook as a simple post-request so I > was wondering if a keystone token context is available when these > requests are received? > > If not, I was wondering if anyone had any recommendation on how to > perform authentication upon received post-requests? > > So far I have come up with limiting the functionality of these webhooks > such as rate-limiting and administrators having to explicitly enable > these webhooks before they work. > > Hope anyone else could provide further valuable information. > > Kind regards, > Corne Lukken > Watcher core-reviewer > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.page at canonical.com Fri Jan 10 11:22:46 2020 From: james.page at canonical.com (James Page) Date: Fri, 10 Jan 2020 11:22:46 +0000 Subject: [charms][watcher] OpenStack Watcher Charm In-Reply-To: References: <159661b1-7edf-e55d-c7b9-cf3b97bffffb@admin.grnet.gr> Message-ID: Dropping direct recipients as this causes a reject from openstack-discuss! On Fri, Jan 10, 2020 at 11:12 AM James Page wrote: > Hi Stamatis > > Thankyou for this work! > > I'll take a look at your charm over the next few days.
> > On Wed, Jan 8, 2020 at 11:25 AM Stamatis Katsaounis < > skatsaounis at admin.grnet.gr> wrote: > >> Hi all, >> >> Purpose of this email is to let you know that we released an unofficial >> charm of OpenStack Watcher [1]. This charm gave us the opportunity to >> deploy OpenStack Watcher to our charmed OpenStack deployment. >> >> After seeing value in it, we decided to publish it through GRNET GitHub >> Organization account for several reasons. First of all, we would love to >> get feedback on it as it is our first try on creating an OpenStack reactive >> charm. Secondly, we would be glad to see other OpenStack operators deploy >> Watcher and share with us knowledge on the project and possible use cases. >> Finally, it would be ideal to come up with an official OpenStack Watcher >> charm repository under charmers umbrella. By doing this, another OpenStack >> project is going to be available not only for Train version but for any >> future version of OpenStack. Most important, the CI tests are going to >> ensure that the code is not broken and persuade other operators to use it. >> >> Before closing my email, I would like to give some insight on the >> architecture of the code base and the deployment process. To begin with, >> charm-watcher is based on other reactive OpenStack charms. During its >> deployment Barbican, Designate, Octavia and other charms' code bases were >> consulted. Furthermore, the structure is the same as any official OpenStack >> charm, of course without functional tests, which is something we cannot >> provide. >> > I'd suggest that we initiate the process to include your watcher charm as > part of the OpenStack Charmers project on opendev.org; once the initial > migration completes adding some functional tests should be fairly easy as > you'll be able to run them on the Canonical 3rd party CI infrastructure.
> > This requires that a couple of reviews be raised - here are examples for > the new Manila Ganesha charms: > > https://review.opendev.org/#/c/693463/ > https://review.opendev.org/#/c/693462/ > > One is for the infrastructure setup, the other is to formally include the > repositories as part of the TC approved project. If you would like to > raise them for the watcher charm I'm happy to review with Frode (who is the > current PTL). > >> Speaking about the deployment process, apart from having a basic charmed >> OpenStack deployment, operator has to change two tiny configuration options >> on Nova cloud controller and Cinder. As explained in the Watcher >> configuration guide, special care has to be done with Oslo notifications >> for Nova and Cinder [2]. In order to achieve that in charmed OpenStack some >> issues were met and solved with the following patches [3], [4], [5], [6]. >> With these patches, operator can set the extra Oslo configuration and this >> is the only extra configuration needs to take place. Finally, with [7] >> Keystone charm can accept a relation with Watcher charm instead of ignoring >> it. >> >> To be able to deploy GRNET Watcher charm on Train, patches [3], [4], [5] >> and [7] have to be back-ported to stable/19.10 branch but that will require >> the approval of charmers team. Please let me know if such an option is >> available and in that case I am going to open the relevant patches. >> Furthermore, if you think that it could be a good option to create a spec >> and then introduce an official Watcher charm, I would love to help on that. >> > I'd rather we wait until the 20.02 charm release - dependent changes have > all landed and will be included. > > I wish all a happy new year and I am looking forward to your response and >> possible feedback. >> > > Happy new year to you as well! > > PS. If we could have an Ubuntu package for watcher-dashboard [8] like >> octavia-dashboard [9] we would release a charm for it as well. 
>> > > I'll chat with coreycb and see if we might be able to package that for > 20.04/Ussuri. > > Cheers > > James > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Fri Jan 10 11:56:47 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 10 Jan 2020 12:56:47 +0100 Subject: [ops][largescale-sig] Collecting scaling stories Message-ID: <07b7df31-999d-8de9-839f-85830628855b@openstack.org> Hi everyone, As part of its goal of further pushing back scaling limits within a given cluster, the Large Scale SIG would like to collect scaling stories from OpenStack users. There is a size/load limit for single clusters past which things in OpenStack start to break, and we need to start using multiple clusters or cells to scale out. The SIG is interested in hearing: - what broke first for you, is it RabbitMQ or something else - what were the first symptoms - at what size/load did it start to break This will be a great help to document expected limits, and identify where improvements should be focused. You can contribute your experience by replying directly to this thread, or adding to the following etherpad: https://etherpad.openstack.org/p/scaling-stories Thanks in advance for your help ! -- Thierry Carrez (ttx) on behalf of the Large Scale SIG From mark at stackhpc.com Fri Jan 10 12:15:12 2020 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 10 Jan 2020 12:15:12 +0000 Subject: [release] making releases fast again (was: decentralising release approvals) In-Reply-To: <20200109170115.GA453843@sm-workstation> References: <20200109170115.GA453843@sm-workstation> Message-ID: On Thu, 9 Jan 2020 at 17:01, Sean McGinnis wrote: > > On Fri, Dec 20, 2019 at 11:04:36AM +0100, Thierry Carrez wrote: > > Mark Goddard wrote: > > > [...] > > > As kolla PTL and ironic release liaison I've proposed a number of > > > release patches recently. 
Generally the release team is good at churning > > > through these, but sometimes patches can hang around for a while. > > > Usually a ping on IRC will get things moving again within a day or so > > > (thanks in particular to Sean who has been very responsive). > > > > I agree we've seen an increase in processing delay lately, and I'd like to > > correct that. There are generally three things that would cause a > > perceptible delay in release processing... > > > > 1- wait for two release managers +2 > > > > This is something we put in place some time ago, as we had a lot of new > > members and thought that would be a good way to onboard them. Lately it > > created delays as a lot of those were not as active. > > > > 2- stable releases > > > > Two subcases in there... Either the deliverable is under stable policy and > > there are *significant* delays there as we have to pause to give a chance to > > stable-maint-core people to voice an opinion. Or the deliverable is not > > under stable policy, but we do a manual check on the changes, as a way to > > educate the requester on semver. > > > > 3- waiting for PTL/release liaison to approve > > > > That can take a long time, but the release management team is not really at > > fault there. > > > > Coming back to hopefully wrap this up... > > We discussed this in today's release team meeting and decided to make some > changes to hopefully make things a little smoother. We will now use the > following guidelines for reviewing and approving release requests: > > For releases in the current development (including some time for the previous > cycle for the release-trailing deliverables) we will only require a single > reviewer. If everything looks good and there are no concerns, we will +2 and > approve the release request without waiting for a second. If the reviewer has > any doubts or hesitation, they can decide to wait for a second reviewer, but > this should be a much less common situation.
> > For stable releases, we will require two +2s. We will not, however, wait for a > designated day for stable team review. If we can get one, all the better, but > the normal release team should be aware of stable rules and look for them for > any stable release request. Keeping the requirement for two reviewers should > help make sure nothing is overlooked with stable policy. > > We do still want PTL/liaison +1 to approve, so we will continue to wait for > that. Thierry is working on some job automation to make checking for that a > little easier, so hopefully that will help make that process as smooth as > possible. Thanks for taking action on this - I expect the above changes will be a big improvement. > > If there are any other questions or concerns, please do let us know. > > Sean > From mark at stackhpc.com Fri Jan 10 12:22:54 2020 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 10 Jan 2020 12:22:54 +0000 Subject: [kolla] Kayobe 7.0.0 released for Train Message-ID: Hi, I'm pleased to announce the release of Kayobe 7.0.0 - the first release in the Train series, and the first release as a deliverable of the Kolla project. Full details are available in the release notes which are available here: https://docs.openstack.org/releasenotes/kayobe/train.html We anticipate significant changes during the Train release series to support a migration to CentOS 8. We will communicate which releases are affected. Thanks to everyone who contributed to this release, and to the Kolla project for accepting us. I look forward to the continued integration of these teams. Cheers, Mark
I'll not be around but Belmiro Moreira volunteered to chair the meeting. [1] https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200115T09 We had several TODOs out of our December meeting, so I invite you to review the summary of that meeting in preparation of the next: http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011667.html As always, the agenda for the meeting next week is available at: https://etherpad.openstack.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez (ttx) From info at dantalion.nl Fri Jan 10 12:50:10 2020 From: info at dantalion.nl (info at dantalion.nl) Date: Fri, 10 Jan 2020 13:50:10 +0100 Subject: [aodh][keystone] handling of webhook / alarm authentication In-Reply-To: References: Message-ID: <75131451-9b07-0dc8-2ed2-3573434e0e7d@dantalion.nl> Hi Lingxian, The information referenced comes from: https://docs.openstack.org/aodh/latest/admin/telemetry-alarms.html Here it would be an alarm that would use the webhooks action. The endpoint in our use case would be Watcher for which we have just passed a spec: https://review.opendev.org/#/c/695646/ With these alarms that report using a webhook I am wondering how these received alarms can be authenticated and if the keystone token context is available? Hope this makes it clearer. Kind regards, Corne Lukken Watcher core-reviewer On 1/10/20 11:44 AM, Lingxian Kong wrote: > Hi Corne, > > I didn't fully understand your question, could you please provide the doc > mentioned and if possible, an example of aodh alarm you want to create > would be better. > > - > Best regards, > Lingxian Kong > Catalyst Cloud > > > On Fri, Jan 10, 2020 at 10:30 PM info at dantalion.nl > wrote: > >> Hello, >> >> I was wondering how a service receiving an aodh webhook could perform >> authentication? >> >> The documentation describes the webhook as a simple post-request so I >> was wondering if a keystone token context is available when these >> requests are received? 
>> >> If not, I was wondering if anyone had any recommendation on how to >> perform authentication upon received post-requests? >> >> So far I have come up with limiting the functionality of these webhooks >> such as rate-limiting and administrators having to explicitly enable >> these webhooks before they work. >> >> Hope anyone else could provide further valuable information. >> >> Kind regards, >> Corne Lukken >> Watcher core-reviewer >> >> > From corey.bryant at canonical.com Fri Jan 10 13:11:46 2020 From: corey.bryant at canonical.com (Corey Bryant) Date: Fri, 10 Jan 2020 08:11:46 -0500 Subject: [charms][watcher] OpenStack Watcher Charm In-Reply-To: References: <159661b1-7edf-e55d-c7b9-cf3b97bffffb@admin.grnet.gr> Message-ID: On Fri, Jan 10, 2020 at 6:12 AM James Page wrote: > Hi Stamatis > > Thankyou for this work! > > I'll take a look at your charm over the next few days. > > On Wed, Jan 8, 2020 at 11:25 AM Stamatis Katsaounis < > skatsaounis at admin.grnet.gr> wrote: > >> Hi all, >> >> Purpose of this email is to let you know that we released an unofficial >> charm of OpenStack Watcher [1]. This charm gave us the opportunity to >> deploy OpenStack Watcher to our charmed OpenStack deployment. >> >> After seeing value in it, we decided to publish it through GRNET GitHub >> Organization account for several reasons. First of all, we would love to >> get feedback on it as it is our first try on creating an OpenStack reactive >> charm. Secondly, we would be glad to see other OpenStack operators deploy >> Watcher and share with us knowledge on the project and possible use cases. >> Finally, it would be ideal to come up with an official OpenStack Watcher >> charm repository under charmers umbrella. By doing this, another OpenStack >> project is going to be available not only for Train version but for any >> future version of OpenStack. Most important, the CI tests are going to >> ensure that the code is not broken and persuade other operators to use it. 
>> >> Before closing my email, I would like to give some insight on the >> architecture of the code base and the deployment process. To begin with, >> charm-watcher is based on other reactive OpenStack charms. During its >> deployment Barbican, Designate, Octavia and other charms' code bases were >> consulted. Furthermore, the structure is the same as any official OpenStack >> charm, of course without functional tests, which is something we cannot >> provide. >> > I'd suggest that we initiate the process to include your watcher charm as > part of the OpenStack Charmers project on opendev.org; once the initial > migration completes adding some functional tests should be fairly easy as > you'll be able to run them on the Canonical 3rd party CI infrastructure. > > This requires that a couple of reviews be raised - here are examples for > the new Manila Ganesha charms: > > https://review.opendev.org/#/c/693463/ > https://review.opendev.org/#/c/693462/ > > One is for the infrastructure setup, the other is to formally include the > repositories as part of the TC approved project. If you would like to > raise them for the watcher charm I'm happy to review with Frode (who is the > current PTL). > >> Speaking about the deployment process, apart from having a basic charmed >> OpenStack deployment, operator has to change two tiny configuration options >> on Nova cloud controller and Cinder. As explained in the Watcher >> configuration guide, special care has to be done with Oslo notifications >> for Nova and Cinder [2]. In order to achieve that in charmed OpenStack some >> issues were met and solved with the following patches [3], [4], [5], [6]. >> With these patches, operator can set the extra Oslo configuration and this >> is the only extra configuration needs to take place. Finally, with [7] >> Keystone charm can accept a relation with Watcher charm instead of ignoring >> it.
>> >> To be able to deploy GRNET Watcher charm on Train, patches [3], [4], [5] >> and [7] have to be back-ported to stable/19.10 branch but that will require >> the approval of charmers team. Please let me know if such an option is >> available and in that case I am going to open the relevant patches. >> Furthermore, if you think that it could be a good option to create a spec >> and then introduce an official Watcher charm, I would love to help on that. >> > I'd rather we wait until the 20.02 charm release - dependent changes have > all landed and will be included. > > I wish all a happy new year and I am looking forward to your response and >> possible feedback. >> > > Happy new year to you as well! > > PS. If we could have an Ubuntu package for watcher-dashboard [8] like >> octavia-dashboard [9] we would release a charm for it as well. >> > > I'll chat with coreycb and see if we might be able to package that for > 20.04/Ussuri. > > Hi, I'll take a look at packaging watcher-dashboard. Corey Cheers > > James > >> >> -- Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Fri Jan 10 13:42:30 2020 From: marios at redhat.com (Marios Andreou) Date: Fri, 10 Jan 2020 15:42:30 +0200 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core Message-ID: I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on tripleo-ci repos (tripleo-ci, tripleo-quickstart, tripleo-quickstart-extras). Sorin has been a member of the tripleo-ci team for over one and a half years and has made many contributions across the tripleo-ci repos and beyond - highlights include helping the team to adopt molecule testing, leading linting efforts/changes/fixes and many others. Please vote by replying to this thread with +1 or -1 for any objections thanks marios -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rlandy at redhat.com Fri Jan 10 13:52:54 2020 From: rlandy at redhat.com (Ronelle Landy) Date: Fri, 10 Jan 2020 08:52:54 -0500 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: +1 - thanks for your work here, Sorin, On Fri, Jan 10, 2020 at 8:44 AM Marios Andreou wrote: > I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on > tripleo-ci repos (tripleo-ci, tripleo-quickstart, > tripleo-quickstart-extras). > > Sorin has been a member of the tripleo-ci team for over one and a half > years and has made many contributions across the tripleo-ci repos and > beyond - highlights include helping the team to adopt molecule testing, > leading linting efforts/changes/fixes and many others. > > Please vote by replying to this thread with +1 or -1 for any objections > > thanks > marios > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Fri Jan 10 14:07:34 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Fri, 10 Jan 2020 15:07:34 +0100 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: <51174bf8-5fdc-28d5-74de-0daa0e1f425a@redhat.com> On 10.01.2020 14:42, Marios Andreou wrote: > I would like to propose Sorin Barnea (ssbarnea at redhat.com > ) as core on tripleo-ci repos (tripleo-ci, > tripleo-quickstart, tripleo-quickstart-extras). > > Sorin has been a  member of the tripleo-ci team for over one and a half > years and has made many contributions across the tripleo-ci repos and > beyond - highlights include helping the team to adopt molecule testing, > leading linting efforts/changes/fixes and many others. > > Please vote by replying to this thread with +1 or -1 for any objections +1 Well deserved! 
> > thanks > marios > -- Best regards, Bogdan Dobrelya, Irc #bogdando From lshort at redhat.com Fri Jan 10 14:11:10 2020 From: lshort at redhat.com (Luke Short) Date: Fri, 10 Jan 2020 09:11:10 -0500 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: +1 Sorin has been an incredibly resourceful and creative thinker. He definitely deserves a spot as a TripleO CI Core! Keep up the amazing work! Luke Short, RHCE Software Engineer, OpenStack Deployment Framework Red Hat, Inc. On Fri, Jan 10, 2020 at 8:48 AM Marios Andreou wrote: > I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on > tripleo-ci repos (tripleo-ci, tripleo-quickstart, > tripleo-quickstart-extras). > > Sorin has been a member of the tripleo-ci team for over one and a half > years and has made many contributions across the tripleo-ci repos and > beyond - highlights include helping the team to adopt molecule testing, > leading linting efforts/changes/fixes and many others. > > Please vote by replying to this thread with +1 or -1 for any objections > > thanks > marios > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlandy at redhat.com Fri Jan 10 14:11:17 2020 From: rlandy at redhat.com (Ronelle Landy) Date: Fri, 10 Jan 2020 09:11:17 -0500 Subject: [tripleo][ci] Proposing Chandan Kumar and Arx Cruz as TripleO-CI cores Message-ID: Hello All, I'd like to propose Arx Cruz (arxcruz at redhat.com) and Chandan Kumar ( chkumar at redhat.com) as core on tripleo-ci repos (tripleo-ci, tripleo-quickstart, tripleo-quickstart-extras). In addition to the extensive work that Arx and Chandan have done on the Tempest-related repos ( and Tempest interface/settings within the Tripleo CI repos) , they have become active contributors to the core Tripleo CI repos, in general, in the past two years. Please vote by replying to this thread with +1 or -1 for any objections. We will close the vote 7 days from now. 
Thank you, Ronelle -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Fri Jan 10 14:36:59 2020 From: aschultz at redhat.com (Alex Schultz) Date: Fri, 10 Jan 2020 07:36:59 -0700 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: +1 On Fri, Jan 10, 2020 at 6:48 AM Marios Andreou wrote: > > I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on tripleo-ci repos (tripleo-ci, tripleo-quickstart, tripleo-quickstart-extras). > > Sorin has been a member of the tripleo-ci team for over one and a half years and has made many contributions across the tripleo-ci repos and beyond - highlights include helping the team to adopt molecule testing, leading linting efforts/changes/fixes and many others. > > Please vote by replying to this thread with +1 or -1 for any objections > > thanks > marios > From aschultz at redhat.com Fri Jan 10 14:37:24 2020 From: aschultz at redhat.com (Alex Schultz) Date: Fri, 10 Jan 2020 07:37:24 -0700 Subject: [tripleo][ci] Proposing Chandan Kumar and Arx Cruz as TripleO-CI cores In-Reply-To: References: Message-ID: +1 On Fri, Jan 10, 2020 at 7:16 AM Ronelle Landy wrote: > > Hello All, > > I'd like to propose Arx Cruz (arxcruz at redhat.com) and Chandan Kumar (chkumar at redhat.com) as core on tripleo-ci repos (tripleo-ci, tripleo-quickstart, tripleo-quickstart-extras). > > In addition to the extensive work that Arx and Chandan have done on the Tempest-related repos ( and Tempest interface/settings within the Tripleo CI repos) , they have become active contributors to the core Tripleo CI repos, in general, in the past two years. > > Please vote by replying to this thread with +1 or -1 for any objections. We will close the vote 7 days from now. 
> > Thank you, > Ronelle From marios at redhat.com Fri Jan 10 14:49:11 2020 From: marios at redhat.com (Marios Andreou) Date: Fri, 10 Jan 2020 16:49:11 +0200 Subject: [tripleo][ci] Proposing Chandan Kumar and Arx Cruz as TripleO-CI cores In-Reply-To: References: Message-ID: +1 On Fri, Jan 10, 2020 at 4:13 PM Ronelle Landy wrote: > Hello All, > > I'd like to propose Arx Cruz (arxcruz at redhat.com) and Chandan Kumar ( > chkumar at redhat.com) as core on tripleo-ci repos (tripleo-ci, > tripleo-quickstart, tripleo-quickstart-extras). > > In addition to the extensive work that Arx and Chandan have done on the > Tempest-related repos ( and Tempest interface/settings within the Tripleo > CI repos) , they have become active contributors to the core Tripleo CI > repos, in general, in the past two years. > > Please vote by replying to this thread with +1 or -1 for any objections. > We will close the vote 7 days from now. > > Thank you, > Ronelle > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Fri Jan 10 15:15:42 2020 From: emilien at redhat.com (Emilien Macchi) Date: Fri, 10 Jan 2020 10:15:42 -0500 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: +1, Sorin has been providing meaningful and careful reviews on the CI projects. I think it's safe to promote him core at this point. Keep the good work going! On Fri, Jan 10, 2020 at 9:44 AM Alex Schultz wrote: > +1 > > On Fri, Jan 10, 2020 at 6:48 AM Marios Andreou wrote: > > > > I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on > tripleo-ci repos (tripleo-ci, tripleo-quickstart, > tripleo-quickstart-extras). > > > > Sorin has been a member of the tripleo-ci team for over one and a half > years and has made many contributions across the tripleo-ci repos and > beyond - highlights include helping the team to adopt molecule testing, > leading linting efforts/changes/fixes and many others. 
> > > > Please vote by replying to this thread with +1 or -1 for any objections > > > > thanks > > marios > > > > > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgiusti at gmail.com Fri Jan 10 15:24:51 2020 From: kgiusti at gmail.com (Ken Giusti) Date: Fri, 10 Jan 2020 10:24:51 -0500 Subject: [neutron][rabbitmq][oslo] Neutron-server service shows deprecated "AMQPDeprecationWarning" In-Reply-To: References: <4D3B074F-09F2-48BE-BD61-5D34CBFE509E@dkrz.de> <294c93b5-0ddc-284b-34a1-ffce654ba047@nemebean.com> <274FDC2A-837B-45CC-BFBF-8C09A182550A@dkrz.de> Message-ID: On Thu, Jan 9, 2020 at 2:27 PM Ben Nemec wrote: > > > On 1/9/20 5:57 AM, Amjad Kotobi wrote: > > Hi Ben, > > > >> On 7. Jan 2020, at 22:59, Ben Nemec wrote: > >> > >> > >> > >> On 1/7/20 9:14 AM, Amjad Kotobi wrote: > >>> Hi, > >>> Today we are facing losing connection of neutron especially during > instance creation or so as “systemctl status neutron-server” shows below > message > >>> be deprecated in amqp 2.2.0. > >>> Since amqp 2.0 you have to explicitly call Connection.connect() > >>> before using the connection. > >>> W_FORCE_CONNECT.format(attr=attr))) > >>> /usr/lib/python2.7/site-packages/amqp/connection.py:304: > AMQPDeprecationWarning: The .transport attribute on the connection was > accessed before > >>> the connection was established. This is supported for now, but will > >>> be deprecated in amqp 2.2.0. > >>> Since amqp 2.0 you have to explicitly call Connection.connect() > >>> before using the connection. > >>> W_FORCE_CONNECT.format(attr=attr))) > >> > >> It looks like this is a red herring, but it should be fixed in the > current oslo.messaging pike release. See [0] and the related bug. > >> > >> 0: https://review.opendev.org/#/c/605324/ > >> > >>> OpenStack release which we are running is “Pike”. > >>> Is there any way to remedy this? 
> >> > >> I don't think this should be a fatal problem in and of itself so I > suspect it's masking something else. However, I would recommend updating to > the latest pike release of oslo.messaging where the deprecated feature is > not used. If that doesn't fix the problem, please send us whatever errors > remain after this one is eliminated. > > > > I checked it out, we are having the latest pike Oslo.messaging and it > still showing the same upper messages. Any ideas? > > Hmm, not sure then. Are there any other log messages around that one > which might provide more context on where this is happening? > > I've also copied a couple of our messaging folks in case they have a > better idea what might be going on. > > This deprecation warning should not result in a connection failure. After the amqp connection code issues that warning it immediately calls connect() for you, which establishes the connection. What concerns me is that you're hitting that warning in latest Pike. That version of oslo.messaging should no longer trigger that warning. If at all possible can you get a traceback when the warning is issued? When did these failures start to occur? Did something change - upgrade/downgrade, etc? Otherwise if you can reproduce the problem can you get a debug-level log trace? thanks, > >> > >>> Thanks > >>> Amjad > > > > > > -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Fri Jan 10 15:38:41 2020 From: emilien at redhat.com (Emilien Macchi) Date: Fri, 10 Jan 2020 10:38:41 -0500 Subject: [tripleo][ci] Proposing Chandan Kumar and Arx Cruz as TripleO-CI cores In-Reply-To: References: Message-ID: +1 for Chandan; no doubt; he's always available on IRC to help when things go wrong in gate or promotion, and very often he's proposing the fix. 
Providing thorough reviews, and being a multi-project contributor, I've seen
Chandan involved not only in TripleO CI but also in other projects like RDO
and TripleO itself. I've seen him contributing to the tripleo-common and
tripleoclient projects, which makes him someone capable of understanding not
only how CI works but also how the project in general works. Having him core
is to me natural.

Number of commits/reviews shows his interest in the CI repos:
https://www.stackalytics.com/?user_id=chandankumar-093047&release=train&metric=marks
https://www.stackalytics.com/?user_id=chandankumar-093047&release=train&metric=commits

----

I hate playing devil's advocate here but I'll give my honest (and hopefully
constructive) opinion.
I would like to see more involvement from Arx in the TripleO community. He
did tremendous work on openstack-ansible-os_tempest; however this repo isn't
governed by the TripleO CI group. I would like to see more reviews, where he
can bring his expertise, and not only in Gerrit but also on IRC when things
aren't going well (gate issues, promotion blockers, etc.).

Number of commits/reviews isn't low but IMHO can be better for a core
reviewer.
https://www.stackalytics.com/?user_id=arxcruz&release=train&metric=commits
https://www.stackalytics.com/?user_id=arxcruz&release=train&metric=marks

I don't think it'll take long until Arx gets there, but to me it's a -1 for
now, for what it's worth.

Emilien

On Fri, Jan 10, 2020 at 9:20 AM Ronelle Landy wrote:

> Hello All,
>
> I'd like to propose Arx Cruz (arxcruz at redhat.com) and Chandan Kumar (
> chkumar at redhat.com) as core on tripleo-ci repos (tripleo-ci,
> tripleo-quickstart, tripleo-quickstart-extras).
>
> In addition to the extensive work that Arx and Chandan have done on the
> Tempest-related repos ( and Tempest interface/settings within the Tripleo
> CI repos) , they have become active contributors to the core Tripleo CI
> repos, in general, in the past two years.
> > Please vote by replying to this thread with +1 or -1 for any objections. > We will close the vote 7 days from now. > > Thank you, > Ronelle > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From sgolovat at redhat.com Fri Jan 10 16:45:21 2020 From: sgolovat at redhat.com (Sergii Golovatiuk) Date: Fri, 10 Jan 2020 17:45:21 +0100 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: +1 пт, 10 янв. 2020 г. в 16:17, Emilien Macchi : > +1, Sorin has been providing meaningful and careful reviews on the CI > projects. > I think it's safe to promote him core at this point. > > Keep the good work going! > > On Fri, Jan 10, 2020 at 9:44 AM Alex Schultz wrote: > >> +1 >> >> On Fri, Jan 10, 2020 at 6:48 AM Marios Andreou wrote: >> > >> > I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on >> tripleo-ci repos (tripleo-ci, tripleo-quickstart, >> tripleo-quickstart-extras). >> > >> > Sorin has been a member of the tripleo-ci team for over one and a half >> years and has made many contributions across the tripleo-ci repos and >> beyond - highlights include helping the team to adopt molecule testing, >> leading linting efforts/changes/fixes and many others. >> > >> > Please vote by replying to this thread with +1 or -1 for any objections >> > >> > thanks >> > marios >> > >> >> >> > > -- > Emilien Macchi > -- Sergii Golovatiuk Senior Software Developer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Fri Jan 10 17:12:00 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Fri, 10 Jan 2020 10:12:00 -0700 Subject: [tripleo] rocky builds Message-ID: Greetings, I've confirmed that builds from the Rocky release will no longer be imported. Looking for input from the upstream folks with regards to maintaining the Rocky release for upstream. 
Can you please comment if you have any requirement to continue building, patching Rocky as I know there are active reviews [1]. I've added this topic to be discussed at the next meeting [2] Thank you! [1] https://review.opendev.org/#/q/status:open+tripleo+branch:stable/rocky [2] https://etherpad.openstack.org/p/tripleo-meeting-items -------------- next part -------------- An HTML attachment was scrubbed... URL: From duc.openstack at gmail.com Fri Jan 10 17:18:05 2020 From: duc.openstack at gmail.com (Duc Truong) Date: Fri, 10 Jan 2020 09:18:05 -0800 Subject: [aodh][keystone] handling of webhook / alarm authentication In-Reply-To: <75131451-9b07-0dc8-2ed2-3573434e0e7d@dantalion.nl> References: <75131451-9b07-0dc8-2ed2-3573434e0e7d@dantalion.nl> Message-ID: Senlin implements unauthenticated webhooks [1] that can be called by aodh. The webhook id is a uuid that is generated for each webhook. When the webhook is created, Senlin creates a keystone trust with the user to perform actions on their behalf when the webhook is received. That is probably the easiest way to implement webhooks without worrying about passing the keystone token context. [1] https://docs.openstack.org/api-ref/clustering/#trigger-webhook-action On Fri, Jan 10, 2020 at 4:48 AM info at dantalion.nl wrote: > > Hi Lingxian, > > The information referenced comes from: > https://docs.openstack.org/aodh/latest/admin/telemetry-alarms.html > > Here it would be an alarm that would use the webhooks action. The > endpoint in our use case would be Watcher for which we have just passed > a spec: https://review.opendev.org/#/c/695646/ > > With these alarms that report using a webhook I am wondering how these > received alarms can be authenticated and if the keystone token context > is available? > > Hope this makes it clearer. 
> > Kind regards, > Corne Lukken > Watcher core-reviewer > > On 1/10/20 11:44 AM, Lingxian Kong wrote: > > Hi Corne, > > > > I didn't fully understand your question, could you please provide the doc > > mentioned and if possible, an example of aodh alarm you want to create > > would be better. > > > > - > > Best regards, > > Lingxian Kong > > Catalyst Cloud > > > > > > On Fri, Jan 10, 2020 at 10:30 PM info at dantalion.nl > > wrote: > > > >> Hello, > >> > >> I was wondering how a service receiving an aodh webhook could perform > >> authentication? > >> > >> The documentation describes the webhook as a simple post-request so I > >> was wondering if a keystone token context is available when these > >> requests are received? > >> > >> If not, I was wondering if anyone had any recommendation on how to > >> perform authentication upon received post-requests? > >> > >> So far I have come up with limiting the functionality of these webhooks > >> such as rate-limiting and administrators having to explicitly enable > >> these webhooks before they work. > >> > >> Hope anyone else could provide further valuable information. > >> > >> Kind regards, > >> Corne Lukken > >> Watcher core-reviewer > >> > >> > > > From sshnaidm at redhat.com Fri Jan 10 17:25:56 2020 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Fri, 10 Jan 2020 19:25:56 +0200 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: +1! On Fri, Jan 10, 2020 at 3:44 PM Marios Andreou wrote: > I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on > tripleo-ci repos (tripleo-ci, tripleo-quickstart, > tripleo-quickstart-extras). > > Sorin has been a member of the tripleo-ci team for over one and a half > years and has made many contributions across the tripleo-ci repos and > beyond - highlights include helping the team to adopt molecule testing, > leading linting efforts/changes/fixes and many others. 
> > Please vote by replying to this thread with +1 or -1 for any objections
> >
> > thanks
> > marios
> >

--
Best regards
Sagi Shnaidman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From whayutin at redhat.com  Fri Jan 10 21:36:25 2020
From: whayutin at redhat.com (Wesley Hayutin)
Date: Fri, 10 Jan 2020 14:36:25 -0700
Subject: [tripleo] introducing zuul-runner
Message-ID: 

Greetings,

Hey everyone, I want to highlight work that some of our good friends in
zuul are working on: another way to try and help everyone debug their jobs.
The tool is called zuul-runner; the spec is here [1] and the code is here
[2].

Please have a look through the spec and vote to show support for Tristan's
work. Thanks to Arx Cruz for contributing and testing it as well w/ TripleO
jobs.

This will be another very nice tool in the toolbox for us if we can get it
through.

Thanks!

[1] https://review.opendev.org/#/c/681277/
[2] https://review.opendev.org/#/c/607078/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sean.mcginnis at gmx.com  Fri Jan 10 21:57:58 2020
From: sean.mcginnis at gmx.com (Sean McGinnis)
Date: Fri, 10 Jan 2020 15:57:58 -0600
Subject: [all] Announcing OpenStack Victoria!
Message-ID: <20200110215758.GB536693@sm-workstation>

Hello everyone,

The polling results are in, and the legal vetting process has now completed.
We now have an official name for the "V" release.

The full results of the poll can be found here:

https://civs.cs.cornell.edu/cgi-bin/results.pl?num_winners=1&id=E_13ccd49b66cfd1b4&rkey=4e184724fa32eed6&algorithm=minimax

While Victoria and Vancouver were technically a tie, the Minimax ranking
puts Victoria slightly ahead of Vancouver based on the votes. In addition
to that, we chose to have the TC do a tie-breaker vote, which confirmed
Victoria as the winner.
Victoria is the capital city of British Columbia:

https://en.wikipedia.org/wiki/Victoria,_British_Columbia

Thank you all for participating in the release naming!

Sean

From sean.mcginnis at gmx.com  Fri Jan 10 22:02:17 2020
From: sean.mcginnis at gmx.com (Sean McGinnis)
Date: Fri, 10 Jan 2020 16:02:17 -0600
Subject: [release] Release countdown for week R-17, January 13-17
Message-ID: <20200110220217.GC536693@sm-workstation>

Development Focus
-----------------

The Ussuri-2 milestone will happen next month, on February 13. Ussuri-related
specs should now be finalized so that teams can move to implementation ASAP.
Some teams observe specific deadlines on the second milestone (mostly spec
freezes): please refer to https://releases.openstack.org/ussuri/schedule.html
for details.

General Information
-------------------

Please remember that libraries need to be released at least once per
milestone period. At milestone 2, the release team will propose releases for
any library that has not been otherwise released since milestone 1.

Other non-library deliverables that follow the cycle-with-intermediary
release model should have an intermediary release before milestone-2. Those
that haven't will be proposed to switch to the cycle-with-rc model, which is
more suited to deliverables that are released only once per cycle.

At milestone-2 we also freeze the contents of the final release. If you have
a new deliverable that should be included in the final release, you should
make sure it has a deliverable file in:
https://opendev.org/openstack/releases/src/branch/master/deliverables/ussuri

You should request a beta release (or intermediary release) for those new
deliverables by milestone-2. We understand some may not be quite ready for a
full release yet, but if you have something minimally viable to get released
it would be good to do a 0.x release to exercise the release tooling for
your deliverables.
See the MembershipFreeze description for more details:
https://releases.openstack.org/ussuri/schedule.html#u-mf

Finally, now may be a good time for teams to check on any stable releases
that need to be done for your deliverables, for example if you have bugfixes
that have been backported but no stable release containing them yet. If you
are unsure what is out there committed but not released, in the
openstack/releases repo, running the command
"tools/list_stable_unreleased_changes.sh " gives a nice report.

Upcoming Deadlines & Dates
--------------------------

Ussuri-2 Milestone: February 13 (R-13 week)

From anlin.kong at gmail.com  Sat Jan 11 01:06:37 2020
From: anlin.kong at gmail.com (Lingxian Kong)
Date: Sat, 11 Jan 2020 14:06:37 +1300
Subject: [aodh][keystone] handling of webhook / alarm authentication
In-Reply-To: <75131451-9b07-0dc8-2ed2-3573434e0e7d@dantalion.nl>
References: <75131451-9b07-0dc8-2ed2-3573434e0e7d@dantalion.nl>
Message-ID: 

On Sat, Jan 11, 2020 at 1:47 AM info at dantalion.nl wrote:

> With these alarms that report using a webhook I am wondering how these
> received alarms can be authenticated and if the keystone token context
> is available?
>

Aodh supports creating an alarm with actions such as 'trust+http://'; once
the alarm is triggered, the service at the URL will receive a POST request
with 'X-Auth-Token' in the headers and the alarm information in the body.

-
Best regards,
Lingxian Kong
Catalyst Cloud
-------------- next part --------------
An HTML attachment was scrubbed...
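To illustrate the 'trust+http://' flow Lingxian describes, here is a minimal
sketch of a receiving service that checks the X-Auth-Token header before
accepting an alarm POST. This is an illustrative stdlib-only example, not
Watcher or Aodh code; `token_is_valid` is a hypothetical stand-in for real
Keystone token validation (e.g. via keystonemiddleware), and the payload
handling is deliberately simplified.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def token_is_valid(token):
    # Hypothetical check -- a real service would validate the token
    # against Keystone (e.g. keystonemiddleware's auth_token) instead
    # of comparing to a fixed string.
    return token == "expected-token"

class AlarmWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        token = self.headers.get("X-Auth-Token")
        if not token or not token_is_valid(token):
            # Reject unauthenticated alarm notifications.
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        alarm = json.loads(self.rfile.read(length) or b"{}")
        # ... act on the alarm payload here ...
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet
```

A service would pass this handler to `HTTPServer` and call `serve_forever()`;
requests without a valid X-Auth-Token get a 401 before the body is acted on.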
URL: 

From agarwalvishakha18 at gmail.com  Sat Jan 11 10:37:30 2020
From: agarwalvishakha18 at gmail.com (vishakha agarwal)
Date: Sat, 11 Jan 2020 16:07:30 +0530
Subject: [keystone] Keystone Team Update - Week of 6 January 2020
Message-ID: 

# Keystone Team Update - Week of 6 January 2020

## News

### User Support and Bug Duty

The person in charge of bug duty for the current and upcoming weeks can be
seen on the etherpad [1]

[1] https://etherpad.openstack.org/p/keystone-l1-duty

## Action Items

One fourth of the Ussuri cycle is almost over. We need to find a new
mechanism for the retrospective, and for checking our progress this cycle,
that is more convenient and less time-consuming for the members.

## Open Specs

Ussuri specs: https://bit.ly/2XDdpkU

Ongoing specs: https://bit.ly/2OyDLTh

## Recently Merged Changes

Search query: https://bit.ly/2pquOwT

We merged 19 changes this week.

## Changes that need Attention

Search query: https://bit.ly/2tymTje

There are 37 changes that are passing CI, not in merge conflict, have no
negative reviews and aren't proposed by bots.

### Priority Reviews

* Community Goals
https://review.opendev.org/#/c/699127/ [ussuri][goal] Drop python 2.7 support and testing keystone-tempest-plugin
https://review.opendev.org/#/c/699126/ [ussuri][goal] Drop python 2.7 support and testing ldappool
https://review.opendev.org/#/c/699119/ [ussuri][goal] Drop python 2.7 support and testing python-keystoneclient

* Special Requests
https://review.opendev.org/#/c/662734/ Change the default Identity endpoint to internal
https://review.opendev.org/#/c/699013/ Always have username in CADF initiator
https://review.opendev.org/#/c/700826/ Fix role_assignments role.id filter

## Bugs

This week we opened 3 new bugs and closed 5.
Bugs opened (3)
Bug #1858410 (keystone:Low): Got error 'NoneType' when executing unittest on stable/rocky - Opened by Eric Xie https://bugs.launchpad.net/keystone/+bug/1858410
Bug #1858186 (keystoneauth:Undecided): http_log_request will print debug info include pki certificate which is unsafety - Opened by kuangpeiling https://bugs.launchpad.net/keystoneauth/+bug/1858186
Bug #1858189 (keystoneauth:Undecided): http_log_request will print debug info include pki certificate which is unsafety - Opened by kuangpeiling https://bugs.launchpad.net/keystoneauth/+bug/1858189

Bugs closed (5)
Bug #1858186 (keystoneauth:Undecided) https://bugs.launchpad.net/keystoneauth/+bug/1858186
Bug #1856881 (keystone:Medium): keystone-manage bootstrap fails with ambiguous role names - Fixed by Lance Bragstad https://bugs.launchpad.net/keystone/+bug/1856881
Bug #1856962 (keystone:Undecided): openid method failed when federation_group_ids is empty list - Fixed by Colleen Murphy https://bugs.launchpad.net/keystone/+bug/1856962
Bug #1857086 (keystone: Won't Fix) https://bugs.launchpad.net/keystone/+bug/1857086
Bug #1831018 (keystone: Invalid) https://bugs.launchpad.net/keystone/+bug/1831018

## Milestone Outlook

https://releases.openstack.org/ussuri/schedule.html

Spec freeze is on the week of 10 February. All the specs targeted for this
cycle should be ready for review soon.

## Help with this newsletter

Help contribute to this newsletter by editing the etherpad:
https://etherpad.openstack.org/p/keystone-team-newsletter

From arxcruz at redhat.com  Sat Jan 11 14:39:09 2020
From: arxcruz at redhat.com (Arx Cruz)
Date: Sat, 11 Jan 2020 15:39:09 +0100
Subject: [tripleo][ci] Proposing Chandan Kumar and Arx Cruz as TripleO-CI cores
In-Reply-To: 
References: 
Message-ID: 

Hello Emilien,

Thanks for your feedback, I really appreciate it.
You are right, there are places where I really can improve, and I will work
to improve, and I am really looking forward to having your help.
Regarding the amount of reviews and commits, it's true that I haven't been
so active on tripleo upstream projects, but please remember that
stackalytics only reflects the projects under the tripleo umbrella, and you
know that in tripleo-ci we also work on the rdo side, where I've been
working more actively, right now on integration with third-party projects
like podman and ceph-ansible, which are not directly related to Tripleo
indeed, but are key projects for Tripleo to work properly.

Also, looking only at the latest release's patches doesn't seem too fair;
if you check the previous release I have more than double the reviews
(although yes, the number of commits remains stable), and probably if you
look at the Ussuri release, I will not have many reviews or commits, since
I've been on vacation for most of December.

Also, and please correct me if I am wrong, I don't remember any time that
people pinged me on IRC and I did not reply or was not prompt to help; if
that happened, please accept my sincere apologies. As you know, when things
are on fire (a long time without promotions, for example, like the last
sprint when I was ruck and rover) our focus is to make things get back to
normal.

One more time, I am taking your feedback, and I'll do my best to improve in
the areas you point out, and hopefully change your mind regarding my core
promotion.

Kind regards,
Arx Cruz

On Fri, 10 Jan 2020 at 16:45 Emilien Macchi wrote:

> +1 for Chandan; no doubt; he's always available on IRC to help when things
> go wrong in gate or promotion, and very often he's proposing the fix.
> Providing thoroughful reviews, and multi-project contributors, I've seen
> Chandan involved not only in TripleO CI but also in other projects like RDO
> and TripleO itself. I've seen him contributing to the tripleo-common and
> tripleoclient projects; which make him someone capable to understand not
> only how CI works but also how the project in general works. Having him
> core is to me natural.
> > Number of commits/reviews shows his interests in the CI repos: > > https://www.stackalytics.com/?user_id=chandankumar-093047&release=train&metric=marks > > https://www.stackalytics.com/?user_id=chandankumar-093047&release=train&metric=commits > > ---- > > I hate playing devil's advocate here but I'll give my honest (and > hopefully constructive) opinion. > I would like to see more involvement from Arx in the TripleO community. He > did a tremendous work on openstack-ansible-os_tempest; however this repo > isn't governed by TripleO CI group. I would like to see more reviews; where > he can bring his expertise; and not only in Gerrit but also on IRC when > things aren't going well (gate issues, promotion blockers, etc). > > Number of commits/reviews aren't low but IMHO can be better for a core > reviewer. > https://www.stackalytics.com/?user_id=arxcruz&release=train&metric=commits > https://www.stackalytics.com/?user_id=arxcruz&release=train&metric=marks > > I don't think it'll take time until Arx gets there but to me it's a -1 for > now, for what it's worth. > > Emilien > > On Fri, Jan 10, 2020 at 9:20 AM Ronelle Landy wrote: > >> Hello All, >> >> I'd like to propose Arx Cruz (arxcruz at redhat.com) and Chandan Kumar ( >> chkumar at redhat.com) as core on tripleo-ci repos (tripleo-ci, >> tripleo-quickstart, tripleo-quickstart-extras). >> >> In addition to the extensive work that Arx and Chandan have done on the >> Tempest-related repos ( and Tempest interface/settings within the Tripleo >> CI repos) , they have become active contributors to the core Tripleo CI >> repos, in general, in the past two years. >> >> Please vote by replying to this thread with +1 or -1 for any objections. >> We will close the vote 7 days from now. >> >> Thank you, >> Ronelle >> > > > -- > Emilien Macchi > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From emilien at redhat.com  Sat Jan 11 16:06:36 2020
From: emilien at redhat.com (Emilien Macchi)
Date: Sat, 11 Jan 2020 11:06:36 -0500
Subject: [tripleo][ci] Proposing Chandan Kumar and Arx Cruz as TripleO-CI cores
In-Reply-To: 
References: 
Message-ID: 

Arx,

First of all I want to repeat that it has nothing to do with the quality of
your work. Again, I'm aware of what you've been working on and I appreciate
what you have been doing with the CI team.

The major issue that I'm dealing with as a major maintainer of TripleO is
that over the past years we have promoted a lot of people to be core
reviewers; but if you look closely at the numbers, most of the reviews are
done by 3 people; this is problematic when one of us is absent; and even
more problematic if one of us one day leaves. I have the feeling that
promoting more core developers hasn't solved that problem; and there are a
few folks currently core who should not be core anymore IMO, because they
don't review much and aren't much involved as "core maintainers".

Being a core reviewer means you're an official maintainer. You maintain the
code, wherever it is; whether it's something that your direct peer wrote or
something that $random_contributor wrote. Very often we have promoted cores
who only review things from their direct peers and this has been
problematic because 1) reviews are done in silos and 2) some parts of the
project aren't reviewed at all. It has nothing to do with you; it's just to
give you a bit of context on why I'm being more conservative now.

You said that you have spent major time on things not under the TripleO
umbrella: please know that I'm aware of this, I'm watching it and I
appreciate it. However we are talking about TripleO CI core, which is under
the TripleO umbrella. Not Podman, not RDO CI etc. Which is why I went
looking at Stackalytics to see numbers (even if I take them with a grain of
salt).

The core promotion is a decision that is taken as a group.
My -1 doesn't mean you won't be core, it just means I had to provide some feedback on why I'm reluctant of you being core as of now. It doesn't mean I don't find your work valuable or that you're not helping on IRC; actually you're doing great. I just think that the bar is a bit higher compared to my taste and I don't think you're far from reaching it. Now, this is only my opinion and what it's worth. My hope is that 1) you continue to improve your involvement in TripleO and 2) our core reviewers do more reviews because it can't only be 3 persons who do more than 70% of the reviews. Have a great weekend, Emilien On Sat, Jan 11, 2020 at 9:39 AM Arx Cruz wrote: > Hello Emilien, > > Thanks for your feedback, I really appreciate it. > You are right, there are places that I really can improve, and I will work > to improve it, and I really looking forward to have your help. > > Regarding the amount of reviews and commits, it’s true that I haven’t be > so active on tripleo upstream projects, but please, remember that > stackalytics only reflect the projects under tripleo umbrella, and you know > that in tripleo-ci we also work on rdo side, where I’ve been working more > activelly, right now, working on integration with thirdy party projects > like podman and ceph-ansible, which is not directly related to Tripleo > indeed, but are key projects to Tripleo work properly. > > Also, look only in the latest release patches doesn’t seems to be too > fair, if you check the previous release I have more than double of reviews > (although yes, the number of commits remains stable), and probably if you > get the Ussuri release, I will not have too much reviews or commits, since > I’ve been on vacation mostly of the december. 
> > Also, and please, correct me if I am wrong, I don’t remember anytime that > people ping me on IRC and I did not reply, or was prompt to help, if that > happens, please accept my sincere apologies, as you know, when things are > on fire (long time without promotions for example, like the last sprint I > was ruck and rover) our focus is to make things get back to normal. > > One more time, I am taking your feedback, and I’ll do my best to improve > in the areas you point, and hopefully change your mind regarding my core > promotion. > > Kind regards, > Arx Cruz > > On Fri, 10 Jan 2020 at 16:45 Emilien Macchi wrote: > >> +1 for Chandan; no doubt; he's always available on IRC to help when >> things go wrong in gate or promotion, and very often he's proposing the fix. >> Providing thoroughful reviews, and multi-project contributors, I've seen >> Chandan involved not only in TripleO CI but also in other projects like RDO >> and TripleO itself. I've seen him contributing to the tripleo-common and >> tripleoclient projects; which make him someone capable to understand not >> only how CI works but also how the project in general works. Having him >> core is to me natural. >> >> Number of commits/reviews shows his interests in the CI repos: >> >> https://www.stackalytics.com/?user_id=chandankumar-093047&release=train&metric=marks >> >> https://www.stackalytics.com/?user_id=chandankumar-093047&release=train&metric=commits >> >> ---- >> >> I hate playing devil's advocate here but I'll give my honest (and >> hopefully constructive) opinion. >> I would like to see more involvement from Arx in the TripleO community. >> He did a tremendous work on openstack-ansible-os_tempest; however this repo >> isn't governed by TripleO CI group. I would like to see more reviews; where >> he can bring his expertise; and not only in Gerrit but also on IRC when >> things aren't going well (gate issues, promotion blockers, etc). 
>> >> Number of commits/reviews aren't low but IMHO can be better for a core >> reviewer. >> https://www.stackalytics.com/?user_id=arxcruz&release=train&metric=commits >> https://www.stackalytics.com/?user_id=arxcruz&release=train&metric=marks >> >> I don't think it'll take time until Arx gets there but to me it's a -1 >> for now, for what it's worth. >> >> Emilien >> >> On Fri, Jan 10, 2020 at 9:20 AM Ronelle Landy wrote: >> >>> Hello All, >>> >>> I'd like to propose Arx Cruz (arxcruz at redhat.com) and Chandan Kumar ( >>> chkumar at redhat.com) as core on tripleo-ci repos (tripleo-ci, >>> tripleo-quickstart, tripleo-quickstart-extras). >>> >>> In addition to the extensive work that Arx and Chandan have done on the >>> Tempest-related repos ( and Tempest interface/settings within the Tripleo >>> CI repos) , they have become active contributors to the core Tripleo CI >>> repos, in general, in the past two years. >>> >>> Please vote by replying to this thread with +1 or -1 for any objections. >>> We will close the vote 7 days from now. >>> >>> Thank you, >>> Ronelle >>> >> >> >> -- >> Emilien Macchi >> > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > @RedHat Red Hat > Red Hat > > > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Sat Jan 11 17:44:44 2020 From: emiller at genesishosting.com (Eric K. 
Miller) Date: Sat, 11 Jan 2020 11:44:44 -0600 Subject: [magnum][kolla] etcd wal sync duration issue Message-ID: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> Hi, We are using the following coe cluster template and cluster create commands on an OpenStack Stein installation that installs Magnum 8.2.0 Kolla containers installed by Kolla-Ansible 8.0.1: openstack coe cluster template create \ --image Fedora-AtomicHost-29-20191126.0.x86_64_raw \ --keypair userkey \ --external-network ext-net \ --dns-nameserver 1.1.1.1 \ --master-flavor c5sd.4xlarge \ --flavor m5sd.4xlarge \ --coe kubernetes \ --network-driver flannel \ --volume-driver cinder \ --docker-storage-driver overlay2 \ --docker-volume-size 100 \ --registry-enabled \ --master-lb-enabled \ --floating-ip-disabled \ --fixed-network KubernetesProjectNetwork001 \ --fixed-subnet KubernetesProjectSubnet001 \ --labels kube_tag=v1.15.7,cloud_provider_tag=v1.15.0,heat_container_agent_tag=ste in-dev,master_lb_floating_ip_enabled=true \ k8s-cluster-template-1.15.7-production-private openstack coe cluster create \ --cluster-template k8s-cluster-template-1.15.7-production-private \ --keypair userkey \ --master-count 3 \ --node-count 3 \ k8s-cluster001 The deploy process works perfectly, however, the cluster health status flips between healthy and unhealthy. The unhealthy status indicates that etcd has an issue. 
When logged into master-0 (out of 3, as configured above), "systemctl status etcd" shows the stdout from etcd: Jan 11 17:27:36 k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]: 2020-01-11 17:27:36.548453 W | etcdserver: timed out waiting for read index response Jan 11 17:28:02 k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]: 2020-01-11 17:28:02.960977 W | wal: sync duration of 1.696804699s, expected less than 1s Jan 11 17:28:31 k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]: 2020-01-11 17:28:31.292753 W | wal: sync duration of 2.249722223s, expected less than 1s We also see: Jan 11 17:40:39 k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]: 2020-01-11 17:40:39.132459 I | etcdserver/api/v3rpc: grpc: Server.processUnaryRPC failed to write status: stream error: code = DeadlineExceeded desc = "context deadline exceeded" We initially used relatively small flavors, but increased these to something very large to be sure resources were not constrained in any way. "top" reported no CPU or memory contention on any nodes in either case. Multiple clusters have been deployed, and they all have this issue, including empty clusters that were just deployed. I see a very large number of reports of similar issues with etcd, but the discussions point to disk performance, which can't be the cause here, not only because persistent storage for etcd isn't configured in Magnum, but also the disks are "very" fast in this environment. Looking at "vmstat -D" from within master-0, the number of writes is minimal. Ceilometer logs about 15 to 20 write IOPS for this VM in Gnocchi. Any ideas? We are finalizing procedures to upgrade to Train, so we wanted to be sure that we weren't running into some common issue with Stein that would immediately be solved with Train. If so, we will simply proceed with the upgrade and avoid diagnosing this issue further. Thanks! Eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ssbarnea at redhat.com Sun Jan 12 10:02:55 2020 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Sun, 12 Jan 2020 10:02:55 +0000 Subject: [tripleo][ci] Proposing Sorin Barnea as TripleO-CI core In-Reply-To: References: Message-ID: <88ED8214-C7A3-44D7-B8C0-431C31DFC4F2@redhat.com> Thanks marios and everyone for your support! I am looking forward to simplifying TripleO, with extra focus on improving testing (with results in minutes, not hours). Cheers Sorin > On 10 Jan 2020, at 13:42, Marios Andreou wrote: > > I would like to propose Sorin Barnea (ssbarnea at redhat.com) as core on tripleo-ci repos (tripleo-ci, tripleo-quickstart, tripleo-quickstart-extras). > > Sorin has been a member of the tripleo-ci team for over one and a half years and has made many contributions across the tripleo-ci repos and beyond - highlights include helping the team to adopt molecule testing, leading linting efforts/changes/fixes and many others. > > Please vote by replying to this thread with +1 or -1 for any objections > > thanks > marios > From hongbin034 at gmail.com Sun Jan 12 16:44:46 2020 From: hongbin034 at gmail.com (Hongbin Lu) Date: Sun, 12 Jan 2020 11:44:46 -0500 Subject: [neutron] Bug deputy report - Jan 06 to 12 Message-ID: Hi, I was on bug deputy last week. Below is my summary of it. Critical: https://bugs.launchpad.net/neutron/+bug/1858645 https://bugs.launchpad.net/neutron/+bug/1858421 High: https://bugs.launchpad.net/neutron/+bug/1858661 https://bugs.launchpad.net/neutron/+bug/1858642 Medium: https://bugs.launchpad.net/neutron/+bug/1859163 https://bugs.launchpad.net/neutron/+bug/1858680 https://bugs.launchpad.net/neutron/+bug/1858419 Low: https://bugs.launchpad.net/neutron/+bug/1859258 https://bugs.launchpad.net/neutron/+bug/1859190 https://bugs.launchpad.net/neutron/+bug/1858783 RFE: https://bugs.launchpad.net/neutron/+bug/1858610 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From radoslaw.piliszek at gmail.com Sun Jan 12 18:20:27 2020 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Sun, 12 Jan 2020 19:20:27 +0100 Subject: [all] DevStack jobs broken due to setuptools not available for Python 2 Message-ID: Hi all, I noticed DevStack jobs fail all over the place [1] due to: UnsupportedPythonVersion: Package 'setuptools' requires a different Python: 2.7.17 not in '>=3.5' Bug reported in [2]. Notice USE_PYTHON3=True does not help as stack.sh is hardcoded to versionless Python. [1] https://zuul.opendev.org/t/openstack/builds?result=RETRY_LIMIT&result=FAILURE [2] https://bugs.launchpad.net/devstack/+bug/1859350 -yoctozepto From sundar.nadathur at intel.com Sun Jan 12 21:41:16 2020 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Sun, 12 Jan 2020 21:41:16 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> Message-ID: Hi Arkady and all, Good discussions and questions. First, it is good to clarify what we mean by lifecycle management. It includes: * Discovery: We need to get more than just the PCI IDs/addresses of devices. We would need their properties and features as well. This is especially the case for programmable devices, as the properties and features can change over time, though the PCI ID may not. * Scheduling: We would want to schedule the application that needs offload based on the properties/features discovered above. * Programming/configuration and/or firmware update. More on this later. * Health management: discover the health of a device, esp. if programming/configuration etc. fail. * Inventory management: Track the fleet of accelerators based on their properties/features. * Other aspects that I won't dwell on here. In short, lifecycle management is more than just firmware update. 
Secondly, regarding the difference between programming and firmware updates, some key questions are: 1. What does the device configuration do? A. Expose properties/features relevant to scheduling: Could be for a specific application or workload (e.g. apply a new AI algorithm) Or expose new/premium device features (e.g. enable more memory banks) B. Update general features not relevant to scheduling. E.g. fix a bug in BMC firmware. 2. When/how is the device configuration done? A. Dynamically: as instances are provisioned/retired, based on time of day, workload demand, etc. This would be part of OpenStack workflow. B. Statically: as part of the physical host configuration. This is typically done 'offline', perhaps in a maintenance window, often using external frameworks like Redfish/Ansible/Puppet/Chef/... The combination 1A+2A is what I'd call programming, while 1B+2B is firmware update. I don't see a motivation for 1B+2A. The case 1A+2B is interesting. It basically means that a programmable device is being treated like a fixed-function accelerator for a period of time before it gets reprogrammed offline. This model is being used in the industry today, esp. telcos. I am fine with calling this a 'firmware update' too. There are some grey areas to consider. For example, many FPGA deployments are structured to have a 'shell', which is hardware logic that exposes some generic features like PCI and DMA, and a separate user/custom logic that is application/workload-specific. Would updating the shell qualify as 'programming' or a 'firmware update'? Today, it often falls under 2B, esp. if it requires server reboots. But it could conceivably come under 1A+2A as products and use cases evolve. IOW, what is called a firmware update today could become a programming update tomorrow. Cyborg is designed for programming, i.e. 1A+2A. It can be used with Nova (to program devices as instances are provisioned/retired) or standalone (based on time of day, traffic patterns, etc.) 
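As a toy summary, the 1A/1B x 2A/2B matrix above can be written as a lookup table. The wording is my own paraphrase of this mail, purely illustrative and not any OpenStack API:

```python
# What the config change exposes:       How/when it is applied:
#   "1A" = schedulable/app features       "2A" = dynamically, in-workflow
#   "1B" = non-schedulable features       "2B" = statically, offline
CLASSIFICATION = {
    ("1A", "2A"): "programming (Cyborg's target case)",
    ("1A", "2B"): "firmware update (device treated as fixed-function for a while)",
    ("1B", "2A"): "no clear motivation",
    ("1B", "2B"): "firmware update",
}

def classify(exposes, applied):
    """Map one cell of the matrix to its label."""
    return CLASSIFICATION[(exposes, applied)]

print(classify("1A", "2A"))  # -> programming (Cyborg's target case)
```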
Other cases (1A/1B + 2B) can be classified as firmware update and outside of Cyborg. TL;DR * Agree with Arkady that firmware updates should follow the server vendors' guidelines, and can/should be done as part of the server configuration. * If the claim is that firmware updates, as defined above (i.e. 1A/1B + 2B), should be done by Ironic, I am fine with it. * To reiterate, it is NOT enough to handle devices based only on their PCI IDs -- we should be able to use their features/properties for scheduling, inventory management, etc. This is extra true for programmable devices where features can change dynamically while PCI IDs potentially stay constant. * Cyborg is designed for these devices and its stated role includes all other aspects of lifecycle management. * I see value in having Cyborg and Ironic work together, esp. for 1A+2B, where Ironic can do the 'firmware update' and Cyborg discovers the schedulable properties of the device. > From: Arkady.Kanevsky at dell.com > Sent: Friday, January 3, 2020 1:19 PM > To: openstack-discuss at lists.openstack.org > Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management > 1. Application user need to program a portion of the device ... Sure. > 2. Administrator need to program the whole device for specific usage. That covers the scenario when device can only support single tenant or single use case. Why does it have to be single-tenant or single use-case? For example, one could program an FPGA with an Open Vswitch implementation, which is shared by VMs from different tenants. > That is done once during OpenStack deployment but may need reprogramming to configure device for different usage. If the change exposes workload-specific or schedulable properties, this would not necessarily be a one-shot thing at deployment time. > 3. Administrator need to setup device for its use, like burning specific FW on it. This is typically done as part of server life-cycle event. 
With the definition of firmware update as above, I agree. > The first 2 cases cover application life cycle of device usage. Yes. > The last one covers device life cycle independently how it is used. Here's where I beg to disagree. As I said, the term 'device lifecycle' is far broader than just firmware update. > Managing life cycle of devices is Ironic responsibility, Disagree here. To the best of my knowledge, Ironic handles devices based on PCI IDs. Cyborg is designed to go deeper for discovering device features/properties and utilize Placement for scheduling based on these. > One cannot and should not manage lifecycle of server components independently. If what you meant to say is: ' do not update device firmware independently of other server components', agreed. > Managing server devices outside server management violates customer service agreements with server vendors and breaks server support agreements. Sure. > Nova and Neutron are getting info about all devices and their capabilities from Ironic; that they use for scheduling Hmm, this seems overly broad to me: not every deployment includes Ironic, and getting PCI IDs is not enough for scheduling and management. > Finally we want Cyborg to be able to be used in standalone capacity, say for Kubernetes +1 > Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic would cover use case 3 Use case 3 says "setup device for its use, like burning specific FW." With the definition of firmware above, I agree. Other aspects of lifecycle management, not covered by use cases 1 - 3, would come under Cyborg. > Thus, move all device Life-cycle code from Cyborg to Ironic To recap, there is more to device lifecycle than firmware update. I'd suggest the other aspects can remain in Cyborg. Regards, Sundar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sundar.nadathur at intel.com Sun Jan 12 21:42:35 2020 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Sun, 12 Jan 2020 21:42:35 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <7b55e3b28d644492a846fdb10f7b127b@AUSX13MPS308.AMER.DELL.COM> Message-ID: > From: Julia Kreger > Sent: Monday, January 6, 2020 1:33 PM > To: Arkady.Kanevsky at dell.com > Cc: Zhipeng Huang ; openstack-discuss discuss at lists.openstack.org> > Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators > management Hi Julia, Lots of good points here. > Greetings Arkady, > > I think your message makes a very good case and raises a point that I've been > trying to type out for the past hour, but with only different words. > > We have multiple USER driven interactions with a similarly desired, if not the > exact same desired end result where different paths can be taken, as we > perceive use cases from "As a user, I would like a VM with a configured > accelerator", "I would like any compute resource (VM or Baremetal), with a > configured accelerator", to "As an administrator, I need to reallocate a > baremetal node for this different use, so my user can leverage its accelerator > once they know how and are ready to use it.", and as suggested "I as a user > want baremetal with k8s and configured accelerators." > And I suspect this diversity of use patterns is where things begin to become > difficult. As such I believe, we in essence, have a question of a support or > compatibility matrix that definitely has gaps depending on "how" the "user" > wants or needs to achieve their goals. Yes, there are a wide variety of deployments and use cases. There may not be a single silver bullet solution for all of them. There may be different solutions, such as Ironic standalone, Ironic with Nova, and potentially some combination with Cyborg. 
> And, I think where this entire discussion _can_ go sideways is... > (from what I understand) some of these devices need to be flashed by the > application user with firmware on demand to meet the user's needs, which is > where lifecycle and support interactions begin to become... > conflicted. We are probably using different definitions of the term 'firmware.' As I said in another response in this thread, if a device configuration exposes application-specific features or schedulable features, then the term 'firmware update' may not be applicable IMHO, since it is going to be done dynamically as workloads spin up and retire. This is especially so given Arkady's stipulation that firmware updates are done as part of server configuration and as per server vendor's guidelines. > Further complicating matters is the "Metal to Tenant" use cases where the > user requesting the machine is not an administrator, but has some level of > inherent administrative access to all Operating System accessible devices once > their OS has booted. Which makes me wonder "What if the cloud > administrators WANT to block the tenant's direct ability to write/flash > firmware into accelerator/smartnic/etc?" Yes, admins may want to do that. This can be done (partly) via RBAC, by having different roles for tenants who can use devices but not reprogram them, and for tenants who can program the device with application/scheduling-relevant features (but not firmware), etc. > I suspect if cloud administrators want to block such hardware access, > vendors will want to support such a capability. Devices can and usually do offer separate mechanisms for reading from registers, writing to them, updating flash etc. each with associated access permissions. A device vendor can go a bit extra by requiring specific Linux capabilities, such as say CAP_IPC_LOCK for mmap access, in their device driver. 
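As a side note on enforcement: on Linux, whether the current process actually holds a capability such as CAP_IPC_LOCK can be read back from /proc. A minimal sketch of my own (Linux-only; the bit number comes from linux/capability.h):

```python
CAP_IPC_LOCK = 14  # bit number from linux/capability.h

def has_capability(bit, pid="self"):
    """Check a bit in the effective capability set via /proc/<pid>/status."""
    with open("/proc/%s/status" % pid) as status:
        for line in status:
            if line.startswith("CapEff:"):
                # CapEff is a hex bitmask of the effective capabilities.
                return bool((int(line.split()[1], 16) >> bit) & 1)
    return False

print("CAP_IPC_LOCK:", has_capability(CAP_IPC_LOCK))
```

An unprivileged tenant process would print False here, which is exactly the kind of gate a driver's mmap path could rely on.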
> Blocking such access inherently forces some actions into hardware > management/maintenance workflows, and may ultimately may cause some of > a support matrix's use cases to be unsupportable, again ultimately depending > on what exactly the user is attempting to achieve. Not sure if you are expressing a concern here. If the admin is using device features or RBAC to restrict access, then she is intentionally blocking some combinations in your support matrix, right? Users in such a deployment need to live with that. > Is there any documentation at present that details the desired support and > use cases? I think this would at least help my understanding, since everything > that requires the power to be on would still need to be integrated with-in > workflows for eventual tighter integration. The Cyborg spec [1] addresses the Nova/VM-based use cases. [1] https://opendev.org/openstack/cyborg-specs/src/branch/master/specs/train/approved/cyborg-nova-placement.rst > Also, has Cyborg drafted any plans or proposals for integration? For Nova integration, we have a spec [2]. [2] https://review.opendev.org/#/c/684151/ > -Julia Regards, Sundar From Arkady.Kanevsky at dell.com Mon Jan 13 00:44:11 2020 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Mon, 13 Jan 2020 00:44:11 +0000 Subject: [all] Announcing OpenStack Victoria! In-Reply-To: <20200110215758.GB536693@sm-workstation> References: <20200110215758.GB536693@sm-workstation> Message-ID: <93fc266f5188442d983ce1caf261a18b@AUSX13MPS308.AMER.DELL.COM> Hurray to Victoria. -----Original Message----- From: Sean McGinnis Sent: Friday, January 10, 2020 3:58 PM To: openstack-discuss at lists.openstack.org Subject: [all] Announcing OpenStack Victoria! [EXTERNAL EMAIL] Hello everyone, The polling results are in, and the legal vetting process has now completed. We now have an official name for the "V" release. 
The full results of the poll can be found here: https://civs.cs.cornell.edu/cgi-bin/results.pl?num_winners=1&id=E_13ccd49b66cfd1b4&rkey=4e184724fa32eed6&algorithm=minimax While Victoria and Vancouver were technically a tie, the Minimax ranking puts Victoria slightly ahead of Vancouver based on the votes. In addition to that, we chose to have the TC do a tie breaker vote which confirmed Victoria as the winner. Victoria is the capital city of British Columbia: https://en.wikipedia.org/wiki/Victoria,_British_Columbia Thank you all for participating in the release naming! Sean From iwienand at redhat.com Mon Jan 13 06:05:54 2020 From: iwienand at redhat.com (Ian Wienand) Date: Mon, 13 Jan 2020 17:05:54 +1100 Subject: [all] DevStack jobs broken due to setuptools not available for Python 2 In-Reply-To: References: Message-ID: <20200113060554.GA3819219@fedora19.localdomain> On Sun, Jan 12, 2020 at 07:20:27PM +0100, Radosław Piliszek wrote: > I noticed DevStack jobs fail all over the place [1] due to: > UnsupportedPythonVersion: Package 'setuptools' requires a different > Python: 2.7.17 not in '>=3.5' I think there's a wide variety of things going on here. Firstly, I think pip should be not be trying to install this ... you clearly felt the same thing and have filed [1] where it seems that it might be due to the wheels we create not specifying "data-requires-python" in our links to the wheel. This is the first I've heard of this ... we will need to look into this wrt to our wheel building and I have filed [2]. The plain "virtualenv" call that sets up the requirements virtualenv should be using Python 3 I think; proposed in [3]. This would avoid the issue by using python3 on master. The other places calling "virtualenv" appear to be related to TRACK_DEPENDS, which I think we can remove now to avoid further confusion. Proposed in [4] However, this leaves devstack-gate which is used by grenade. 
I *think* that [5] will work if the older branch of devstack also installs with python3. The short answer is, yes, this is a big mess :/ -i [1] https://github.com/pypa/pip/issues/7586#issuecomment-573460206 [2] https://storyboard.openstack.org/#!/story/2007084 [3] https://review.opendev.org/702162 [4] https://review.opendev.org/702163 [5] https://review.opendev.org/702126 From prash.ing.pucsd at gmail.com Mon Jan 13 07:30:03 2020 From: prash.ing.pucsd at gmail.com (prashant) Date: Mon, 13 Jan 2020 13:00:03 +0530 Subject: [horizon][stein]issue: Message-ID: When I delete multiple instances in Stein, I am not able to see the remaining instances until I refresh the instance list again. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Mon Jan 13 14:08:20 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 13 Jan 2020 08:08:20 -0600 Subject: [all] DevStack jobs broken due to setuptools not available for Python 2 In-Reply-To: <20200113060554.GA3819219@fedora19.localdomain> References: <20200113060554.GA3819219@fedora19.localdomain> Message-ID: <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> ---- On Mon, 13 Jan 2020 00:05:54 -0600 Ian Wienand wrote ---- > On Sun, Jan 12, 2020 at 07:20:27PM +0100, Radosław Piliszek wrote: > > I noticed DevStack jobs fail all over the place [1] due to: > > UnsupportedPythonVersion: Package 'setuptools' requires a different > > Python: 2.7.17 not in '>=3.5' > > I think there's a wide variety of things going on here. > > Firstly, I think pip should be not be trying to install this ... you > clearly felt the same thing and have filed [1] where it seems that it > might be due to the wheels we create not specifying > "data-requires-python" in our links to the wheel. This is the first > I've heard of this ... we will need to look into this wrt to our wheel > building and I have filed [2]. 
> > The plain "virtualenv" call that sets up the requirements virtualenv > should be using Python 3 I think; proposed in [3]. This would avoid > the issue by using python3 on master. > > The other places calling "virtualenv" appear to be related to > TRACK_DEPENDS, which I think we can remove now to avoid further > confusion. Proposed in [4] > > However, this leaves devstack-gate which is used by grenade. I > *think* that [5] will work if the older branch of devstack also > installs with python3. Yes, grenade master jobs use py3 in both (new and older devstack) which is expected testing behaviour. We avoided (or at least not done yet) any mixed (py2->py3) upgrade testing. -gmann > > The short answer is, yes, this is a big mess :/ > > -i > > [1] https://github.com/pypa/pip/issues/7586#issuecomment-573460206 > [2] https://storyboard.openstack.org/#!/story/2007084 > [3] https://review.opendev.org/702162 > [4] https://review.opendev.org/702163 > [5] https://review.opendev.org/702126 > > > From radoslaw.piliszek at gmail.com Mon Jan 13 14:16:58 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 13 Jan 2020 15:16:58 +0100 Subject: [all] DevStack jobs broken due to setuptools not available for Python 2 In-Reply-To: <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> References: <20200113060554.GA3819219@fedora19.localdomain> <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> Message-ID: It turned out to be real mess to fix devstack and devstack-gate and whatever else could be impacted. The current goal is to get rid of setuptools wheel so that it does not interfere in Zuul. As more packages go py3-only we will be forced to generate proper metadata in HTML indices... 
-yoctozepto On Mon, 13 Jan 2020 at 15:08, Ghanshyam Mann wrote: > > ---- On Mon, 13 Jan 2020 00:05:54 -0600 Ian Wienand wrote ---- > > On Sun, Jan 12, 2020 at 07:20:27PM +0100, Radosław Piliszek wrote: > > > I noticed DevStack jobs fail all over the place [1] due to: > > > UnsupportedPythonVersion: Package 'setuptools' requires a different > > > Python: 2.7.17 not in '>=3.5' > > > > I think there's a wide variety of things going on here. > > > > Firstly, I think pip should be not be trying to install this ... you > > clearly felt the same thing and have filed [1] where it seems that it > > might be due to the wheels we create not specifying > > "data-requires-python" in our links to the wheel. This is the first > > I've heard of this ... we will need to look into this wrt to our wheel > > building and I have filed [2]. > > > > The plain "virtualenv" call that sets up the requirements virtualenv > > should be using Python 3 I think; proposed in [3]. This would avoid > > the issue by using python3 on master. > > > > The other places calling "virtualenv" appear to be related to > > TRACK_DEPENDS, which I think we can remove now to avoid further > > confusion. Proposed in [4] > > > > However, this leaves devstack-gate which is used by grenade. I > > *think* that [5] will work if the older branch of devstack also > > installs with python3. > > Yes, grenade master jobs use py3 in both (new and older devstack) which > is expected testing behaviour. We avoided (or at least not done yet) any mixed (py2->py3) > upgrade testing. 
> > > -gmann > > > > > The short answer is, yes, this is a big mess :/ > > > > -i > > > > [1] https://github.com/pypa/pip/issues/7586#issuecomment-573460206 > > [2] https://storyboard.openstack.org/#!/story/2007084 > > [3] https://review.opendev.org/702162 > > [4] https://review.opendev.org/702163 > > [5] https://review.opendev.org/702126 > > > > > > > From fungi at yuggoth.org Mon Jan 13 14:43:56 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 13 Jan 2020 14:43:56 +0000 Subject: [all] DevStack jobs broken due to setuptools not available for Python 2 In-Reply-To: References: <20200113060554.GA3819219@fedora19.localdomain> <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> Message-ID: <20200113144356.mj3ot2crequlcowc@yuggoth.org> On 2020-01-13 15:16:58 +0100 (+0100), Radosław Piliszek wrote: > It turned out to be real mess to fix devstack and devstack-gate > and whatever else could be impacted. The current goal is to get > rid of setuptools wheel so that it does not interfere in Zuul. As > more packages go py3-only we will be forced to generate proper > metadata in HTML indices... [...] Yes, pardon me as I'm still catching up. Based on what I've read so far, the PyPI simple API provides some additional information with its package indices indicating which Python releases are supported by a given package. Our wheel cache is just being served from Apache with mod_autoindex providing basic indexing of the files, so does not have that information to provide. We've discussed a number of possible solutions to the problem. One option is to (temporarily) stop using the pre-built wheel cache, but that presupposes that it doesn't do much for us in the first place. That cache is there to provide pre-built wheels for packages which otherwise take a *very* long time to build from sdist and for which there is no usable wheel on PyPI (at least for the platforms on which we're running our jobs). 
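For context, the missing metadata is just an attribute on the anchor tags of a PEP 503 "simple" index page. A hand-rolled index generator would need to emit something along these lines (a sketch of the format only, not what our wheel-index.sh does today):

```python
import html

def index_entry(filename, requires_python=None):
    """Render one anchor for a PEP 503 'simple' index page."""
    attr = ""
    if requires_python:
        # PEP 503: the specifier must be HTML-escaped inside the attribute,
        # so ">=3.5" becomes "&gt;=3.5".
        attr = ' data-requires-python="%s"' % html.escape(requires_python, quote=True)
    return '<a href="%s"%s>%s</a>' % (filename, attr, filename)

print(index_entry("setuptools-45.0.0-py2.py3-none-any.whl", ">=3.5"))
```

pip only honors the restriction when the attribute is present, which is why a bare mod_autoindex listing lets Python 2 pick up a py3-only setuptools.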
Another option is to stop unnecessarily copying wheels already available on PyPI into that cache. It's used as a sieve, so that jobs can pull wheels from it when they exist but will still fall back to PyPI for any the cache doesn't contain. I suspect that the majority of projects dropping compatibility with older Python releases publish usable wheels for our platforms on PyPI already, so their presence in our cache is redundant. We could remove the latest Setuptools release from the wheel-mirror volume as a short-term solution, but will need to temporarily stop further updates to the cache since the job which builds it would just put that file right back. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From dms at danplanet.com Mon Jan 13 15:16:30 2020 From: dms at danplanet.com (Dan Smith) Date: Mon, 13 Jan 2020 07:16:30 -0800 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: (Sundar Nadathur's message of "Sun, 12 Jan 2020 21:41:16 +0000") References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> Message-ID: > TL;DR > > * Agree with Arkady that firmware updates should follow the server > vendors' guidelines, and can/should be done as part of the server > configuration. I'm worried there's a little bit of confusion about "which nova" and "which ironic" in this case, especially since Arkady mentioned tripleo. More on that below. However, I agree that if you're using ironic to manage the nodes that form your actual (over)cloud, then having ironic update firmware on your accelerator device in the same way that it might update firmware on a regular NIC, GPU card, or anything else makes sense. However, if you're talking about services all at the same level (i.e. 
nova working with ironic to provide metal as a tenant as well as VMs) then *that* ironic is not going to be managing firmware on accelerators that you're handing to your VM instances on the compute nodes. >> Managing life cycle of devices is Ironic responsibility, > > Disagree here. Me too, but in a general sense. I would not agree with the assessment that "Managing life cycle of devices is Ironic responsibility." Specifically the wide scope of "devices" being more than just physical machines. It's true that Ironic manages the lifecycle of physical machines, which may be used in a tripleo type of environment to manage the lifecycle of things like compute nodes. I *think* you both agree with that clarification, because of the next point, but I think it's important to avoid such statements that imply "all devices." > To the best of my knowledge, Ironic handles devices based on PCI > IDs. Cyborg is designed to go deeper for discovering device > features/properties and utilize Placement for scheduling based on > these. What does this matter though? If you're talking about firmware for an FPGA card, that's what you need to know in order to apply the correct firmware to it, independent of whatever application-level bitstream is going to go in there right? >> One cannot and should not manage lifecycle of server components independently. > > If what you meant to say is: ' do not update device firmware > independently of other server components', agreed. I'm not really sure what this original point from Arkady really means. Are (either of) you saying that if there's a CVE for the firmware in some card that the firmware patch shouldn't be applied without taking the box through a full lifecycle event or something? 
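For what it's worth, the "just PCI IDs" level of discovery being discussed here is literally two sysfs attributes per device on Linux; anything richer, like bitstream versions or schedulable features, has to come from a deeper, device-specific probe. A minimal sketch:

```python
import glob
import os

def pci_ids():
    """List (address, vendor_id, device_id) for each PCI device via sysfs (Linux)."""
    devices = []
    for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
        with open(os.path.join(dev, "vendor")) as v, open(os.path.join(dev, "device")) as d:
            devices.append((os.path.basename(dev), v.read().strip(), d.read().strip()))
    return devices

for addr, vendor, device in pci_ids():
    print(addr, vendor, device)
```

Two identically-programmed and differently-programmed FPGAs can look the same at this level, which is exactly the gap Cyborg's discovery is meant to fill.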
AFAIK, Ironic can't just do this in isolation, which means that if you've got a compute node managed by ironic in a tripleo type of environment, you're looking to move workloads away from that node, destroy it, apply updates, and re-create it before you can use it again. I guess I'd be surprised if people are doing this every time intel releases another microcode update. Am I wrong about that? Either way, I'm not sure how the firmware for accelerator cards is any different from the firmware for other devices on the system. Maybe the confusion is just that Cyborg does "programming" which seems similar to "updating firmware"? >> Nova and Neutron are getting info about all devices and their >> capabilities from Ironic; that they use for scheduling > > Hmm, this seems overly broad to me: not every deployment includes > Ironic, and getting PCI IDs is not enough for scheduling and > management. I also don't think it's correct. Nova does not get info about devices from Ironic, and I kinda doubt Neutron does either. If Nova is using ironic to provide metal as tenants, then...sure, but in the case where nova is providing VMs with accelerator cards, Ironic is not involved. >> Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic would cover use case 3 > > Use case 3 says "setup device for its use, like burning specific FW." > With the definition of firmware above, I agree. Other aspects of > lifecycle management, not covered by use cases 1 - 3, would come under > Cyborg. > >> Thus, move all device Life-cycle code from Cyborg to Ironic > > To recap, there is more to device lifecycle than firmware update. I'd > suggest the other aspects can remain in Cyborg. Didn't you say that firmware programming (as defined here) is not something that Cyborg currently does? Thus, nothing Cyborg currently does should be moved to Ironic, AFAICT. If that is true, then I agree. 
I guess my summary is: firmware updates for accelerators can and should be handled the same as for other devices on the system, in whatever way the operator currently does that. Programming an application-level bitstream should not be confused with the former activity, and is fully within the domain of Cyborg's responsibilities. --Dan From cboylan at sapwetik.org Mon Jan 13 15:40:14 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 13 Jan 2020 07:40:14 -0800 Subject: =?UTF-8?Q?Re:_[all]_DevStack_jobs_broken_due_to_setuptools_not_available?= =?UTF-8?Q?_for_Python_2?= In-Reply-To: <20200113144356.mj3ot2crequlcowc@yuggoth.org> References: <20200113060554.GA3819219@fedora19.localdomain> <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> <20200113144356.mj3ot2crequlcowc@yuggoth.org> Message-ID: <88fd3935-61c4-4ccc-a3ef-f4e72391dbff@www.fastmail.com> On Mon, Jan 13, 2020, at 6:43 AM, Jeremy Stanley wrote: > On 2020-01-13 15:16:58 +0100 (+0100), Radosław Piliszek wrote: > > It turned out to be real mess to fix devstack and devstack-gate > > and whatever else could be impacted. The current goal is to get > > rid of setuptools wheel so that it does not interfere in Zuul. As > > more packages go py3-only we will be forced to generate proper > > metadata in HTML indices... > [...] > > Yes, pardon me as I'm still catching up. Based on what I've read so > far, the PyPI simple API provides some additional information with > its package indices indicating which Python releases are supported > by a given package. Our wheel cache is just being served from Apache > with mod_autoindex providing basic indexing of the files, so does > not have that information to provide. > PyPI supplies this via the 'data-requires-python' element attributes in the html source of the index. You can see these viewing the source of eg https://pypi.org/simple/setuptools/. 
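For concreteness, here is a small illustrative sketch, separate from any actual OpenStack tooling, of extracting those attributes with Python's stdlib HTML parser. The sample HTML below is hand-written in the style of the setuptools simple index, not fetched from PyPI:

```python
from html.parser import HTMLParser

class RequiresPythonParser(HTMLParser):
    """Collect the data-requires-python attribute of each package link.

    Minimal sketch only -- real clients (pip) implement the full
    PEP 503 simple-index handling, including URL fragments and hashes.
    """

    def __init__(self):
        super().__init__()
        self.requirements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        filename = attrs.get("href", "").split("#")[0].rsplit("/", 1)[-1]
        # Absent attribute -> None, i.e. "any Python is fine".
        self.requirements[filename] = attrs.get("data-requires-python")

# Hand-written sample in the style of https://pypi.org/simple/setuptools/
sample = (
    '<a href="setuptools-44.0.0-py2.py3-none-any.whl">'
    'setuptools-44.0.0-py2.py3-none-any.whl</a><br/>'
    '<a href="setuptools-45.0.0-py2.py3-none-any.whl" '
    'data-requires-python="&gt;=3.5">'
    'setuptools-45.0.0-py2.py3-none-any.whl</a><br/>'
)

parser = RequiresPythonParser()
parser.feed(sample)
print(parser.requirements)
# {'setuptools-44.0.0-py2.py3-none-any.whl': None,
#  'setuptools-45.0.0-py2.py3-none-any.whl': '>=3.5'}
```

pip consults exactly this attribute when deciding whether a file is installable under the running interpreter, which is why a plain mod_autoindex listing cannot convey the same information.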
The problem with relying on this is not all packages supply this data (it is provided at package upload time iirc) and not all versions of pip support it. While we rely on mod_autoindex for the per package indexes (this is where data-requires-python goes) we do generate the top level index manually via https://opendev.org/openstack/project-config/src/branch/master/roles/copy-wheels/files/wheel-index.sh. It is possible this script could be extended to write index files for each package as well. If we can determine what python versions are required we could then include that info. However, as mentioned above this won't fix all cases. I think the simplest option would be to simply stop building and mirroring wheels for packages which already have wheels on pypi. Let the wheel mirror be a true fallback for slow building packages like lxml and libvirt-python. Note that setuptools is a bit of an exception here because it is the bootstrap module. With setuptools in place we can control python specific package version selections using environment markers. This is already something we do a fair bit, https://opendev.org/openstack/requirements/src/branch/master/global-requirements.txt#L47-L48, which means we have the tooling in place to manage it. 
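As an aside, the effect of those environment markers can be sketched with a toy evaluator. The pins below are hypothetical stand-ins for the real global-requirements entries, and real marker evaluation follows the full PEP 508 grammar (done by pip), not this hand-rolled comparison:

```python
import sys

# Hypothetical pins illustrating environment markers: each spec applies
# only when its python_version condition matches the interpreter.
PINS = [
    ("setuptools<45.0.0", "<", (3, 5)),   # python_version < "3.5"
    ("setuptools",        ">=", (3, 5)),  # python_version >= "3.5"
]

def active_specs(version_info=None):
    # Compare (major, minor) of the running or simulated interpreter
    # against each pin's bound.
    current = tuple(version_info or sys.version_info[:2])
    chosen = []
    for spec, op, bound in PINS:
        matches = current < bound if op == "<" else current >= bound
        if matches:
            chosen.append(spec)
    return chosen

print(active_specs((2, 7)))  # ['setuptools<45.0.0']
print(active_specs((3, 8)))  # ['setuptools']
```

With markers like these in place, the same requirements file yields a Python-2-compatible setuptools pin on old interpreters while newer interpreters float to the latest release.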
Clark

From fungi at yuggoth.org Mon Jan 13 15:41:44 2020
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Mon, 13 Jan 2020 15:41:44 +0000
Subject: [all] DevStack jobs broken due to setuptools not available for Python 2
In-Reply-To: <20200113144356.mj3ot2crequlcowc@yuggoth.org>
References: <20200113060554.GA3819219@fedora19.localdomain> <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> <20200113144356.mj3ot2crequlcowc@yuggoth.org>
Message-ID: <20200113154144.os44mrihcotvwt3r@yuggoth.org>

On 2020-01-13 14:43:56 +0000 (+0000), Jeremy Stanley wrote:
> On 2020-01-13 15:16:58 +0100 (+0100), Radosław Piliszek wrote:
> > It turned out to be real mess to fix devstack and devstack-gate
> > and whatever else could be impacted. The current goal is to get
> > rid of setuptools wheel so that it does not interfere in Zuul. As
> > more packages go py3-only we will be forced to generate proper
> > metadata in HTML indices...
> [...]
>
> Yes, pardon me as I'm still catching up. Based on what I've read so
> far, the PyPI simple API provides some additional information with
> its package indices indicating which Python releases are supported
> by a given package. Our wheel cache is just being served from Apache
> with mod_autoindex providing basic indexing of the files, so does
> not have that information to provide.
>
> We've discussed a number of possible solutions to the problem. One
> option is to (temporarily) stop using the pre-built wheel cache, but
> that presupposes that it doesn't do much for us in the first place.
> That cache is there to provide pre-built wheels for packages which
> otherwise take a *very* long time to build from sdist and for which
> there is no usable wheel on PyPI (at least for the platforms on
> which we're running our jobs).

This was proposed via https://review.opendev.org/702166 for the record.

> Another option is to stop unnecessarily copying wheels already
> available on PyPI into that cache.
It's used as a sieve, so that
> jobs can pull wheels from it when they exist but will still fall
> back to PyPI for anything the cache doesn't contain. I suspect that the
> majority of projects dropping compatibility with older Python
> releases publish usable wheels for our platforms on PyPI already, so
> their presence in our cache is redundant. We could remove the latest
> Setuptools release from the wheel-mirror volume as a short-term
> solution, but will need to temporarily stop further updates to the
> cache since the job which builds it would just put that file right
> back.

I've proposed https://review.opendev.org/702244 for a smaller-scale version of this now (explicitly blacklisting wheels for pip, setuptools and virtualenv) but we can expand it to something more thorough once it's put through its paces. If we merge that, then we can manually delete the affected setuptools wheel from our wheel-mirror volume and not have to worry about it coming back.

-- 
Jeremy Stanley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: not available
URL: 

From rosmaita.fossdev at gmail.com Mon Jan 13 15:42:09 2020
From: rosmaita.fossdev at gmail.com (Brian Rosmaita)
Date: Mon, 13 Jan 2020 10:42:09 -0500
Subject: [cinder] ussuri virtual mid-cycle next week
Message-ID: <53eacc20-8762-2ef2-f115-8732b8f1827a@gmail.com>

The poll results are in. There was only one time when everyone can meet (apologies to the "if need be" people, but the need be). Session One of the Cinder Ussuri virtual mid-cycle will be held:

DATE: 21 JANUARY 2020
TIME: 1300-1500 UTC
LOCATION: https://bluejeans.com/3228528973

The meeting will be recorded.
Please add topics to the planning etherpad: https://etherpad.openstack.org/p/cinder-ussuri-mid-cycle-planning cheers, brian From fungi at yuggoth.org Mon Jan 13 16:53:41 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 13 Jan 2020 16:53:41 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> Message-ID: <20200113165340.ge3hitlqrdfhj52m@yuggoth.org> On 2020-01-13 07:16:30 -0800 (-0800), Dan Smith wrote: [...] > What does this matter though? If you're talking about firmware for an > FPGA card, that's what you need to know in order to apply the correct > firmware to it, independent of whatever application-level bitstream is > going to go in there right? [...] > Either way, I'm not sure how the firmware for accelerator cards is any > different from the firmware for other devices on the system. Maybe the > confusion is just that Cyborg does "programming" which seems similar to > "updating firmware"? [...] FPGA configuration is a compiled binary blob written into non-volatile memory through a hardware interface. These similarities to firmware also result in many people actually calling it "firmware" even though, you're right, technically it's a mapping of gate interconnections and not really firmware in the conventional sense. In retrospect maybe I shouldn't have brought it up. I wouldn't be surprised, though, if there *are* NFV-related cases where the users of the virtual machines into which some network hardware is mapped need access to alter parts of, say, an interface controller's firmware. The Linux kernel has for years incorporated features to write or rewrite firmware and other microcode for certain devices at boot time for similar reasons, after all. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Mon Jan 13 17:32:20 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 13 Jan 2020 17:32:20 +0000 Subject: [all] DevStack jobs broken due to setuptools not available for Python 2 In-Reply-To: <20200113154144.os44mrihcotvwt3r@yuggoth.org> References: <20200113060554.GA3819219@fedora19.localdomain> <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> <20200113144356.mj3ot2crequlcowc@yuggoth.org> <20200113154144.os44mrihcotvwt3r@yuggoth.org> Message-ID: <20200113173220.sdryzodzxvhxhvqc@yuggoth.org> On 2020-01-13 15:41:44 +0000 (+0000), Jeremy Stanley wrote: [...] > I've proposed https://review.opendev.org/702244 for a smaller-scale > version of this now (explicitly blacklisting wheels for pip, > setuptools and virtualenv) but we can expand it to something more > thorough once it's put through its paces. If we merge that, then we > can manually delete the affected setuptools wheel from our > wheel-mirror volume and not have to worry about it coming back. This has since merged, and as of 16:30 UTC (roughly an hour ago) I deleted all copies of the setuptools-45.0.0-py2.py3-none-any.whl file from our AFS volumes. We're testing now to see if previously broken jobs are working again, but suspect things should be back to normal. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From colleen at gazlene.net Mon Jan 13 17:38:06 2020 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 13 Jan 2020 09:38:06 -0800 Subject: [ops] Federated Identity Management survey In-Reply-To: <4a7a0c41-59ce-4aac-839e-0840eeb50348@www.fastmail.com> References: <4a7a0c41-59ce-4aac-839e-0840eeb50348@www.fastmail.com> Message-ID: <2116da33-6d85-4132-94e5-68bcea0c8385@www.fastmail.com> On Mon, Dec 23, 2019, at 09:32, Colleen Murphy wrote: > Hello operators, > > A researcher from the University of Kent who was influential in the > design of keystone's federation implementation has asked the keystone > team to gauge adoption of federated identity management in OpenStack > deployments. This is something we've neglected to track well in the > last few OpenStack user surveys, so I'd like to ask OpenStack operators > to please take a few minutes to complete the following survey about > your usage of identity federation in your OpenStack deployment (even if > you don't use federation): > > https://uok.typeform.com/to/KuRY0q > > The results of this survey will benefit not only university research > but also the keystone team as it will help us understand where to focus > our efforts. Your participation is greatly appreciated. > > Thanks for your time, > > Colleen (cmurphy) > > Thanks to everyone who has completed this survey so far! The survey will be closing in about a week, so if you have not yet completed it, we appreciate you taking the time to do so now. 
Colleen (cmurphy) From dms at danplanet.com Mon Jan 13 17:58:00 2020 From: dms at danplanet.com (Dan Smith) Date: Mon, 13 Jan 2020 09:58:00 -0800 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <20200113165340.ge3hitlqrdfhj52m@yuggoth.org> (Jeremy Stanley's message of "Mon, 13 Jan 2020 16:53:41 +0000") References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <20200113165340.ge3hitlqrdfhj52m@yuggoth.org> Message-ID: > FPGA configuration is a compiled binary blob written into > non-volatile memory through a hardware interface. These similarities > to firmware also result in many people actually calling it > "firmware" even though, you're right, technically it's a mapping of > gate interconnections and not really firmware in the conventional > sense. In retrospect maybe I shouldn't have brought it up. It's a super easy thing to conflate those two topics I think. Probably calling one the "firmware" and the other the "bitstream" is the most common distinction I've heard. The latter also potentially being the "application" or "function." > I wouldn't be surprised, though, if there *are* NFV-related cases > where the users of the virtual machines into which some network > hardware is mapped need access to alter parts of, say, an interface > controller's firmware. The Linux kernel has for years incorporated > features to write or rewrite firmware and other microcode for > certain devices at boot time for similar reasons, after all. Yeah, I'm not sure because I don't have a lot of experience with these devices. I guess I kinda expected that they have effectively two devices on each card: one being the FPGA itself and the other being just a management device that lets you flash the FPGA. If the FPGA is connected to the bus as well, I'd expect it to be able to define its own interaction (i.e. 
be like a NIC or be like a compression accelerator), and the actual "firmware" being purely a function of the management device.

Either way, I think my point is that ironic's ability to manage the firmware part, regardless of how often you need it to change, is limited (currently, AFAIK) to the cleaning/prep phase of the lifecycle, and only really applies to a compute node when it is a workload on top of the undercloud anyway. For people that don't use ironic to provision their compute nodes, ironic wouldn't even have the opportunity to manage the firmware of those devices. I'm not saying Cyborg should fill the firmware gap, just not saying we should expect that Ironic will.

--Dan

From sundar.nadathur at intel.com Mon Jan 13 18:16:20 2020
From: sundar.nadathur at intel.com (Nadathur, Sundar)
Date: Mon, 13 Jan 2020 18:16:20 +0000
Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
In-Reply-To: 
References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM>
Message-ID: 

> From: Dan Smith
> Sent: Monday, January 13, 2020 7:17 AM
> To: Nadathur, Sundar
> Cc: Arkady.Kanevsky at dell.com; openstack-discuss at lists.openstack.org
> Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
>
> > TL;DR
> >
> > * Agree with Arkady that firmware updates should follow the server
> > vendors' guidelines, and can/should be done as part of the server
> > configuration.
>
> I'm worried there's a little bit of confusion about "which nova" and "which
> ironic" in this case, especially since Arkady mentioned tripleo. More on that
> below. However, I agree that if you're using ironic to manage the nodes that
> form your actual (over)cloud, then having ironic update firmware on your
> accelerator device in the same way that it might update firmware on a regular
> NIC, GPU card, or anything else makes sense.
>
> However, if you're talking about services all at the same level (i.e.
nova > working with ironic to provide metal as a tenant as well as > VMs) then *that* ironic is not going to be managing firmware on accelerators > that you're handing to your VM instances on the compute nodes. This goes back to the definition of firmware update vs. programming in my earlier post. In a Nova + Ironic + Cyborg env, I'd expect Cyborg to do programming. Firmware updates can be done by Ironic, Ansible/Redfish/... , some combination like Ironic with Redfish driver, or whatever the operator chooses. > > To the best of my knowledge, Ironic handles devices based on PCI IDs. > > Cyborg is designed to go deeper for discovering device > > features/properties and utilize Placement for scheduling based on > > these. > > What does this matter though? If you're talking about firmware for an FPGA > card, that's what you need to know in order to apply the correct firmware to > it, independent of whatever application-level bitstream is going to go in there > right? The device properties are needed for scheduling: users are often interested in getting a VM with an accelerator that has specific properties: e.g. implements a specific version of gzip, has 4 GB or more of device-local memory etc. Device properties are also needed for management of accelerator inventory: admins want to know how many FPGAs have a particular bitstream burnt into them, etc. Re. programming, sometimes we may need to determine what's in a device (beyond PCI ID) before programming it to ensure the image being programmed and the existing device contents are compatible. > >> One cannot and should not manage lifecycle of server components > independently. > > > > If what you meant to say is: ' do not update device firmware > > independently of other server components', agreed. > > I'm not really sure what this original point from Arkady really means. 
Are > (either of) you saying that if there's a CVE for the firmware in some card that > the firmware patch shouldn't be applied without taking the box through a full > lifecycle event or something? My paraphrase of Arkady's points: a. Updating CPU firmware/microcode should be done as per the server/CPU vendor's rules (use their specific tools, or some specific mechanisms like Redfish, with auditing, ....) b. Updating firmware for devices/accelerators should be done the same way. By a "full lifecycle event", you presumably mean vacating the entire node. For device updates, that is not always needed: one could disconnect just the instances using that device. The server/device vendor rules must specify the 'lifecycle event' involved for a specific update. > AFAIK, Ironic can't just do this in isolation, which > means that if you've got a compute node managed by ironic in a tripleo type > of environment, you're looking to move workloads away from that node, > destroy it, apply updates, and re-create it before you can use it again. I guess > I'd be surprised if people are doing this every time intel releases another > microcode update. Am I wrong about that? Not making any official statements but, generally, if a microcode/firmware update requires a reboot, one would have to do that. The admin would declare a maintenance window and combine software/firmware/configuration updates in that window. > Either way, I'm not sure how the firmware for accelerator cards is any > different from the firmware for other devices on the system. Updates of other devices, like CPU or motherboard components, often require server reboots. Accelerator updates may or may not require them, depending on ... all kinds of things. > Maybe the confusion is just that Cyborg does "programming" which seems similar to > "updating firmware"? Yes, indeed. That is why I went at length on the distinction between the two. 
> >> Nova and Neutron are getting info about all devices and their > >> capabilities from Ironic; that they use for scheduling > > > > Hmm, this seems overly broad to me: not every deployment includes > > Ironic, and getting PCI IDs is not enough for scheduling and > > management. > > I also don't think it's correct. Nova does not get info about devices from > Ironic, and I kinda doubt Neutron does either. If Nova is using ironic to > provide metal as tenants, then...sure, but in the case where nova is providing > VMs with accelerator cards, Ironic is not involved. +1 > >> Thus, move all device Life-cycle code from Cyborg to Ironic > > > > To recap, there is more to device lifecycle than firmware update. I'd > > suggest the other aspects can remain in Cyborg. > > Didn't you say that firmware programming (as defined here) is not something > that Cyborg currently does? Thus, nothing Cyborg currently does should be > moved to Ironic, AFAICT. If that is true, then I agree. Yes ^. > I guess my summary is: firmware updates for accelerators can and should be > handled the same as for other devices on the system, in whatever way the > operator currently does that. Programming an application-level bitstream > should not be confused with the former activity, and is fully within the > domain of Cyborg's responsibilities. Agreed. 
> --Dan Regards, Sundar From sundar.nadathur at intel.com Mon Jan 13 18:26:23 2020 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Mon, 13 Jan 2020 18:26:23 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: <20200113165340.ge3hitlqrdfhj52m@yuggoth.org> References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <20200113165340.ge3hitlqrdfhj52m@yuggoth.org> Message-ID: > -----Original Message----- > From: Jeremy Stanley > Sent: Monday, January 13, 2020 8:54 AM > To: openstack-discuss at lists.openstack.org > Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators > management > > On 2020-01-13 07:16:30 -0800 (-0800), Dan Smith wrote: > [...] > > What does this matter though? If you're talking about firmware for an > > FPGA card, that's what you need to know in order to apply the correct > > firmware to it, independent of whatever application-level bitstream is > > going to go in there right? > [...] > > Either way, I'm not sure how the firmware for accelerator cards is any > > different from the firmware for other devices on the system. Maybe the > > confusion is just that Cyborg does "programming" which seems similar > > to "updating firmware"? > [...] > > FPGA configuration is a compiled binary blob written into non-volatile > memory through a hardware interface. These similarities to firmware also > result in many people actually calling it "firmware" even though, you're right, > technically it's a mapping of gate interconnections and not really firmware in > the conventional sense. +1 > I wouldn't be surprised, though, if there *are* NFV-related cases where the > users of the virtual machines into which some network hardware is mapped > need access to alter parts of, say, an interface controller's firmware. 
The Linux > kernel has for years incorporated features to write or rewrite firmware and > other microcode for certain devices at boot time for similar reasons, after all. This aspect does come up for discussion a lot. Generally, operators and device vendors get alarmed at the prospect of letting a user/VNF/instance program an image/bitstream into a device directly -- we wouldn't know what image it is, etc. Cyborg doesn't support that. But Cyborg could program an image/bitstream on behalf of the user/VNF. That said, the VNF or VM (in a non-networking context) can configure a device by reading from registers/DDR on the card or writing to them. They can be handled using standard access permissions, Linux capabilities, etc. For example, the VM may memory-map a region of the device's address space using the mmap system call, and that access can be controlled. > -- > Jeremy Stanley Regards, Sundar From smooney at redhat.com Mon Jan 13 18:53:00 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 13 Jan 2020 18:53:00 +0000 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> <20200113165340.ge3hitlqrdfhj52m@yuggoth.org> Message-ID: On Mon, 2020-01-13 at 18:26 +0000, Nadathur, Sundar wrote: > > -----Original Message----- > > From: Jeremy Stanley > > Sent: Monday, January 13, 2020 8:54 AM > > To: openstack-discuss at lists.openstack.org > > Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators > > management > > > > On 2020-01-13 07:16:30 -0800 (-0800), Dan Smith wrote: > > [...] > > > What does this matter though? If you're talking about firmware for an > > > FPGA card, that's what you need to know in order to apply the correct > > > firmware to it, independent of whatever application-level bitstream is > > > going to go in there right? > > > > [...] 
> > > Either way, I'm not sure how the firmware for accelerator cards is any
> > > different from the firmware for other devices on the system. Maybe the
> > > confusion is just that Cyborg does "programming" which seems similar
> > > to "updating firmware"?
> >
> > [...]
> >
> > FPGA configuration is a compiled binary blob written into non-volatile
> > memory through a hardware interface. These similarities to firmware also
> > result in many people actually calling it "firmware" even though, you're right,
> > technically it's a mapping of gate interconnections and not really firmware in
> > the conventional sense.
>
> +1
>
> > I wouldn't be surprised, though, if there *are* NFV-related cases where the
> > users of the virtual machines into which some network hardware is mapped
> > need access to alter parts of, say, an interface controller's firmware. The Linux
> > kernel has for years incorporated features to write or rewrite firmware and
> > other microcode for certain devices at boot time for similar reasons, after all.
>
> This aspect does come up for discussion a lot. Generally, operators and device vendors get alarmed at the prospect of
> letting a user/VNF/instance program an image/bitstream into a device directly -- we wouldn't know what image it is,
> etc. Cyborg doesn't support that. But Cyborg could program an image/bitstream on behalf of the user/VNF.

To be fair, if your device supports reprogramming over PCIe, then you can enable the guest to reprogram the device using nova's PCI passthrough feature by passing through the entire PF. Cyborg's role is to provide a managed accelerator, not an unmanaged one. If we wanted to use a pre-programmed FPGA or fixed-function accelerator, we have been able to do that with PCI passthrough for the better part of 4 years, so I would consider unmanaged accelerators out of scope for Cyborg, at least until the integration of managed accelerators is done.
Nova already handles vGPU, vPMEM (persistent memory), generic PCI passthrough, SR-IOV for neutron ports and hardware-offloaded OVS VFs (e.g. smart NIC integration). Cyborg's added value is in managing things nova cannot provide easily. Arguing that Ironic should manage FPGA bitstreams because it can manage firmware is, from a nova point of view, arguing that the virt driver should manage all devices that are provided to the guest, meaning that in the libvirt case it, and not Cyborg, should continue to be extended to manage FPGAs and any other devices directly. We could do that, but it would leave only one thing for Cyborg to manage, which would be remote accelerators that could be provided to instances over a network fabric, making it a kind of Cinder of accelerators. That is a use case that nova and Ironic would both be ill suited for, but it is not the direction the Cyborg project has moved in, so unless you are suggesting Cyborg should pivot, I don't think we should redesign the interaction between nova, Ironic, Cyborg and neutron to have Ironic manage the devices.

I do think there is merit in some integration between the ironic-python-agent and Cyborg for discovery, and perhaps programming, of the FPGA on an Ironic node, assuming the actual discovery and programming logic live in Cyborg and Ironic simply runs/deploys/configures the Cyborg agent in the IPA image or invokes the Cyborg code directly.
> > > -- > > Jeremy Stanley > > Regards, > Sundar > From dms at danplanet.com Mon Jan 13 19:15:23 2020 From: dms at danplanet.com (Dan Smith) Date: Mon, 13 Jan 2020 11:15:23 -0800 Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management In-Reply-To: (Sundar Nadathur's message of "Mon, 13 Jan 2020 18:16:20 +0000") References: <87344b65740d47fc9777ae710dcf4af9@AUSX13MPS308.AMER.DELL.COM> Message-ID: > This goes back to the definition of firmware update vs. programming in > my earlier post. In a Nova + Ironic + Cyborg env, I'd expect Cyborg to > do programming. Firmware updates can be done by Ironic, > Ansible/Redfish/... , some combination like Ironic with Redfish > driver, or whatever the operator chooses. Yes, this is my point. I think we're in agreement here. >> What does this matter though? If you're talking about firmware for an FPGA >> card, that's what you need to know in order to apply the correct firmware to >> it, independent of whatever application-level bitstream is going to go in there >> right? > > The device properties are needed for scheduling: users are often > interested in getting a VM with an accelerator that has specific > properties: e.g. implements a specific version of gzip, has 4 GB or > more of device-local memory etc. Right, I'm saying I don't think Ironic needs to know anything other than the PCI ID of a card in order to update its firmware, correct? You and I are definitely in agreement that Ironic should have nothing to do with _programming_ and thus nothing to do with _scheduling_ of workloads (affined-) to accelerators. > By a "full lifecycle event", you presumably mean vacating the entire > node. For device updates, that is not always needed: one could > disconnect just the instances using that device. The server/device > vendor rules must specify the 'lifecycle event' involved for a > specific update. 
Right, I'm saying that today (AFAIK) Ironic can only do the "vacate, destroy, clean, re-image" sort of lifecycle, which is very heavyweight to just update firmware on a card.

> Updates of other devices, like CPU or motherboard components, often
> require server reboots. Accelerator updates may or may not require
> them, depending on ... all kinds of things.

Yep, all of this is lighter-weight than Ironic destroying, cleaning, and re-imaging a node. I'm making the case for "sure, Ironic could do the firmware update if it's cleaning a node, but in most cases you probably want a more lightweight process like ansible and a reboot."

So again, I think we're in full agreement on the classification of operation, and the subset of that which is wholly owned by Cyborg, as well as what of that *may* be owned by Ironic or any other hardware management tool.

--Dan

From feilong at catalyst.net.nz Mon Jan 13 19:03:57 2020
From: feilong at catalyst.net.nz (feilong)
Date: Tue, 14 Jan 2020 08:03:57 +1300
Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken
In-Reply-To: 
References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz>
Message-ID: <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz>

Hi Bhupathi,

Firstly, I would suggest setting use_podman=False when using the fedora-atomic image. It would also be good to set the "kube_tag" explicitly, e.g. v1.15.6. Then please trigger a new cluster creation. If you still run into errors, here are the debug steps:

1. SSH into the master node and check the log /var/log/cloud-init-output.log.

2. If there is no error in the above log file, run journalctl -u heat-container-agent to check the heat-container-agent log. If the earlier steps were done correctly, you should see something useful there.

On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote:
> Wang,
>
> Here it is .
I added the labels subsequently. My nova and neutron are > working all right as I installed various systems there working with no > issues.. > >   > >   > > *From:*Feilong Wang [mailto:feilong at catalyst.net.nz] > *Sent:* Thursday, January 9, 2020 6:12 PM > *To:* openstack-discuss at lists.openstack.org > *Subject:* Re: [magnum]: K8s cluster creation times out. OpenStack > Train : [ERROR]: Unable to render networking. Network config is likely > broken > >   > > Hi Bhupathi, > > Could you please share your cluster template? And please make sure > your Nova/Neutron works. > >   > > On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: > > Folks, > > I am building a Kubernetes Cluster( Openstack Train) and using > fedora atomic-29 image . The nodes come up  fine ( I have a simple > 1 master and 1 node) , but the cluster creation times out,  and > when I access the cloud-init logs I see this error .  Wondering > what I am missing as this used to work before.  I wonder if this > is image related . > >   > > [ERROR]: Unable to render networking. Network config is likely > broken: No available network renderers found. Searched through > list: ['eni', 'sysconfig', 'netplan'] > >   > > Essentially the stack creation fails in “kube_cluster_deploy” > >   > > Can somebody help me debug this ? Any help is appreciated. > >   > > --RamaK > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. 
> > -- > Cheers & Best regards, > Feilong Wang (王飞龙) > Head of R&D > Catalyst Cloud - Cloud Native New Zealand > -------------------------------------------------------------------------- > Tel: +64-48032246 > Email: flwang at catalyst.net.nz > Level 6, Catalyst House, 150 Willis Street, Wellington > -------------------------------------------------------------------------- > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. -- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 55224 bytes Desc: not available URL: From pierre at stackhpc.com Mon Jan 13 20:20:50 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 13 Jan 2020 20:20:50 +0000 Subject: [blazar] No IRC meeting tomorrow Message-ID: Hello, As announced in the last meeting, due to travel the weekly Blazar IRC meeting of January 14 is cancelled. The next meeting will be held on Jan 21. 
Thanks, Pierre Riteau (priteau) From feilong at catalyst.net.nz Mon Jan 13 21:10:29 2020 From: feilong at catalyst.net.nz (Feilong Wang) Date: Tue, 14 Jan 2020 10:10:29 +1300 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz> <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz> Message-ID: Hi Donny, Do you mean Fedora CoreOS or just CoreOS? The current CoreOS driver is not actively maintained, so I would suggest migrating to Fedora CoreOS, and I'm happy to help if you have any questions. Thanks. On 14/01/20 9:57 AM, Donny Davis wrote: > FWIW I was only able to get the coreos image working with magnum oob.. > the rest just didn't work.  > > On Mon, Jan 13, 2020 at 2:31 PM feilong > wrote: > > Hi Bhupathi, > > Firstly, I would suggest setting the use_podman=False when using > fedora atomic image. And it would be nice to set the "kube_tag", > e.g. v1.15.6 explicitly. Then please trigger a new cluster > creation. Then if you still run into error. Here is the debug steps: > > 1. ssh into the master node, check log /var/log/cloud-init-output.log > > 2. If there is no error in above log file, then run journalctl -u > heat-container-agent to check the heat-container-agent log. If > above step is correct, then you must be able to see something > useful here. > > > On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote: >> >> Wang, >> >> Here it is  . I added the labels subsequently. My nova and >> neutron are working all right as I installed various systems >> there working with no issues.. >> >>   >> >>   >> >> *From:*Feilong Wang [mailto:feilong at catalyst.net.nz] >> *Sent:* Thursday, January 9, 2020 6:12 PM >> *To:* openstack-discuss at lists.openstack.org >> >> *Subject:* Re: [magnum]: K8s cluster creation times out.
>> OpenStack Train : [ERROR]: Unable to render networking. Network >> config is likely broken >> >>   >> >> Hi Bhupathi, >> >> Could you please share your cluster template? And please make >> sure your Nova/Neutron works. >> >>   >> >> On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: >> >> Folks, >> >> I am building a Kubernetes Cluster( Openstack Train) and >> using fedora atomic-29 image . The nodes come up  fine ( I >> have a simple 1 master and 1 node) , but the cluster creation >> times out,  and when I access the cloud-init logs I see this >> error .  Wondering what I am missing as this used to work >> before.  I wonder if this is image related . >> >>   >> >> [ERROR]: Unable to render networking. Network config is >> likely broken: No available network renderers found. Searched >> through list: ['eni', 'sysconfig', 'netplan'] >> >>   >> >> Essentially the stack creation fails in “kube_cluster_deploy” >> >>   >> >> Can somebody help me debug this ? Any help is appreciated. >> >>   >> >> --RamaK >> >> The contents of this e-mail message and >> any attachments are intended solely for the >> addressee(s) and may contain confidential >> and/or legally privileged information. If you >> are not the intended recipient of this message >> or if this message has been addressed to you >> in error, please immediately alert the sender >> by reply e-mail and then delete this message >> and any attachments. If you are not the >> intended recipient, you are notified that >> any use, dissemination, distribution, copying, >> or storage of this message or any attachment >> is strictly prohibited. 
>> >> -- >> Cheers & Best regards, >> Feilong Wang (王飞龙) >> Head of R&D >> Catalyst Cloud - Cloud Native New Zealand >> -------------------------------------------------------------------------- >> Tel: +64-48032246 >> Email: flwang at catalyst.net.nz >> Level 6, Catalyst House, 150 Willis Street, Wellington >> -------------------------------------------------------------------------- >> The contents of this e-mail message and >> any attachments are intended solely for the >> addressee(s) and may contain confidential >> and/or legally privileged information. If you >> are not the intended recipient of this message >> or if this message has been addressed to you >> in error, please immediately alert the sender >> by reply e-mail and then delete this message >> and any attachments. If you are not the >> intended recipient, you are notified that >> any use, dissemination, distribution, copying, >> or storage of this message or any attachment >> is strictly prohibited. > > -- > Cheers & Best regards, > Feilong Wang (王飞龙) > ------------------------------------------------------ > Senior Cloud Software Engineer > Tel: +64-48032246 > Email: flwang at catalyst.net.nz > Catalyst IT Limited > Level 6, Catalyst House, 150 Willis Street, Wellington > ------------------------------------------------------ > > > > -- > ~/DonnyD > C: 805 814 6800 > "No mission too difficult. No sacrifice too great. Duty First" -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 55224 bytes Desc: not available URL: From donny at fortnebula.com Mon Jan 13 21:21:33 2020 From: donny at fortnebula.com (Donny Davis) Date: Mon, 13 Jan 2020 16:21:33 -0500 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz> <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz> Message-ID: Just Coreos - I tried them all and it was the only one that worked oob. On Mon, Jan 13, 2020 at 4:10 PM Feilong Wang wrote: > Hi Donny, > > Do you mean Fedore CoreOS or just CoreOS? The current CoreOS driver is not > actively maintained, I would suggest migrating to Fedora CoreOS and I'm > happy to help if you have any question. Thanks. > > > On 14/01/20 9:57 AM, Donny Davis wrote: > > FWIW I was only able to get the coreos image working with magnum oob.. the > rest just didn't work. > > On Mon, Jan 13, 2020 at 2:31 PM feilong wrote: > >> Hi Bhupathi, >> >> Firstly, I would suggest setting the use_podman=False when using fedora >> atomic image. And it would be nice to set the "kube_tag", e.g. v1.15.6 >> explicitly. Then please trigger a new cluster creation. Then if you still >> run into error. Here is the debug steps: >> >> 1. ssh into the master node, check log /var/log/cloud-init-output.log >> >> 2. If there is no error in above log file, then run journalctl -u >> heat-container-agent to check the heat-container-agent log. If above step >> is correct, then you must be able to see something useful here. >> >> >> On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote: >> >> Wang, >> >> Here it is . I added the labels subsequently. My nova and neutron are >> working all right as I installed various systems there working with no >> issues.. 
>> >> >> >> >> >> *From:* Feilong Wang [mailto:feilong at catalyst.net.nz >> ] >> *Sent:* Thursday, January 9, 2020 6:12 PM >> *To:* openstack-discuss at lists.openstack.org >> *Subject:* Re: [magnum]: K8s cluster creation times out. OpenStack Train >> : [ERROR]: Unable to render networking. Network config is likely broken >> >> >> >> Hi Bhupathi, >> >> Could you please share your cluster template? And please make sure your >> Nova/Neutron works. >> >> >> >> On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: >> >> Folks, >> >> I am building a Kubernetes Cluster( Openstack Train) and using fedora >> atomic-29 image . The nodes come up fine ( I have a simple 1 master and 1 >> node) , but the cluster creation times out, and when I access the >> cloud-init logs I see this error . Wondering what I am missing as this >> used to work before. I wonder if this is image related . >> >> >> >> [ERROR]: Unable to render networking. Network config is likely broken: No >> available network renderers found. Searched through list: ['eni', >> 'sysconfig', 'netplan'] >> >> >> >> Essentially the stack creation fails in “kube_cluster_deploy” >> >> >> >> Can somebody help me debug this ? Any help is appreciated. >> >> >> >> --RamaK >> >> The contents of this e-mail message and >> any attachments are intended solely for the >> addressee(s) and may contain confidential >> and/or legally privileged information. If you >> are not the intended recipient of this message >> or if this message has been addressed to you >> in error, please immediately alert the sender >> by reply e-mail and then delete this message >> and any attachments. If you are not the >> intended recipient, you are notified that >> any use, dissemination, distribution, copying, >> or storage of this message or any attachment >> is strictly prohibited. 
>> >> -- >> >> Cheers & Best regards, >> >> Feilong Wang (王飞龙) >> >> Head of R&D >> >> Catalyst Cloud - Cloud Native New Zealand >> >> -------------------------------------------------------------------------- >> >> Tel: +64-48032246 >> >> Email: flwang at catalyst.net.nz >> >> Level 6, Catalyst House, 150 Willis Street, Wellington >> >> -------------------------------------------------------------------------- >> >> The contents of this e-mail message and >> any attachments are intended solely for the >> addressee(s) and may contain confidential >> and/or legally privileged information. If you >> are not the intended recipient of this message >> or if this message has been addressed to you >> in error, please immediately alert the sender >> by reply e-mail and then delete this message >> and any attachments. If you are not the >> intended recipient, you are notified that >> any use, dissemination, distribution, copying, >> or storage of this message or any attachment >> is strictly prohibited. >> >> -- >> Cheers & Best regards, >> Feilong Wang (王飞龙) >> ------------------------------------------------------ >> Senior Cloud Software Engineer >> Tel: +64-48032246 >> Email: flwang at catalyst.net.nz >> Catalyst IT Limited >> Level 6, Catalyst House, 150 Willis Street, Wellington >> ------------------------------------------------------ >> >> > > -- > ~/DonnyD > C: 805 814 6800 > "No mission too difficult. No sacrifice too great. Duty First" > > -- > Cheers & Best regards, > Feilong Wang (王飞龙) > Head of R&D > Catalyst Cloud - Cloud Native New Zealand > -------------------------------------------------------------------------- > Tel: +64-48032246 > Email: flwang at catalyst.net.nz > Level 6, Catalyst House, 150 Willis Street, Wellington > -------------------------------------------------------------------------- > > -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. 
Duty First" -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 55224 bytes Desc: not available URL: From feilong at catalyst.net.nz Mon Jan 13 21:25:08 2020 From: feilong at catalyst.net.nz (Feilong Wang) Date: Tue, 14 Jan 2020 10:25:08 +1300 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz> <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz> Message-ID: <5846dbb7-fbcd-db22-7342-5ba2b6e4a1d3@catalyst.net.nz> OK, if you're happy to stay on CoreOS, all good. If you're interested in migrating to Fedora CoreOS and have questions, then you're welcome to popup in #openstack-containers. Cheers. On 14/01/20 10:21 AM, Donny Davis wrote: > Just Coreos - I tried them all and it was the only one that worked oob.  > > On Mon, Jan 13, 2020 at 4:10 PM Feilong Wang > wrote: > > Hi Donny, > > Do you mean Fedore CoreOS or just CoreOS? The current CoreOS > driver is not actively maintained, I would suggest migrating to > Fedora CoreOS and I'm happy to help if you have any question. Thanks. > > > On 14/01/20 9:57 AM, Donny Davis wrote: >> FWIW I was only able to get the coreos image working with magnum >> oob.. the rest just didn't work.  >> >> On Mon, Jan 13, 2020 at 2:31 PM feilong > > wrote: >> >> Hi Bhupathi, >> >> Firstly, I would suggest setting the use_podman=False when >> using fedora atomic image. And it would be nice to set the >> "kube_tag", e.g. v1.15.6 explicitly. Then please trigger a >> new cluster creation. Then if you still run into error. Here >> is the debug steps: >> >> 1. ssh into the master node, check log >> /var/log/cloud-init-output.log >> >> 2. 
If there is no error in above log file, then run >> journalctl -u heat-container-agent to check the >> heat-container-agent log. If above step is correct, then you >> must be able to see something useful here. >> >> >> On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote: >>> >>> Wang, >>> >>> Here it is  . I added the labels subsequently. My nova and >>> neutron are working all right as I installed various systems >>> there working with no issues.. >>> >>>   >>> >>>   >>> >>> *From:*Feilong Wang [mailto:feilong at catalyst.net.nz] >>> *Sent:* Thursday, January 9, 2020 6:12 PM >>> *To:* openstack-discuss at lists.openstack.org >>> >>> *Subject:* Re: [magnum]: K8s cluster creation times out. >>> OpenStack Train : [ERROR]: Unable to render networking. >>> Network config is likely broken >>> >>>   >>> >>> Hi Bhupathi, >>> >>> Could you please share your cluster template? And please >>> make sure your Nova/Neutron works. >>> >>>   >>> >>> On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: >>> >>> Folks, >>> >>> I am building a Kubernetes Cluster( Openstack Train) and >>> using fedora atomic-29 image . The nodes come up  fine ( >>> I have a simple 1 master and 1 node) , but the cluster >>> creation times out,  and when I access the cloud-init >>> logs I see this error .  Wondering what I am missing as >>> this used to work before.  I wonder if this is image >>> related . >>> >>>   >>> >>> [ERROR]: Unable to render networking. Network config is >>> likely broken: No available network renderers found. >>> Searched through list: ['eni', 'sysconfig', 'netplan'] >>> >>>   >>> >>> Essentially the stack creation fails in >>> “kube_cluster_deploy” >>> >>>   >>> >>> Can somebody help me debug this ? Any help is appreciated. >>> >>>   >>> >>> --RamaK >>> >>> The contents of this e-mail message and >>> any attachments are intended solely for the >>> addressee(s) and may contain confidential >>> and/or legally privileged information. 
If you >>> are not the intended recipient of this message >>> or if this message has been addressed to you >>> in error, please immediately alert the sender >>> by reply e-mail and then delete this message >>> and any attachments. If you are not the >>> intended recipient, you are notified that >>> any use, dissemination, distribution, copying, >>> or storage of this message or any attachment >>> is strictly prohibited. >>> >>> -- >>> Cheers & Best regards, >>> Feilong Wang (王飞龙) >>> Head of R&D >>> Catalyst Cloud - Cloud Native New Zealand >>> -------------------------------------------------------------------------- >>> Tel: +64-48032246 >>> Email: flwang at catalyst.net.nz >>> Level 6, Catalyst House, 150 Willis Street, Wellington >>> -------------------------------------------------------------------------- >>> The contents of this e-mail message and >>> any attachments are intended solely for the >>> addressee(s) and may contain confidential >>> and/or legally privileged information. If you >>> are not the intended recipient of this message >>> or if this message has been addressed to you >>> in error, please immediately alert the sender >>> by reply e-mail and then delete this message >>> and any attachments. If you are not the >>> intended recipient, you are notified that >>> any use, dissemination, distribution, copying, >>> or storage of this message or any attachment >>> is strictly prohibited. >> >> -- >> Cheers & Best regards, >> Feilong Wang (王飞龙) >> ------------------------------------------------------ >> Senior Cloud Software Engineer >> Tel: +64-48032246 >> Email: flwang at catalyst.net.nz >> Catalyst IT Limited >> Level 6, Catalyst House, 150 Willis Street, Wellington >> ------------------------------------------------------ >> >> >> >> -- >> ~/DonnyD >> C: 805 814 6800 >> "No mission too difficult. No sacrifice too great. 
Duty First" > > -- > Cheers & Best regards, > Feilong Wang (王飞龙) > Head of R&D > Catalyst Cloud - Cloud Native New Zealand > -------------------------------------------------------------------------- > Tel: +64-48032246 > Email: flwang at catalyst.net.nz > Level 6, Catalyst House, 150 Willis Street, Wellington > -------------------------------------------------------------------------- > > > > -- > ~/DonnyD > C: 805 814 6800 > "No mission too difficult. No sacrifice too great. Duty First" -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 55224 bytes Desc: not available URL: From feilong at catalyst.net.nz Mon Jan 13 21:38:56 2020 From: feilong at catalyst.net.nz (Feilong Wang) Date: Tue, 14 Jan 2020 10:38:56 +1300 Subject: [magnum][kolla] etcd wal sync duration issue In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> Message-ID: <3f3fe0d1-7b61-d2f9-da65-d126ea5ed336@catalyst.net.nz> Hi Eric, That issue looks familiar for me. There are some questions I'd like to check before answering if you should upgrade to train. 1. Are using the default v3.2.7 version for etcd? 2. Did you try to reproduce this with devstack, using Fedora CoreOS driver? The etcd version could be 3.2.26 I asked above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7 and I can't reproduce it with Fedora CoreOS + etcd 3.2.26 On 12/01/20 6:44 AM, Eric K. 
Miller wrote: > > Hi, > >   > > We are using the following coe cluster template and cluster create > commands on an OpenStack Stein installation that installs Magnum 8.2.0 > Kolla containers installed by Kolla-Ansible 8.0.1: > >   > > openstack coe cluster template create \ > >   --image Fedora-AtomicHost-29-20191126.0.x86_64_raw \ > >   --keypair userkey \ > >   --external-network ext-net \ > >   --dns-nameserver 1.1.1.1 \ > >   --master-flavor c5sd.4xlarge \ > >   --flavor m5sd.4xlarge \ > >   --coe kubernetes \ > >   --network-driver flannel \ > >   --volume-driver cinder \ > >   --docker-storage-driver overlay2 \ > >   --docker-volume-size 100 \ > >   --registry-enabled \ > >  --master-lb-enabled \ > >   --floating-ip-disabled \ > >   --fixed-network KubernetesProjectNetwork001 \ > >   --fixed-subnet KubernetesProjectSubnet001 \ > >   --labels > kube_tag=v1.15.7,cloud_provider_tag=v1.15.0,heat_container_agent_tag=stein-dev,master_lb_floating_ip_enabled=true > \ > >   k8s-cluster-template-1.15.7-production-private > >   > > openstack coe cluster create \ > >   --cluster-template k8s-cluster-template-1.15.7-production-private \ > >   --keypair userkey \ > >   --master-count 3 \ > >   --node-count 3 \ > >   k8s-cluster001 > >   > > The deploy process works perfectly, however, the cluster health status > flips between healthy and unhealthy.  The unhealthy status indicates > that etcd has an issue. 
> >   > > When logged into master-0 (out of 3, as configured above), "systemctl > status etcd" shows the stdout from etcd, which shows: > >   > > Jan 11 17:27:36 k8s-cluster001-4effrc2irvjq-master-0.novalocal > runc[2725]: 2020-01-11 17:27:36.548453 W | etcdserver: timed out > waiting for read index response > > Jan 11 17:28:02 k8s-cluster001-4effrc2irvjq-master-0.novalocal > runc[2725]: 2020-01-11 17:28:02.960977 W | wal: sync duration of > 1.696804699s, expected less than 1s > > Jan 11 17:28:31 k8s-cluster001-4effrc2irvjq-master-0.novalocal > runc[2725]: 2020-01-11 17:28:31.292753 W | wal: sync duration of > 2.249722223s, expected less than 1s > >   > > We also see: > > Jan 11 17:40:39 k8s-cluster001-4effrc2irvjq-master-0.novalocal > runc[2725]: 2020-01-11 17:40:39.132459 I | etcdserver/api/v3rpc: grpc: > Server.processUnaryRPC failed to write status: stream error: code = > DeadlineExceeded desc = "context deadline exceeded" > >   > > We initially used relatively small flavors, but increased these to > something very large to be sure resources were not constrained in any > way.  "top" reported no CPU nor memory contention on any nodes in > either case. > >   > > Multiple clusters have been deployed, and they all have this issue, > including empty clusters that were just deployed. > >   > > I see a very large number of reports of similar issues with etcd, but > discussions lead to disk performance, which can't be the cause here, > not only because persistent storage for etcd isn't configured in > Magnum, but also the disks are "very" fast in this environment.  > Looking at "vmstat -D" from within master-0, the number of writes is > minimal.  Ceilometer logs about 15 to 20 write IOPS for this VM in > Gnocchi. > >   > > Any ideas? > >   > > We are finalizing procedures to upgrade to Train, so we wanted to be > sure that we weren't running into some common issue with Stein that > would immediately be solved with Train.  
If so, we will simply proceed > with the upgrade and avoid diagnosing this issue further. > > > Thanks! > >   > > Eric > >   > -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Mon Jan 13 21:46:28 2020 From: mihalis68 at gmail.com (Chris Morgan) Date: Mon, 13 Jan 2020 16:46:28 -0500 Subject: [ops] No ops meetup team meeting tomorrow (2020/1/14) on IRC Message-ID: We had a great OpenStack Ops meetup last week in London. Those involved now need to catch up on other work, so we'll resume regular meetings on Jan 21st. Our thanks to everyone who attended and/or helped make it one of the best meetups in a while. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Mon Jan 13 21:52:54 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Mon, 13 Jan 2020 15:52:54 -0600 Subject: [magnum][kolla] etcd wal sync duration issue In-Reply-To: <3f3fe0d1-7b61-d2f9-da65-d126ea5ed336@catalyst.net.nz> References: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> <3f3fe0d1-7b61-d2f9-da65-d126ea5ed336@catalyst.net.nz> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04771716@gmsxchsvr01.thecreation.com> Hi Feilong, Thanks for responding! I am, indeed, using the default v3.2.7 version for etcd, which is the only available image. I did not try to reproduce with any other driver (we have never used DevStack, honestly, only Kolla-Ansible deployments). 
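As a quick aside for anyone chasing the same "wal: sync duration" warnings: every etcd WAL append ends in an fdatasync, so the raw sync latency of the backing volume can be sanity-checked on its own. The sketch below is purely illustrative (it is not etcd's own fio-based benchmark, and the 2 KB write size is only a rough stand-in for a WAL entry):

```python
import os
import tempfile
import time

def fdatasync_p99_ms(path, writes=50, size=2048):
    """Time small write+fdatasync pairs, roughly what etcd's WAL does per entry."""
    payload = b"x" * size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(writes):
            start = time.monotonic()
            os.write(fd, payload)
            os.fdatasync(fd)  # the durability barrier whose latency etcd complains about
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    latencies.sort()
    # Take the ~99th-percentile sample (clamped to the last element).
    idx = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return latencies[idx] * 1000.0

with tempfile.TemporaryDirectory() as d:
    p99 = fdatasync_p99_ms(os.path.join(d, "wal-probe"))
    print(f"approx p99 write+fdatasync latency: {p99:.2f} ms")
```

If the printed latency is regularly in the hundreds of milliseconds on the volume backing /var/lib/etcd, warnings like "sync duration of 1.696804699s, expected less than 1s" follow naturally, regardless of what the network looks like.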
I did see a number of people indicating similar issues with etcd versions in the 3.3.x range, so I didn't think of it being an etcd issue, but then again most issues seem to be a result of people using HDDs and not SSDs, which makes sense. Interesting that you saw the same issue, though. We haven't tried Fedora CoreOS, but I think we would need Train for this. Everything I read about etcd indicates that it is extremely latency sensitive, due to the fact that it replicates all changes to all nodes and sends an fsync to Linux each time, so data is always guaranteed to be stored. I can see this becoming an issue quickly without super-low-latency network and storage. We are using Ceph-based SSD volumes for the Kubernetes Master node disks, which is extremely fast (likely 10x or better than anything people recommend for etcd), but network latency is always going to be higher with VMs on OpenStack with DVR than bare metal with VLANs due to all of the abstractions. Do you know who maintains the etcd images for Magnum here? Is there an easy way to create a newer image? https://hub.docker.com/r/openstackmagnum/etcd/tags/ Eric From: Feilong Wang [mailto:feilong at catalyst.net.nz] Sent: Monday, January 13, 2020 3:39 PM To: openstack-discuss at lists.openstack.org Subject: Re: [magnum][kolla] etcd wal sync duration issue Hi Eric, That issue looks familiar for me. There are some questions I'd like to check before answering if you should upgrade to train. 1. Are using the default v3.2.7 version for etcd? 2. Did you try to reproduce this with devstack, using Fedora CoreOS driver? 
The etcd version could be 3.2.26 I asked above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7 and I can't reproduce it with Fedora CoreOS + etcd 3.2.26 From Albert.Braden at synopsys.com Tue Jan 14 01:01:30 2020 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 14 Jan 2020 01:01:30 +0000 Subject: Designate zone troubleshooting Message-ID: I would like to improve this document so that it can be more useful. https://docs.openstack.org/designate/rocky/admin/troubleshooting.html I'm experiencing "I have a broken zone" in my dev cluster right now, and I would like to update this document with the repair procedure. Can anyone help me figure out what that is? The logs no longer contain the original failure; I want to figure out and then document the procedure that would change my zone statuses from "ERROR" back to "ACTIVE." root at us01odc-dev1-ctrl1:/var/log/designate# openstack zone list --all-projects +--------------------------------------+----------------------------------+-----------------------------+---------+------------+--------+--------+ | id | project_id | name | type | serial | status | action | +--------------------------------------+----------------------------------+-----------------------------+---------+------------+--------+--------+ | d9a74e85-22a7-4844-968d-35e0aefd9997 | cb36981f16674c1a8b2a73f30370f88e | dg.us01-dev1.synopsys.com. | PRIMARY | 1578962764 | ERROR | CREATE | | 29484d33-eb26-4a35-aff8-22f84acf16cd | 474ae347d8ad426f8118e55eee47dcfd | it.us01-dev1.synopsys.com. | PRIMARY | 1578962485 | ACTIVE | NONE | | 05356780-26c7-4649-8532-a42e3c2b75a3 | 1cc94ed7c37a4b4d86e1af3c92a8967c | 112.195.10.in-addr.arpa. | PRIMARY | 1578962486 | ACTIVE | NONE | | cc8290ba-12f8-485e-a9bb-6de3324764ef | eb5fa5310ca648d19cc0d35fdf13953a | seg.us01-dev1.synopsys.com. | PRIMARY | 1578962207 | ERROR | CREATE | | e3abb13c-58f6-49da-9aab-0a143c7c4fb8 | 1cc94ed7c37a4b4d86e1af3c92a8967c | 117.195.10.in-addr.arpa. 
| PRIMARY | 1578962208 | ERROR | CREATE | | 236949dc-ea7e-4ad7-a570-b62fccd05fac | 1cc94ed7c37a4b4d86e1af3c92a8967c | 113.195.10.in-addr.arpa. | PRIMARY | 1578962765 | ERROR | CREATE | +--------------------------------------+----------------------------------+-----------------------------+---------+------------+--------+--------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Tue Jan 14 07:37:16 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 14 Jan 2020 08:37:16 +0100 Subject: [magnum][kolla] etcd wal sync duration issue In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04771716@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> <3f3fe0d1-7b61-d2f9-da65-d126ea5ed336@catalyst.net.nz> <046E9C0290DD9149B106B72FC9156BEA04771716@gmsxchsvr01.thecreation.com> Message-ID: Just to clarify: this etcd is not provided by Kolla nor installed by Kolla-Ansible. -yoctozepto On Mon, 13 Jan 2020 at 22:54, Eric K. Miller wrote: > > Hi Feilong, > > Thanks for responding! I am, indeed, using the default v3.2.7 version for etcd, which is the only available image. > > I did not try to reproduce with any other driver (we have never used DevStack, honestly, only Kolla-Ansible deployments).
I can see this becoming an issue quickly without super-low-latency network and storage. We are using Ceph-based SSD volumes for the Kubernetes Master node disks, which are extremely fast (likely 10x or better than anything people recommend for etcd), but network latency is always going to be higher with VMs on OpenStack with DVR than bare metal with VLANs due to all of the abstractions. > > Do you know who maintains the etcd images for Magnum here? Is there an easy way to create a newer image? > https://hub.docker.com/r/openstackmagnum/etcd/tags/ > > Eric > > > > From: Feilong Wang [mailto:feilong at catalyst.net.nz] > Sent: Monday, January 13, 2020 3:39 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: [magnum][kolla] etcd wal sync duration issue > > Hi Eric, > That issue looks familiar to me. There are some questions I'd like to check before answering whether you should upgrade to Train. > 1. Are you using the default v3.2.7 version for etcd? > 2. Did you try to reproduce this with devstack, using the Fedora CoreOS driver? The etcd version could be 3.2.26 > I asked the above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7 and I can't reproduce it with Fedora CoreOS + etcd 3.2.26 > > From skaplons at redhat.com Tue Jan 14 08:20:54 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 14 Jan 2020 09:20:54 +0100 Subject: [all] DevStack jobs broken due to setuptools not available for Python 2 In-Reply-To: <20200113173220.sdryzodzxvhxhvqc@yuggoth.org> References: <20200113060554.GA3819219@fedora19.localdomain> <16f9f3be487.10f2e3b05395088.6540142001370012765@ghanshyammann.com> <20200113144356.mj3ot2crequlcowc@yuggoth.org> <20200113154144.os44mrihcotvwt3r@yuggoth.org> <20200113173220.sdryzodzxvhxhvqc@yuggoth.org> Message-ID: Hi, > On 13 Jan 2020, at 18:32, Jeremy Stanley wrote: > > On 2020-01-13 15:41:44 +0000 (+0000), Jeremy Stanley wrote: > [...]
>> I've proposed https://review.opendev.org/702244 for a smaller-scale >> version of this now (explicitly blacklisting wheels for pip, >> setuptools and virtualenv) but we can expand it to something more >> thorough once it's put through its paces. If we merge that, then we >> can manually delete the affected setuptools wheel from our >> wheel-mirror volume and not have to worry about it coming back. > > This has since merged, and as of 16:30 UTC (roughly an hour ago) I > deleted all copies of the setuptools-45.0.0-py2.py3-none-any.whl > file from our AFS volumes. We're testing now to see if previously > broken jobs are working again, but suspect things should be back to > normal. Thanks, Jeremy, for the quick fix for this issue. It seems that, at least for Neutron, all jobs are working again :) > -- > Jeremy Stanley — Slawek Kaplonski Senior software engineer Red Hat From renat.akhmerov at gmail.com Tue Jan 14 09:28:37 2020 From: renat.akhmerov at gmail.com (Renat Akhmerov) Date: Tue, 14 Jan 2020 16:28:37 +0700 Subject: [mistral][core] Promoting Eyal Bar-Ilan to the Mistral core team In-Reply-To: <11f24b32-4512-4b4d-95a0-71d485850ec3@Spark> References: <11f24b32-4512-4b4d-95a0-71d485850ec3@Spark> Message-ID: <248b0b4c-b8bf-484c-aca6-6c1b6429ec18@Spark> Hi, I’d like to promote Eyal Bar-Ilan to the Mistral core team since he has shown great contribution performance in recent months. Eyal always reacts to various CI issues in a timely manner and provides fixes very quickly. He has also completed a number of useful functional Mistral features in Train and Ussuri. And his overall statistics for Ussuri ([1]) make him a clear candidate for core membership. Core reviewers, please let me know if you have any objections. [1] https://www.stackalytics.com/?module=mistral-group&release=ussuri&user_id=eyal.bar-ilan at nokia.com&metric=commits Thanks Renat Akhmerov @Nokia -------------- next part -------------- An HTML attachment was scrubbed...
URL: From apetrich at redhat.com Tue Jan 14 10:18:10 2020 From: apetrich at redhat.com (Adriano Petrich) Date: Tue, 14 Jan 2020 11:18:10 +0100 Subject: [mistral][core] Promoting Eyal Bar-Ilan to the Mistral core team In-Reply-To: <248b0b4c-b8bf-484c-aca6-6c1b6429ec18@Spark> References: <11f24b32-4512-4b4d-95a0-71d485850ec3@Spark> <248b0b4c-b8bf-484c-aca6-6c1b6429ec18@Spark> Message-ID: +1 from me On Tue, 14 Jan 2020 at 10:28, Renat Akhmerov wrote: > Hi, > > I’d like to promote Eyal Bar-Ilan to the Mistral core team since he’s > shown a great contribution performance in the recent months. Eyal always > reacts on various CI issues timely and provides fixes very quickly. He’s > also completed a number of useful functional Mistral features in Train and > Ussuri. And his overall statistics for Ussuri ([1]) makes him a clear > candidate for core membership. > > Core reviewers, please let me know if you have any objections. > > [1] > https://www.stackalytics.com/?module=mistral-group&release=ussuri&user_id=eyal.bar-ilan at nokia.com&metric=commits > > > Thanks > > Renat Akhmerov > @Nokia > -- Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Laurie Krebs, Michael O'Neill, Thomas Savage -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangbailin at inspur.com Tue Jan 14 11:31:18 2020 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Tue, 14 Jan 2020 11:31:18 +0000 Subject: [cyborg] enable launchpad or storyboard for Cyborg Message-ID: Hi Sundar and all: I think we should enable launchpad for the Cyborg project to record its reported bugs, submitted blueprints, etc., so that we can keep track of project updates and changes. 
Now I found there are some specifications in cyborg-specs, but they have not been managed via Launchpad (https://launchpad.net/cyborg) or StoryBoard (https://storyboard.openstack.org/#!/project/openstack/cyborg). Personally, I recommend using Launchpad; it looks very intuitive. brinzhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Tue Jan 14 11:42:59 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 14 Jan 2020 12:42:59 +0100 Subject: [cyborg] enable launchpad or storyboard for Cyborg In-Reply-To: References: Message-ID: Two cents from me: I believe StoryBoard is the way forward. -yoctozepto wt., 14 sty 2020 o 12:39 Brin Zhang(张百林) napisał(a): > > Hi Sundar and all: > > I think we should enable launchpad for the Cyborg project to record its reported bugs, submitted blueprints, etc., so that we can keep track of project updates and changes. > > Now I found there are some specifications in the cyborg-specs, and there has not been management by the Launchpad (https://launchpad.net/cyborg) or storyboard (https://storyboard.openstack.org/#!/project/openstack/cyborg). > > Personally recommend using Launchpad, it looks very intuitive. > > > > brinzhang > >
A ________________________________ From: Renat Akhmerov Sent: Tuesday, January 14, 2020 10:28:37 AM To: openstack-discuss at lists.openstack.org Subject: [mistral][core] Promoting Eyal Bar-Ilan to the Mistral core team Hi, I’d like to promote Eyal Bar-Ilan to the Mistral core team since he’s shown a great contribution performance in the recent months. Eyal always reacts on various CI issues timely and provides fixes very quickly. He’s also completed a number of useful functional Mistral features in Train and Ussuri. And his overall statistics for Ussuri ([1]) makes him a clear candidate for core membership. Core reviewers, please let me know if you have any objections. [1] https://www.stackalytics.com/?module=mistral-group&release=ussuri&user_id=eyal.bar-ilan at nokia.com&metric=commits Thanks Renat Akhmerov @Nokia -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Jan 14 13:09:22 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 14 Jan 2020 13:09:22 +0000 Subject: [cyborg] enable launchpad or storyboard for Cyborg In-Reply-To: References: Message-ID: <5760300f37d4d4e3cd71037688fb23ee76c5c685.camel@redhat.com> Cyborg is already using StoryBoard. I don't really like it, but since it has already moved, I don't think it's worth moving back. I think Launchpad is more intuitive, but that boat has already sailed. On Tue, 2020-01-14 at 12:42 +0100, Radosław Piliszek wrote: > Two cents from me: I believe storyboard is the way forward. > > -yoctozepto > > wt., 14 sty 2020 o 12:39 Brin Zhang(张百林) napisał(a): > > > > Hi Sundar and all: > > > > I think we should enable launchpad for the Cyborg project to record its reported bugs, submitted blueprints, etc., > > so that we can keep track of project updates and changes.
> > > > Now I found there are some specifications in the cyborg-specs, and there has not been management by the Launchpad ( > > https://launchpad.net/cyborg) or storyboard (https://storyboard.openstack.org/#!/project/openstack/cyborg). > > > > Personally recommend using Launchpad, it looks very intuitive. > > > > > > > > brinzhang > > > > > > From beagles at redhat.com Tue Jan 14 14:36:39 2020 From: beagles at redhat.com (Brent Eagles) Date: Tue, 14 Jan 2020 11:06:39 -0330 Subject: [tripleo] rocky builds In-Reply-To: References: Message-ID: <7fe24eb0-3a96-18a3-34d7-1a1495506b10@redhat.com> Hi, On 2020-01-10 1:42 p.m., Wesley Hayutin wrote: > Greetings, > > I've confirmed that builds from the Rocky release will no longer be > imported.  Looking for input from the upstream folks with regards to > maintaining the Rocky release for upstream.  Can you please comment if > you have any requirement to continue building, patching Rocky as I know > there are active reviews [1].  I've added this topic to be discussed at > the next meeting [2] > > Thank you! > > > [1] https://review.opendev.org/#/q/status:open+tripleo+branch:stable/rocky > [2] https://etherpad.openstack.org/p/tripleo-meeting-items Am I correct in that this only applies to rocky and we will continue to build and run CI on queens? Cheers, Brent From sean.mcginnis at gmx.com Tue Jan 14 14:44:38 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 14 Jan 2020 08:44:38 -0600 Subject: [tripleo] rocky builds In-Reply-To: <7fe24eb0-3a96-18a3-34d7-1a1495506b10@redhat.com> References: <7fe24eb0-3a96-18a3-34d7-1a1495506b10@redhat.com> Message-ID: <63cef2c5-27bc-5e4f-34fa-639dbafd9230@gmx.com> On 1/14/20 8:36 AM, Brent Eagles wrote: > Hi, > > On 2020-01-10 1:42 p.m., Wesley Hayutin wrote: >> Greetings, >> >> I've confirmed that builds from the Rocky release will no longer be >> imported.  Looking for input from the upstream folks with regards to >> maintaining the Rocky release for upstream.  
Can you please comment >> if you have any requirement to continue building, patching Rocky as I >> know there are active reviews [1]. I've added this topic to be >> discussed at the next meeting [2] >> >> Thank you! >> >> >> [1] >> https://review.opendev.org/#/q/status:open+tripleo+branch:stable/rocky >> [2] https://etherpad.openstack.org/p/tripleo-meeting-items > > Am I correct in that this only applies to rocky and we will continue > to build and run CI on queens? > > Cheers, > > Brent > Rocky is still in the Maintained phase: https://releases.openstack.org/#release-series Older stable branches are in extended maintenance, so if the plan is to not support them anymore, that needs to be declared, with six months allowed for someone else to have a window to offer to continue providing that extended maintenance: https://docs.openstack.org/project-team-guide/stable-branches.html#maintenance-phases After the six months, if no one offers to maintain the code, it can then be marked as EOL. From radoslaw.piliszek at gmail.com Tue Jan 14 17:19:20 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 14 Jan 2020 18:19:20 +0100 Subject: [deployment][kolla][tripleo][osa][docs] Intro to OpenStack Message-ID: Hiya, Folks! I had some thoughts after talking with people new to OpenStack - our deployment tools are too well-hidden in the ecosystem. People try deploying devstack for fun (and some still use packstack!) and then give up on manual installation for production due to its complexity and fear of upgrades. I have also heard that Kolla/TripleO/OSA (order random) is an "unofficial" way to deploy OpenStack and that the installation guide is the only "official" one (whatever that may mean in this very context). So I decided to go the "newbie" route and inspect the website. https://www.openstack.org invites us to browse: https://www.openstack.org/software/start/ which is nice and dandy, presents options of enterprise-grade solutions etc.
but fails to really mention OpenStack has deployment tools (if one does not look at the submenu bar) and instead points the "newbie" to the installation guide: https://docs.openstack.org/install-guide/overview.html which kind-of suggests that the OpenStack ecosystem has no ready deployment tools by saying: "After becoming familiar with basic installation, configuration, operation, and troubleshooting of these OpenStack services, you should consider the following steps toward deployment using a production architecture: ... Implement a deployment tool such as Ansible, Chef, Puppet, or Salt to automate deployment and management of the production environment." Just some food for thought. Extra for Kolla and OSA: https://docs.openstack.org/train/deploy/ seems we no longer deploy OpenStack since Train. -yoctozepto From noonedeadpunk at ya.ru Tue Jan 14 17:53:54 2020 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Tue, 14 Jan 2020 19:53:54 +0200 Subject: [deployment][kolla][tripleo][osa][docs] Intro to OpenStack In-Reply-To: References: Message-ID: <20673451579024434@myt3-9168aea9495d.qloud-c.yandex.net> This part is really strange, since there are links for stein and ussuri, but not for train... While OSA has train deployment docs [1] [1] https://docs.openstack.org/project-deploy-guide/openstack-ansible/train/ 14.01.2020, 19:24, "Radosław Piliszek" : > > Extra for Kolla and OSA: > https://docs.openstack.org/train/deploy/ > seems we no longer deploy OpenStack since Train.
> > -yoctozepto -- Kind Regards, Dmitriy Rabotyagov From radoslaw.piliszek at gmail.com Tue Jan 14 18:05:00 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 14 Jan 2020 19:05:00 +0100 Subject: [deployment][kolla][tripleo][osa][docs] Intro to OpenStack In-Reply-To: <20673451579024434@myt3-9168aea9495d.qloud-c.yandex.net> References: <20673451579024434@myt3-9168aea9495d.qloud-c.yandex.net> Message-ID: As does Kolla-Ansible, to be complete: https://docs.openstack.org/project-deploy-guide/kolla-ansible/train/ -yoctozepto wt., 14 sty 2020 o 19:01 Dmitriy Rabotyagov napisał(a): > > This part is really strange, since there are links for stein and ussuri, but not for train... While OSA has train deployment docs [1] > > [1] https://docs.openstack.org/project-deploy-guide/openstack-ansible/train/ > > 14.01.2020, 19:24, "Radosław Piliszek" : > > > > > Extra for Kolla and OSA: > > https://docs.openstack.org/train/deploy/ > > seems we no longer deploy OpenStack since Train. > > > > -yoctozepto > > -- > Kind Regards, > Dmitriy Rabotyagov > > From C-Ramakrishna.Bhupathi at charter.com Tue Jan 14 20:56:42 2020 From: C-Ramakrishna.Bhupathi at charter.com (Bhupathi, Ramakrishna) Date: Tue, 14 Jan 2020 20:56:42 +0000 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: <5846dbb7-fbcd-db22-7342-5ba2b6e4a1d3@catalyst.net.nz> References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz> <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz> <5846dbb7-fbcd-db22-7342-5ba2b6e4a1d3@catalyst.net.nz> Message-ID: I just moved to the Fedora CoreOS image (fedora-coreos-31) to build my K8s Magnum cluster and cluster creation fails with ERROR: The Parameter (octavia_ingress_controller_tag) was not defined in template. I wonder why I need that tag. Any help please?
--RamaK From: Feilong Wang [mailto:feilong at catalyst.net.nz] Sent: Monday, January 13, 2020 4:25 PM To: Donny Davis Cc: OpenStack Discuss Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken OK, if you're happy to stay on CoreOS, all good. If you're interested in migrating to Fedora CoreOS and have questions, then you're welcome to pop up in #openstack-containers. Cheers. On 14/01/20 10:21 AM, Donny Davis wrote: Just Coreos - I tried them all and it was the only one that worked oob. On Mon, Jan 13, 2020 at 4:10 PM Feilong Wang > wrote: Hi Donny, Do you mean Fedora CoreOS or just CoreOS? The current CoreOS driver is not actively maintained, so I would suggest migrating to Fedora CoreOS, and I'm happy to help if you have any questions. Thanks. On 14/01/20 9:57 AM, Donny Davis wrote: FWIW I was only able to get the coreos image working with magnum oob.. the rest just didn't work. On Mon, Jan 13, 2020 at 2:31 PM feilong > wrote: Hi Bhupathi, Firstly, I would suggest setting use_podman=False when using the Fedora Atomic image. And it would be nice to set the "kube_tag", e.g. v1.15.6, explicitly. Then please trigger a new cluster creation. If you still run into errors, here are the debug steps: 1. ssh into the master node, check log /var/log/cloud-init-output.log 2.
[cid:image001.png at 01D5CAF2.D1A55F30] From: Feilong Wang [mailto:feilong at catalyst.net.nz] Sent: Thursday, January 9, 2020 6:12 PM To: openstack-discuss at lists.openstack.org Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken Hi Bhupathi, Could you please share your cluster template? And please make sure your Nova/Neutron works. On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: Folks, I am building a Kubernetes Cluster( Openstack Train) and using fedora atomic-29 image . The nodes come up fine ( I have a simple 1 master and 1 node) , but the cluster creation times out, and when I access the cloud-init logs I see this error . Wondering what I am missing as this used to work before. I wonder if this is image related . [ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan'] Essentially the stack creation fails in “kube_cluster_deploy” Can somebody help me debug this ? Any help is appreciated. --RamaK The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. 
-- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. -- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------ -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. 
Duty First" -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 55224 bytes Desc: image001.png URL: From donny at fortnebula.com Tue Jan 14 21:04:42 2020 From: donny at fortnebula.com (Donny Davis) Date: Tue, 14 Jan 2020 16:04:42 -0500 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz> <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz> <5846dbb7-fbcd-db22-7342-5ba2b6e4a1d3@catalyst.net.nz> Message-ID: Did you update your cluster distro? Can you share your current cluster template? 
Donny Davis c: 805 814 6800 On Tue, Jan 14, 2020, 3:56 PM Bhupathi, Ramakrishna < C-Ramakrishna.Bhupathi at charter.com> wrote: > I just moved to Fedora core OS image (fedora-coreos-31) to build my K8s > Magnum cluster and cluster creation fails with > > ERROR: The Parameter (octavia_ingress_controller_tag) was not defined in > template. > > > > I wonder why I need that tag. Any help please? > > > > --RamaK > > > > *From:* Feilong Wang [mailto:feilong at catalyst.net.nz] > *Sent:* Monday, January 13, 2020 4:25 PM > *To:* Donny Davis > *Cc:* OpenStack Discuss > *Subject:* Re: [magnum]: K8s cluster creation times out. OpenStack Train > : [ERROR]: Unable to render networking. Network config is likely broken > > > > OK, if you're happy to stay on CoreOS, all good. If you're interested in > migrating to Fedora CoreOS and have questions, then you're welcome to popup > in #openstack-containers. Cheers. > > > > On 14/01/20 10:21 AM, Donny Davis wrote: > > Just Coreos - I tried them all and it was the only one that worked oob. > > > > On Mon, Jan 13, 2020 at 4:10 PM Feilong Wang > wrote: > > Hi Donny, > > Do you mean Fedore CoreOS or just CoreOS? The current CoreOS driver is not > actively maintained, I would suggest migrating to Fedora CoreOS and I'm > happy to help if you have any question. Thanks. > > > > On 14/01/20 9:57 AM, Donny Davis wrote: > > FWIW I was only able to get the coreos image working with magnum oob.. the > rest just didn't work. > > > > On Mon, Jan 13, 2020 at 2:31 PM feilong wrote: > > Hi Bhupathi, > > Firstly, I would suggest setting the use_podman=False when using fedora > atomic image. And it would be nice to set the "kube_tag", e.g. v1.15.6 > explicitly. Then please trigger a new cluster creation. Then if you still > run into error. Here is the debug steps: > > 1. ssh into the master node, check log /var/log/cloud-init-output.log > > 2. 
If there is no error in above log file, then run journalctl -u > heat-container-agent to check the heat-container-agent log. If above step > is correct, then you must be able to see something useful here. > > > > On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote: > > Wang, > > Here it is . I added the labels subsequently. My nova and neutron are > working all right as I installed various systems there working with no > issues.. > > > > > > *From:* Feilong Wang [mailto:feilong at catalyst.net.nz > ] > *Sent:* Thursday, January 9, 2020 6:12 PM > *To:* openstack-discuss at lists.openstack.org > *Subject:* Re: [magnum]: K8s cluster creation times out. OpenStack Train > : [ERROR]: Unable to render networking. Network config is likely broken > > > > Hi Bhupathi, > > Could you please share your cluster template? And please make sure your > Nova/Neutron works. > > > > On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: > > Folks, > > I am building a Kubernetes Cluster( Openstack Train) and using fedora > atomic-29 image . The nodes come up fine ( I have a simple 1 master and 1 > node) , but the cluster creation times out, and when I access the > cloud-init logs I see this error . Wondering what I am missing as this > used to work before. I wonder if this is image related . > > > > [ERROR]: Unable to render networking. Network config is likely broken: No > available network renderers found. Searched through list: ['eni', > 'sysconfig', 'netplan'] > > > > Essentially the stack creation fails in “kube_cluster_deploy” > > > > Can somebody help me debug this ? Any help is appreciated. > > > > --RamaK > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. 
If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. > > -- > > Cheers & Best regards, > > Feilong Wang (王飞龙) > > Head of R&D > > Catalyst Cloud - Cloud Native New Zealand > > -------------------------------------------------------------------------- > > Tel: +64-48032246 > > Email: flwang at catalyst.net.nz > > Level 6, Catalyst House, 150 Willis Street, Wellington > > -------------------------------------------------------------------------- > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. > > -- > > Cheers & Best regards, > > Feilong Wang (王飞龙) > > ------------------------------------------------------ > > Senior Cloud Software Engineer > > Tel: +64-48032246 > > Email: flwang at catalyst.net.nz > > Catalyst IT Limited > > Level 6, Catalyst House, 150 Willis Street, Wellington > > ------------------------------------------------------ > > > > > -- > > ~/DonnyD > > C: 805 814 6800 > > "No mission too difficult. No sacrifice too great. 
Duty First" > > -- > > Cheers & Best regards, > > Feilong Wang (王飞龙) > > Head of R&D > > Catalyst Cloud - Cloud Native New Zealand > > -------------------------------------------------------------------------- > > Tel: +64-48032246 > > Email: flwang at catalyst.net.nz > > Level 6, Catalyst House, 150 Willis Street, Wellington > > -------------------------------------------------------------------------- > > > > > -- > > ~/DonnyD > > C: 805 814 6800 > > "No mission too difficult. No sacrifice too great. Duty First" > > -- > > Cheers & Best regards, > > Feilong Wang (王飞龙) > > Head of R&D > > Catalyst Cloud - Cloud Native New Zealand > > -------------------------------------------------------------------------- > > Tel: +64-48032246 > > Email: flwang at catalyst.net.nz > > Level 6, Catalyst House, 150 Willis Street, Wellington > > -------------------------------------------------------------------------- > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 55224 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 55224 bytes Desc: not available URL: From Albert.Braden at synopsys.com Wed Jan 15 00:49:19 2020 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 15 Jan 2020 00:49:19 +0000 Subject: Designate zone troubleshooting [designate] Message-ID: Trying again: I would like to improve this document so that it can be more useful. https://docs.openstack.org/designate/rocky/admin/troubleshooting.html I'm experiencing "I have a broken zone" in my dev cluster right now, and I would like to update this document with the repair procedure. Can anyone help me figure out what that is? The logs no longer contain the original failure; I want to figure out and then document the procedure that would change my zone statuses from "ERROR" back to "ACTIVE." root at us01odc-dev1-ctrl1:/var/log/designate# openstack zone list --all-projects +--------------------------------------+----------------------------------+-----------------------------+---------+------------+--------+--------+ | id | project_id | name | type | serial | status | action | +--------------------------------------+----------------------------------+-----------------------------+---------+------------+--------+--------+ | d9a74e85-22a7-4844-968d-35e0aefd9997 | cb36981f16674c1a8b2a73f30370f88e | dg.us01-dev1.synopsys.com. | PRIMARY | 1578962764 | ERROR | CREATE | | 29484d33-eb26-4a35-aff8-22f84acf16cd | 474ae347d8ad426f8118e55eee47dcfd | it.us01-dev1.synopsys.com. | PRIMARY | 1578962485 | ACTIVE | NONE | | 05356780-26c7-4649-8532-a42e3c2b75a3 | 1cc94ed7c37a4b4d86e1af3c92a8967c | 112.195.10.in-addr.arpa. | PRIMARY | 1578962486 | ACTIVE | NONE | | cc8290ba-12f8-485e-a9bb-6de3324764ef | eb5fa5310ca648d19cc0d35fdf13953a | seg.us01-dev1.synopsys.com. | PRIMARY | 1578962207 | ERROR | CREATE | | e3abb13c-58f6-49da-9aab-0a143c7c4fb8 | 1cc94ed7c37a4b4d86e1af3c92a8967c | 117.195.10.in-addr.arpa. 
| PRIMARY | 1578962208 | ERROR | CREATE | | 236949dc-ea7e-4ad7-a570-b62fccd05fac | 1cc94ed7c37a4b4d86e1af3c92a8967c | 113.195.10.in-addr.arpa. | PRIMARY | 1578962765 | ERROR | CREATE | +--------------------------------------+----------------------------------+-----------------------------+---------+------------+--------+--------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From vgvoleg at gmail.com Wed Jan 15 07:09:53 2020 From: vgvoleg at gmail.com (Oleg Ovcharuk) Date: Wed, 15 Jan 2020 10:09:53 +0300 Subject: [mistral][core] Promoting Eyal Bar-Ilan to the Mistral core team In-Reply-To: References: Message-ID: <5DEAE4DB-3D39-46EB-9F20-D7414A0FE4C1@gmail.com> +1 > On 14 Jan 2020, at 15:12, Kovi, Andras 1. (Nokia - HU/Budapest) wrote: > > Workflow +1 > > Very welcome in the team! > > A > > From: Renat Akhmerov > Sent: Tuesday, January 14, 2020 10:28:37 AM > To: openstack-discuss at lists.openstack.org > Subject: [mistral][core] Promoting Eyal Bar-Ilan to the Mistral core team > > Hi, > > I’d like to promote Eyal Bar-Ilan to the Mistral core team since he’s shown a great contribution performance in the recent months. Eyal always reacts on various CI issues timely and provides fixes very quickly. He’s also completed a number of useful functional Mistral features in Train and Ussuri. And his overall statistics for Ussuri ([1]) makes him a clear candidate for core membership. > > Core reviewers, please let me know if you have any objections. > > [1] https://www.stackalytics.com/?module=mistral-group&release=ussuri&user_id=eyal.bar-ilan at nokia.com&metric=commits > > > Thanks > > Renat Akhmerov > @Nokia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From renat.akhmerov at gmail.com Wed Jan 15 07:13:07 2020 From: renat.akhmerov at gmail.com (Renat Akhmerov) Date: Wed, 15 Jan 2020 14:13:07 +0700 Subject: [mistral][core] Promoting Eyal Bar-Ilan to the Mistral core team In-Reply-To: <5DEAE4DB-3D39-46EB-9F20-D7414A0FE4C1@gmail.com> References: <5DEAE4DB-3D39-46EB-9F20-D7414A0FE4C1@gmail.com> Message-ID: <076d1816-cbdb-47d7-b9b6-8cd5337bdf81@Spark> Eyal, congrats! You just became a core member :) You now can vote with +2 (or -2) and approve patches. Keep up the good work! Thanks Renat Akhmerov @Nokia On 15 Jan 2020, 14:09 +0700, Oleg Ovcharuk , wrote: > +1 > > > On 14 Jan 2020, at 15:12, Kovi, Andras 1. (Nokia - HU/Budapest) wrote: > > > > Workflow +1 > > > > Very welcome in the team! > > > > A > > > > From: Renat Akhmerov > > Sent: Tuesday, January 14, 2020 10:28:37 AM > > To: openstack-discuss at lists.openstack.org > > Subject: [mistral][core] Promoting Eyal Bar-Ilan to the Mistral core team > > > > Hi, > > > > I’d like to promote Eyal Bar-Ilan to the Mistral core team since he’s shown a great contribution performance in the recent months. Eyal always reacts on various CI issues timely and provides fixes very quickly. He’s also completed a number of useful functional Mistral features in Train and Ussuri. And his overall statistics for Ussuri ([1]) makes him a clear candidate for core membership. > > > > Core reviewers, please let me know if you have any objections. > > > > [1] https://www.stackalytics.com/?module=mistral-group&release=ussuri&user_id=eyal.bar-ilan at nokia.com&metric=commits > > > > > > Thanks > > > > Renat Akhmerov > > @Nokia -------------- next part -------------- An HTML attachment was scrubbed... URL: From aj at suse.com Wed Jan 15 07:29:13 2020 From: aj at suse.com (Andreas Jaeger) Date: Wed, 15 Jan 2020 08:29:13 +0100 Subject: [deployment][kolla][tripleo][osa][docs] Intro to OpenStack In-Reply-To: References: Message-ID: On 14/01/2020 18.19, Radosław Piliszek wrote: > Hiya, Folks! 
> > I had some thought after talking with people new to OpenStack - our > deployment tools are too well-hidden in the ecosystem. > People try deploying devstack for fun (and some still use packstack!) > and then give up on manual installation for production due to its > complexity and fear of upgrades. > I also got voices that Kolla/TripleO/OSA (order random) is an > "unofficial" way to deploy OpenStack and the installation guide is the > only "official" one (whatever that may mean in this very context). > > So I decided I go the "newbie" route and inspected the website. > https://www.openstack.org invites us to browse: > https://www.openstack.org/software/start/ > which is nice and dandy, presents options of enterprise-grade solutions etc. > but fails to really mention OpenStack has deployment tools (if one > does not look at the submenu bar) and instead points the "newbie" to > the installation guide: > https://docs.openstack.org/install-guide/overview.html > which kind-of negates that OpenStack ecosystem has any ready tools of > deployment by saying: > "After becoming familiar with basic installation, configuration, > operation, and troubleshooting of these OpenStack services, you should > consider the following steps toward deployment using a production > architecture: ... > Implement a deployment tool such as Ansible, Chef, Puppet, or Salt to > automate deployment and management of the production environment." The goal of the install guide is to learn: "This guide covers step-by-step deployment of the major OpenStack services using a functional example architecture suitable for new users of OpenStack with sufficient Linux experience. This guide is not intended to be used for production system installations, but to create a minimum proof-of-concept for the purpose of learning about OpenStack." And then comes the above cited. We can change the text and point to the deployment pages if you want. 
That's from the page you mention, Andreas > Just some food for thought. > > Extra for Kolla and OSA: > https://docs.openstack.org/train/deploy/ > seems we no longer deploy OpenStack since Train. It was not ready when we branched it - and nobody added it to the index page, see https://docs.openstack.org/doc-contrib-guide/doc-index.html, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB From radoslaw.piliszek at gmail.com Wed Jan 15 09:33:00 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 15 Jan 2020 10:33:00 +0100 Subject: [deployment][kolla][tripleo][osa][docs] Intro to OpenStack In-Reply-To: References: Message-ID: > "This guide covers step-by-step deployment of the major OpenStack > services using a functional example architecture suitable for new users > of OpenStack with sufficient Linux experience. This guide is not > intended to be used for production system installations, but to create a > minimum proof-of-concept for the purpose of learning about OpenStack." > > And then comes the above cited. > > We can change the text and point to the deployment pages if you want. Yup, that's true but somehow the effect is opposite. It would be better if we mentioned deployment tools in this place as well, I agree. This paragraph deserves a little rewrite. Though I would opt also for renovation of the OpenStack foundation page in that respect - it's there where interested parties gather and are guided to deploy OpenStack with the installation guide. Installation guide is in "Deploy OpenStack", devstack is in "Try OpenStack". I think both really belong to "Try OpenStack" (the installation guide should just add "the harder way" :-) ). 
I did not investigate yet how to propose a change there and, tbh, I don't have a concrete idea how to make it so that deployment tools are really visible w/o the notion they are something 3rd party and not cared about. > > Extra for Kolla and OSA: > > https://docs.openstack.org/train/deploy/ > > seems we no longer deploy OpenStack since Train. > > It was not ready when we branched it - and nobody added it to the index > page, see https://docs.openstack.org/doc-contrib-guide/doc-index.html, Ah, thanks. We will add it to Kolla procedures then. -yoctozepto On Wed, 15 Jan 2020 at 08:29, Andreas Jaeger wrote: > > On 14/01/2020 18.19, Radosław Piliszek wrote: > > Hiya, Folks! > > > > I had some thought after talking with people new to OpenStack - our > > deployment tools are too well-hidden in the ecosystem. > > People try deploying devstack for fun (and some still use packstack!) > > and then give up on manual installation for production due to its > > complexity and fear of upgrades. > > I also got voices that Kolla/TripleO/OSA (order random) is an > > "unofficial" way to deploy OpenStack and the installation guide is the > > only "official" one (whatever that may mean in this very context). > > > > So I decided I go the "newbie" route and inspected the website. > > https://www.openstack.org invites us to browse: > > https://www.openstack.org/software/start/ > > which is nice and dandy, presents options of enterprise-grade solutions etc. 
> > but fails to really mention OpenStack has deployment tools (if one > > does not look at the submenu bar) and instead points the "newbie" to > > the installation guide: > > https://docs.openstack.org/install-guide/overview.html > > which kind-of negates that OpenStack ecosystem has any ready tools of > > deployment by saying: > > "After becoming familiar with basic installation, configuration, > > operation, and troubleshooting of these OpenStack services, you should > > consider the following steps toward deployment using a production > > architecture: ... > > Implement a deployment tool such as Ansible, Chef, Puppet, or Salt to > > automate deployment and management of the production environment." > > The goal of the install guide is to learn: > > "This guide covers step-by-step deployment of the major OpenStack > services using a functional example architecture suitable for new users > of OpenStack with sufficient Linux experience. This guide is not > intended to be used for production system installations, but to create a > minimum proof-of-concept for the purpose of learning about OpenStack." > > And then comes the above cited. > > We can change the text and point to the deployment pages if you want. > > That's from the page you mention, > > Andreas > > > Just some food for thought. > > > > Extra for Kolla and OSA: > > https://docs.openstack.org/train/deploy/ > > seems we no longer deploy OpenStack since Train. > > It was not ready when be branched it - and nobody added it to the index > page, see https://docs.openstack.org/doc-contrib-guide/doc-index.html, > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 
5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB From stig.openstack at telfer.org Wed Jan 15 10:26:18 2020 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 15 Jan 2020 10:26:18 +0000 Subject: [scientific-sig] IRC Meeting today 1100UTC - large scale & planning for 2020 Message-ID: <6757CB11-C2B9-4FE5-A26C-E7BC5D318BF4@telfer.org> Hi All - We have a Scientific SIG meeting today at 1100UTC (about 30 minutes time) in channel #openstack-meeting. Everyone is welcome. Today’s agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_15th_2020 We’d like to gather some datapoints for the Large Scale SIG, and talk CFPs and conferences for 2020. Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfinucan at redhat.com Wed Jan 15 11:22:21 2020 From: sfinucan at redhat.com (Stephen Finucane) Date: Wed, 15 Jan 2020 11:22:21 +0000 Subject: [nova] Removal of 'deallocate_networks_on_reschedule' virt driver API method Message-ID: <7eef48707decf612b619700df13d8f14383e6967.camel@redhat.com> Just FYI, the 'deallocate_networks_on_reschedule' method of the nova virt driver API has been removed in [1]. It was only used for nova- network based flows and is therefore surplus to requirements. Third party drivers that implement this function can now remove it. Stephen [1] https://review.opendev.org/#/c/696516/ From aj at suse.com Wed Jan 15 13:42:09 2020 From: aj at suse.com (Andreas Jaeger) Date: Wed, 15 Jan 2020 14:42:09 +0100 Subject: [deployment][kolla][tripleo][osa][docs] Intro to OpenStack In-Reply-To: References: Message-ID: Proposed fix: https://review.opendev.org/702666 Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 
5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB From emilien at redhat.com Wed Jan 15 14:34:50 2020 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 15 Jan 2020 09:34:50 -0500 Subject: [tripleo] Switch to tripleo-container-manage by default (replacing Paunch) Message-ID: Hi folks, Some work has been done to replace Paunch and use the new "podman_container" Ansible module; It's possible thanks to a role that is now documented here: https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html While some efforts are still ongoing to replace container-puppet.py (which uses Paunch to execute a podman run); the tripleo-container-manage role has reached stability and enough maturity to be the default now. I would like to give it a try and for that I have 2 patches that would need to be landed: https://review.opendev.org/#/c/700737 https://review.opendev.org/#/c/700738 The most popular question that has been asked about $topic so far is: how can I run paunch debug to print the podman commands. Answer: https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#check-mode Please raise any concern here and we'll address it. Hopefully we can make the default on time before U cycle ends. On a side note: I prepared all the backports to stable/train so the feature will be available in that branch as well. Thanks, -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin at cloudnull.com Wed Jan 15 15:31:24 2020 From: kevin at cloudnull.com (Carter, Kevin) Date: Wed, 15 Jan 2020 09:31:24 -0600 Subject: [tripleo] Switch to tripleo-container-manage by default (replacing Paunch) In-Reply-To: References: Message-ID: Nicely done Emilien! 
-- Kevin Carter IRC: Cloudnull On Wed, Jan 15, 2020 at 8:38 AM Emilien Macchi wrote: > Hi folks, > > Some work has been done to replace Paunch and use the new > "podman_container" Ansible module; It's possible thanks to a role that is > now documented here: > > https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html > > While some efforts are still ongoing to replace container-puppet.py (which > uses Paunch to execute a podman run); the tripleo-container-manage role has > reached stability and enough maturity to be the default now. > I would like to give it a try and for that I have 2 patches that would > need to be landed: > https://review.opendev.org/#/c/700737 > https://review.opendev.org/#/c/700738 > > The most popular question that has been asked about $topic so far is: how > can I run paunch debug to print the podman commands. > Answer: > https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#check-mode > > Please raise any concern here and we'll address it. > Hopefully we can make the default on time before U cycle ends. > > On a side note: I prepared all the backports to stable/train so the > feature will be available in that branch as well. > > Thanks, > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Wed Jan 15 15:42:57 2020 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 15 Jan 2020 10:42:57 -0500 Subject: [tripleo] Switch to tripleo-container-manage by default (replacing Paunch) In-Reply-To: References: Message-ID: On Wed, Jan 15, 2020 at 10:31 AM Carter, Kevin wrote: > Nicely done Emilien! > > On Wed, Jan 15, 2020 at 8:38 AM Emilien Macchi wrote: > >> [...] >> >> The most popular question that has been asked about $topic so far is: how >> can I run paunch debug to print the podman commands. 
>> Answer: >> https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#check-mode >> > I just thought about it but I thought we could have a tripleo command like : $ openstack tripleo container deploy --name keystone --host overcloud-controller1 $ openstack tripleo container deploy --name keystone --host overcloud-controller1 --dry-run It would use the new Ansible-runner to execute something like: https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#example-with-one-container Dry-run would basically run the same thing with Ansible in check mode. In overall, we would still have a CLI (in tripleoclient instead of Paunch); and most of the container logic resides in podman_container which aims to be shared outside of TripleO which has been the main goal driving this effort. What do you think? -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin at cloudnull.com Wed Jan 15 15:57:33 2020 From: kevin at cloudnull.com (Carter, Kevin) Date: Wed, 15 Jan 2020 09:57:33 -0600 Subject: [tripleo] Switch to tripleo-container-manage by default (replacing Paunch) In-Reply-To: References: Message-ID: On Wed, Jan 15, 2020 at 9:43 AM Emilien Macchi wrote: > > > On Wed, Jan 15, 2020 at 10:31 AM Carter, Kevin > wrote: > >> Nicely done Emilien! >> >> On Wed, Jan 15, 2020 at 8:38 AM Emilien Macchi >> wrote: >> >>> [...] >>> >>> The most popular question that has been asked about $topic so far is: >>> how can I run paunch debug to print the podman commands. 
>>> Answer: >>> https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#check-mode >>> >> > I just thought about it but I thought we could have a tripleo command like > : > > $ openstack tripleo container deploy --name keystone --host > overcloud-controller1 > $ openstack tripleo container deploy --name keystone --host > overcloud-controller1 --dry-run > > +1 > It would use the new Ansible-runner to execute something like: > https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#example-with-one-container > Dry-run would basically run the same thing with Ansible in check mode. > > In overall, we would still have a CLI (in tripleoclient instead of > Paunch); and most of the container logic resides in podman_container which > aims to be shared outside of TripleO which has been the main goal driving > this effort. > What do you think? > I like it! I think this is a very natural progression, especially now that we're using ansible-runner and have built a solid foundation of roles, filters, modules, etc. > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Wed Jan 15 16:44:50 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 15 Jan 2020 17:44:50 +0100 Subject: [tripleo] Switch to tripleo-container-manage by default (replacing Paunch) In-Reply-To: References: Message-ID: On 15.01.2020 16:42, Emilien Macchi wrote: > > > On Wed, Jan 15, 2020 at 10:31 AM Carter, Kevin > wrote: > > Nicely done Emilien! > > On Wed, Jan 15, 2020 at 8:38 AM Emilien Macchi > wrote: > > [...] > > The most popular question that has been asked about $topic so > far is: how can I run paunch debug to print the podman commands. 
> Answer: > https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#check-mode > > > I just thought about it but I thought we could have a tripleo command like : > > $ openstack tripleo container deploy --name keystone --host > overcloud-controller1 > $ openstack tripleo container deploy --name keystone --host > overcloud-controller1 --dry-run > > It would use the new Ansible-runner to execute something like: > https://docs.openstack.org/tripleo-ansible/latest/roles/role-tripleo-container-manage.html#example-with-one-container > Dry-run would basically run the same thing with Ansible in check mode. > > In overall, we would still have a CLI (in tripleoclient instead of > Paunch); and most of the container logic resides in podman_container > which aims to be shared outside of TripleO which has been the main goal > driving this effort. > What do you think? Yes please, "shared outside of TripleO" is great aim to accomplish. I think a simple standalone pomdan+systemd+tripleo-container-manage might lead us much further than only Tripleo, and only OpenStack cases. > -- > Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando From C-Ramakrishna.Bhupathi at charter.com Wed Jan 15 19:09:19 2020 From: C-Ramakrishna.Bhupathi at charter.com (Bhupathi, Ramakrishna) Date: Wed, 15 Jan 2020 19:09:19 +0000 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz> <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz> <5846dbb7-fbcd-db22-7342-5ba2b6e4a1d3@catalyst.net.nz> Message-ID: Donny, Yes. Here it is. Cluster-template info as well as the Image info. 
magnum cluster-template-show kt-coreOS +-----------------------+--------------------------------------+ | Property | Value | +-----------------------+--------------------------------------+ | insecure_registry | - | | http_proxy | - | | updated_at | 2020-01-15T19:05:56+00:00 | | floating_ip_enabled | True | | fixed_subnet | - | | master_flavor_id | - | | user_id | 8d22ae284924432ba026e8a6236bc52e | | uuid | 6aea495e-6d8d-420b-8ca3-6e7fed73f3c7 | | no_proxy | - | | https_proxy | - | | tls_disabled | True | | keypair_id | ramak-test | | hidden | False | | project_id | 0c1abff4e920448ba86638bd0d78f7ca | | public | False | | labels | {'use_podman': 'false', ' | | | kube_tag': 'v1.16.2'} | | docker_volume_size | 5 | | server_type | vm | | external_network_id | thunder-public-vlan280 | | cluster_distro | coreos | | image_id | b1354e4e-8281-4330-a4b2-b5fdb022f805 | | volume_driver | - | | registry_enabled | False | | docker_storage_driver | devicemapper | | apiserver_port | - | | name | kt-coreOS | | created_at | 2020-01-14T20:32:04+00:00 | | network_driver | flannel | | fixed_network | - | | coe | kubernetes | | flavor_id | kuber-node | | master_lb_enabled | False | | dns_nameserver | 8.8.8.8 | +-----------------------+--------------------------------------+ ubuntu at kolla-ubuntu:~$ glance image-show 2fa8b3d8-c2e5-4568-9340-a18dd3d3120a +------------------+----------------------------------------------------------------------------------+ | Property | Value | +------------------+----------------------------------------------------------------------------------+ | checksum | cfbdc70bde5cd7df73a05a0fdc8e806c | | container_format | bare | | created_at | 2020-01-14T14:53:01Z | | disk_format | qcow2 | | id | 2fa8b3d8-c2e5-4568-9340-a18dd3d3120a | | locations | [{"url": "rbd://8c7d79a9-1275-4487-8ed0-6ea1fedccbef/images/2fa8b3d8-c2e5-4568-9 | | | 340-a18dd3d3120a/snap", "metadata": {}}] | | min_disk | 0 | | min_ram | 0 | | name | coreOS-latest | | os_distro | coreos | | 
os_hash_algo | sha512 | | os_hash_value | e6c4ce2e3e9dac4606f0edf689erf8782f99e249cc07887f620db69c9b91631301b480086c0e | | | 8ef5f42f4909b3fc3ef110e0erwff2922c0ca6a665dd11c57a | | os_hidden | False | | owner | 0c1abff4e920448ba86638bd0d78f7ca | | protected | False | | size | 1068171264 | | status | active | | tags | [] | | updated_at | 2020-01-14T14:53:39Z | | virtual_size | None | | visibility | public | +------------------+----------------------------------------------------------------------------------+ --RamaK From: Donny Davis [mailto:donny at fortnebula.com] Sent: Tuesday, January 14, 2020 4:05 PM To: Bhupathi, Ramakrishna Cc: OpenStack Discuss Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken Did you update your cluster distro? Can you share your current cluster template? Donny Davis c: 805 814 6800 On Tue, Jan 14, 2020, 3:56 PM Bhupathi, Ramakrishna > wrote: I just moved to Fedora core OS image (fedora-coreos-31) to build my K8s Magnum cluster and cluster creation fails with ERROR: The Parameter (octavia_ingress_controller_tag) was not defined in template. I wonder why I need that tag. Any help please? --RamaK From: Feilong Wang [mailto:feilong at catalyst.net.nz] Sent: Monday, January 13, 2020 4:25 PM To: Donny Davis > Cc: OpenStack Discuss > Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken OK, if you're happy to stay on CoreOS, all good. If you're interested in migrating to Fedora CoreOS and have questions, then you're welcome to popup in #openstack-containers. Cheers. On 14/01/20 10:21 AM, Donny Davis wrote: Just Coreos - I tried them all and it was the only one that worked oob. On Mon, Jan 13, 2020 at 4:10 PM Feilong Wang > wrote: Hi Donny, Do you mean Fedore CoreOS or just CoreOS? 
The current CoreOS driver is not actively maintained, I would suggest migrating to Fedora CoreOS and I'm happy to help if you have any question. Thanks. On 14/01/20 9:57 AM, Donny Davis wrote: FWIW I was only able to get the coreos image working with magnum oob.. the rest just didn't work. On Mon, Jan 13, 2020 at 2:31 PM feilong > wrote: Hi Bhupathi, Firstly, I would suggest setting the use_podman=False when using fedora atomic image. And it would be nice to set the "kube_tag", e.g. v1.15.6 explicitly. Then please trigger a new cluster creation. Then if you still run into error. Here is the debug steps: 1. ssh into the master node, check log /var/log/cloud-init-output.log 2. If there is no error in above log file, then run journalctl -u heat-container-agent to check the heat-container-agent log. If above step is correct, then you must be able to see something useful here. On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote: Wang, Here it is . I added the labels subsequently. My nova and neutron are working all right as I installed various systems there working with no issues.. [cid:image001.png at 01D5CAF2.D1A55F30] From: Feilong Wang [mailto:feilong at catalyst.net.nz] Sent: Thursday, January 9, 2020 6:12 PM To: openstack-discuss at lists.openstack.org Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken Hi Bhupathi, Could you please share your cluster template? And please make sure your Nova/Neutron works. On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: Folks, I am building a Kubernetes Cluster( Openstack Train) and using fedora atomic-29 image . The nodes come up fine ( I have a simple 1 master and 1 node) , but the cluster creation times out, and when I access the cloud-init logs I see this error . Wondering what I am missing as this used to work before. I wonder if this is image related . [ERROR]: Unable to render networking. 
Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan'] Essentially the stack creation fails in “kube_cluster_deploy” Can somebody help me debug this ? Any help is appreciated. --RamaK 
-- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------ -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. 
If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Wed Jan 15 19:52:23 2020 From: bharat at stackhpc.com (Bharat Kunwar) Date: Wed, 15 Jan 2020 19:52:23 +0000 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: References: Message-ID: <294DAE7D-9BDC-41B3-A591-FA8AF0B99E92@stackhpc.com> The os_distro label needs to be fedora-coreos. Sent from my iPhone > On 15 Jan 2020, at 19:49, Bhupathi, Ramakrishna wrote: > > > Donny, > Yes. Here it is. Cluster-template info as well as the Image info. 
> > > magnum cluster-template-show kt-coreOS > +-----------------------+--------------------------------------+ > | Property | Value | > +-----------------------+--------------------------------------+ > | insecure_registry | - | > | http_proxy | - | > | updated_at | 2020-01-15T19:05:56+00:00 | > | floating_ip_enabled | True | > | fixed_subnet | - | > | master_flavor_id | - | > | user_id | 8d22ae284924432ba026e8a6236bc52e | > | uuid | 6aea495e-6d8d-420b-8ca3-6e7fed73f3c7 | > | no_proxy | - | > | https_proxy | - | > | tls_disabled | True | > | keypair_id | ramak-test | > | hidden | False | > | project_id | 0c1abff4e920448ba86638bd0d78f7ca | > | public | False | > | labels | {'use_podman': 'false', ' | > | | kube_tag': 'v1.16.2'} | > | docker_volume_size | 5 | > | server_type | vm | > | external_network_id | thunder-public-vlan280 | > | cluster_distro | coreos | > | image_id | b1354e4e-8281-4330-a4b2-b5fdb022f805 | > | volume_driver | - | > | registry_enabled | False | > | docker_storage_driver | devicemapper | > | apiserver_port | - | > | name | kt-coreOS | > | created_at | 2020-01-14T20:32:04+00:00 | > | network_driver | flannel | > | fixed_network | - | > | coe | kubernetes | > | flavor_id | kuber-node | > | master_lb_enabled | False | > | dns_nameserver | 8.8.8.8 | > +-----------------------+--------------------------------------+ > > ubuntu at kolla-ubuntu:~$ glance image-show 2fa8b3d8-c2e5-4568-9340-a18dd3d3120a > +------------------+----------------------------------------------------------------------------------+ > | Property | Value | > +------------------+----------------------------------------------------------------------------------+ > | checksum | cfbdc70bde5cd7df73a05a0fdc8e806c | > | container_format | bare | > | created_at | 2020-01-14T14:53:01Z | > | disk_format | qcow2 | > | id | 2fa8b3d8-c2e5-4568-9340-a18dd3d3120a | > | locations | [{"url": "rbd://8c7d79a9-1275-4487-8ed0-6ea1fedccbef/images/2fa8b3d8-c2e5-4568-9 | > | | 340-a18dd3d3120a/snap", 
"metadata": {}}] | > | min_disk | 0 | > | min_ram | 0 | > | name | coreOS-latest | > | os_distro | coreos | > | os_hash_algo | sha512 | > | os_hash_value | e6c4ce2e3e9dac4606f0edf689erf8782f99e249cc07887f620db69c9b91631301b480086c0e | > | | 8ef5f42f4909b3fc3ef110e0erwff2922c0ca6a665dd11c57a | > | os_hidden | False | > | owner | 0c1abff4e920448ba86638bd0d78f7ca | > | protected | False | > | size | 1068171264 | > | status | active | > | tags | [] | > | updated_at | 2020-01-14T14:53:39Z | > | virtual_size | None | > | visibility | public | > +------------------+----------------------------------------------------------------------------------+ > > --RamaK > > From: Donny Davis [mailto:donny at fortnebula.com] > Sent: Tuesday, January 14, 2020 4:05 PM > To: Bhupathi, Ramakrishna > Cc: OpenStack Discuss > Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken > > Did you update your cluster distro? Can you share your current cluster template? > > Donny Davis > c: 805 814 6800 > > On Tue, Jan 14, 2020, 3:56 PM Bhupathi, Ramakrishna wrote: > I just moved to Fedora core OS image (fedora-coreos-31) to build my K8s Magnum cluster and cluster creation fails with > ERROR: The Parameter (octavia_ingress_controller_tag) was not defined in template. > > I wonder why I need that tag. Any help please? > > --RamaK > > From: Feilong Wang [mailto:feilong at catalyst.net.nz] > Sent: Monday, January 13, 2020 4:25 PM > To: Donny Davis > Cc: OpenStack Discuss > Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken > > OK, if you're happy to stay on CoreOS, all good. If you're interested in migrating to Fedora CoreOS and have questions, then you're welcome to popup in #openstack-containers. Cheers. 
> > > On 14/01/20 10:21 AM, Donny Davis wrote:
> Just CoreOS - I tried them all and it was the only one that worked oob.
>
> On Mon, Jan 13, 2020 at 4:10 PM Feilong Wang wrote:
> Hi Donny,
>
> Do you mean Fedora CoreOS or just CoreOS? The current CoreOS driver is not actively maintained, so I would suggest migrating to Fedora CoreOS, and I'm happy to help if you have any questions. Thanks.
>
> On 14/01/20 9:57 AM, Donny Davis wrote:
> FWIW I was only able to get the coreos image working with magnum oob.. the rest just didn't work.
>
> On Mon, Jan 13, 2020 at 2:31 PM feilong wrote:
> Hi Bhupathi,
>
> Firstly, I would suggest setting use_podman=False when using the fedora atomic image. And it would be nice to set the "kube_tag", e.g. v1.15.6, explicitly. Then please trigger a new cluster creation. If you still run into errors, here are the debug steps:
>
> 1. ssh into the master node and check the log /var/log/cloud-init-output.log
>
> 2. If there is no error in the above log file, then run journalctl -u heat-container-agent to check the heat-container-agent log. If the above step is correct, you should be able to see something useful there.
>
> On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote:
> Wang,
> Here it is. I added the labels subsequently. My nova and neutron are working all right, as I have installed various systems there with no issues.
>
> From: Feilong Wang [mailto:feilong at catalyst.net.nz]
> Sent: Thursday, January 9, 2020 6:12 PM
> To: openstack-discuss at lists.openstack.org
> Subject: Re: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken
>
> Hi Bhupathi,
>
> Could you please share your cluster template? And please make sure your Nova/Neutron works.
>
> On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote:
> Folks,
> I am building a Kubernetes cluster (OpenStack Train) using the fedora atomic-29 image.
The nodes come up fine (I have a simple 1 master and 1 node), but the cluster creation times out, and when I access the cloud-init logs I see this error. Wondering what I am missing, as this used to work before. I wonder if this is image related.
>
> [ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan']
>
> Essentially the stack creation fails in “kube_cluster_deploy”
>
> Can somebody help me debug this? Any help is appreciated.
>
> --RamaK

From aj at suse.com Wed Jan 15 20:31:02 2020 From: aj at suse.com (Andreas Jaeger) Date: Wed, 15 Jan 2020 21:31:02 +0100 Subject: [infra] Retire x/zmq-event-publisher Message-ID: <57e6c329-430f-9d16-31b1-3b7c88a7e9ae@suse.com>

This repo is not used anymore; it was forked and is now maintained elsewhere for Jenkins. I'll retire the repo now with topic retire-zmq-event-publisher,

Andreas
--
Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr.
5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = EF18 1673 38C4 A372 86B1 E699 5294 24A3 FF91 2ACB

From feilong at catalyst.net.nz Wed Jan 15 20:36:19 2020 From: feilong at catalyst.net.nz (feilong) Date: Thu, 16 Jan 2020 09:36:19 +1300 Subject: [magnum][kolla] etcd wal sync duration issue In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04771716@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> <3f3fe0d1-7b61-d2f9-da65-d126ea5ed336@catalyst.net.nz> <046E9C0290DD9149B106B72FC9156BEA04771716@gmsxchsvr01.thecreation.com> Message-ID: <279cedf1-8bf4-fcf1-cfc2-990c97685531@catalyst.net.nz>

Hi Eric,

If you're using SSD, then I think the IO performance should be OK. You can use https://github.com/etcd-io/etcd/tree/master/tools/benchmark to verify and confirm that's the root cause. Meanwhile, you can review the config of the etcd cluster deployed by Magnum. I'm not an expert on etcd, so TBH I can't see anything wrong with the config; most of it is just default configuration.

As for the etcd image, it's built from https://github.com/projectatomic/atomic-system-containers/tree/master/etcd or you can refer to CERN's repo https://gitlab.cern.ch/cloud/atomic-system-containers/blob/cern-qa/etcd/

*Spyros*, any comments?

On 14/01/20 10:52 AM, Eric K. Miller wrote:
> Hi Feilong,
>
> Thanks for responding! I am, indeed, using the default v3.2.7 version for etcd, which is the only available image.
>
> I did not try to reproduce with any other driver (we have never used DevStack, honestly, only Kolla-Ansible deployments). I did see a number of people indicating similar issues with etcd versions in the 3.3.x range, so I didn't think of it being an etcd issue, but then again most issues seem to be a result of people using HDDs and not SSDs, which makes sense.
>
> Interesting that you saw the same issue, though. We haven't tried Fedora CoreOS, but I think we would need Train for this.
> Everything I read about etcd indicates that it is extremely latency sensitive, due to the fact that it replicates all changes to all nodes and sends an fsync to Linux each time, so data is always guaranteed to be stored. I can see this becoming an issue quickly without super-low-latency network and storage. We are using Ceph-based SSD volumes for the Kubernetes master node disks, which is extremely fast (likely 10x or better than anything people recommend for etcd), but network latency is always going to be higher with VMs on OpenStack with DVR than bare metal with VLANs, due to all of the abstractions.
>
> Do you know who maintains the etcd images for Magnum here? Is there an easy way to create a newer image?
> https://hub.docker.com/r/openstackmagnum/etcd/tags/
>
> Eric
>
> From: Feilong Wang [mailto:feilong at catalyst.net.nz]
> Sent: Monday, January 13, 2020 3:39 PM
> To: openstack-discuss at lists.openstack.org
> Subject: Re: [magnum][kolla] etcd wal sync duration issue
>
> Hi Eric,
> That issue looks familiar to me. There are some questions I'd like to check before answering whether you should upgrade to Train.
> 1. Are you using the default v3.2.7 version for etcd?
> 2. Did you try to reproduce this with devstack, using the Fedora CoreOS driver? The etcd version could be 3.2.26
> I asked the above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7, and I can't reproduce it with Fedora CoreOS + etcd 3.2.26
>

-- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------
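[Editor's note: the fsync sensitivity described above can be measured outside etcd. The sketch below is not from the thread; it is a minimal, hypothetical Python probe (the name fsync_latency and its parameters are illustrative) that times write+fdatasync cycles on a directory, roughly what etcd's WAL does on every commit.]

```python
import os
import tempfile
import time


def fsync_latency(path=".", iterations=50, block=4096):
    """Time write+fdatasync cycles on `path`, similar to etcd's per-commit WAL sync."""
    fd, name = tempfile.mkstemp(dir=path)
    samples = []
    try:
        payload = os.urandom(block)
        for _ in range(iterations):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fdatasync(fd)  # force the data to stable storage before returning
            samples.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(name)
    samples.sort()
    return {
        "p50_ms": samples[len(samples) // 2] * 1000,
        "p99_ms": samples[int(len(samples) * 0.99)] * 1000,
    }


if __name__ == "__main__":
    stats = fsync_latency()
    # etcd's tuning guidance suggests keeping the 99th percentile of
    # wal_fsync_duration_seconds under roughly 10 ms.
    print(f"p50 {stats['p50_ms']:.2f} ms, p99 {stats['p99_ms']:.2f} ms")
```

Run with the working directory on the volume that backs /var/lib/etcd; a p99 well above ~10 ms there would be consistent with the "wal sync duration" warnings in this thread's subject.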
URL: From feilong at catalyst.net.nz Wed Jan 15 20:45:36 2020 From: feilong at catalyst.net.nz (feilong) Date: Thu, 16 Jan 2020 09:45:36 +1300 Subject: [magnum]: K8s cluster creation times out. OpenStack Train : [ERROR]: Unable to render networking. Network config is likely broken In-Reply-To: References: <59ed63745f4e4c42a63692c3ee4eb10d@ncwmexgp031.CORP.CHARTERCOM.com> <6c8f45f2-da74-18fd-7909-84c9c6762fe3@catalyst.net.nz> <24a8d164-8e38-1512-caf3-9447f070b8fd@catalyst.net.nz> <5846dbb7-fbcd-db22-7342-5ba2b6e4a1d3@catalyst.net.nz> Message-ID: <4b997f76-b300-9c96-ea16-5d4c84ea244f@catalyst.net.nz> Hi Bhupathi, Please read https://docs.openstack.org/magnum/latest/user/#use-podman When you're using Fedora CoreOS driver, you have to use the use_podman=True, because in Magnum Fedora CoreOS driver, podman is the only option. Please take my devstack cluster template as a reference. feilong at feilong-pc:~$ occt show c5ed303d-d255-45e8-8efb-bffac97f852c +-----------------------+-------------------------------------------------------------------------------------------------------------------------+ | Field                 | Value                                                                                                                   | +-----------------------+-------------------------------------------------------------------------------------------------------------------------+ | insecure_registry     | -                                                                                                                       | | labels                | {u'use_podman': u'true', u'kube_tag': u'v1.16.4', u'etcd_tag': u'3.2.26', u'heat_container_agent_tag': u'train-stable'} | | updated_at            | 2020-01-06T23:14:12+00:00                                                                                               | | floating_ip_enabled   | True                                                                                                                    | | 
fixed_subnet          | -                                                                                                                       | | master_flavor_id      | ds2G                                                                                                                    | | uuid                  | c5ed303d-d255-45e8-8efb-bffac97f852c                                                                                    | | no_proxy              | -                                                                                                                       | | https_proxy           | -                                                                                                                       | | tls_disabled          | False                                                                                                                   | | keypair_id            | feilong                                                                                                                 | | public                | False                                                                                                                   | | http_proxy            | -                                                                                                                       | | docker_volume_size    | -                                                                                                                       | | server_type           | vm                                                                                                                      | | external_network_id   | public                                                                                                                  | | cluster_distro        | fedora-coreos                                                                                                           | | image_id              | c089d627-0265-4cbc-8c96-957eb529b024                
                                                                    | | volume_driver         | -                                                                                                                       | | registry_enabled      | False                                                                                                                   | | docker_storage_driver | overlay2                                                                                                                | | apiserver_port        | -                                                                                                                       | | name                  | k8s-fc31-v1.16.4                                                                                                        | | created_at            | 2020-01-05T22:26:41+00:00                                                                                               | | network_driver        | calico                                                                                                                  | | fixed_network         | -                                                                                                                       | | coe                   | kubernetes                                                                                                              | | flavor_id             | ds1G                                                                                                                    | | master_lb_enabled     | False                                                                                                                   | | dns_nameserver        | 8.8.8.8                                                                                                                 | | hidden                | False                                                                                                                   | 
+-----------------------+-------------------------------------------------------------------------------------------------------------------------+ On 16/01/20 8:09 AM, Bhupathi, Ramakrishna wrote: > > Donny, > > Yes. Here it is. Cluster-template info as well as the Image info. > >   > >   > > magnum cluster-template-show kt-coreOS > > +-----------------------+--------------------------------------+ > > | Property              | Value                                | > > +-----------------------+--------------------------------------+ > > | insecure_registry     | -                                    | > > | http_proxy            | -                                    | > > | updated_at            | 2020-01-15T19:05:56+00:00            | > > | floating_ip_enabled   | True                                 | > > | fixed_subnet          | -                                    | > > | master_flavor_id      | -                                    | > > | user_id               | 8d22ae284924432ba026e8a6236bc52e     | > > | uuid                  | 6aea495e-6d8d-420b-8ca3-6e7fed73f3c7 | > > | no_proxy              | -                                    | > > | https_proxy           | -                                    | > > | tls_disabled          | True                                 | > > | keypair_id            | ramak-test                           | > > | hidden                | False                                | > > | project_id            | 0c1abff4e920448ba86638bd0d78f7ca     | > > | public                | False                                | > > | labels                | {'use_podman': 'false', '            | > > |                       | kube_tag': 'v1.16.2'}                | > > | docker_volume_size    | 5                                    | > > | server_type           | vm                                   | > > | external_network_id   | thunder-public-vlan280         | > > | cluster_distro        | coreos                               | > > | image_id  
            | b1354e4e-8281-4330-a4b2-b5fdb022f805 | > > | volume_driver         | -                                    | > > | registry_enabled      | False                                | > > | docker_storage_driver | devicemapper                         | > > | apiserver_port        | -                                    | > > | name                  | kt-coreOS                            | > > | created_at            | 2020-01-14T20:32:04+00:00            | > > | network_driver        | flannel                              | > > | fixed_network         | -                                    | > > | coe                   | kubernetes                           | > > | flavor_id             | kuber-node                           | > > | master_lb_enabled     | False                                | > > | dns_nameserver        | 8.8.8.8                              | > > +-----------------------+--------------------------------------+ > >   > > ubuntu at kolla-ubuntu:~$ glance image-show  > 2fa8b3d8-c2e5-4568-9340-a18dd3d3120a > > +------------------+----------------------------------------------------------------------------------+ > > | Property         | > Value                                                                            > | > > +------------------+----------------------------------------------------------------------------------+ > > | checksum         | > cfbdc70bde5cd7df73a05a0fdc8e806c                                                 > | > > | container_format | > bare                                                                             > | > > | created_at       | > 2020-01-14T14:53:01Z                                                             > | > > | disk_format      | > qcow2                                                                            > | > > | id               | > 2fa8b3d8-c2e5-4568-9340-a18dd3d3120a                          >                    | > > | locations        | [{"url": > 
"rbd://8c7d79a9-1275-4487-8ed0-6ea1fedccbef/images/2fa8b3d8-c2e5-4568-9 | > > |                  | 340-a18dd3d3120a/snap", "metadata": > {}}]                                         | > > | min_disk         | 0     >                                                                            | > > | min_ram          | > 0                                                                                > | > > | name             | > coreOS-latest                                         >                            | > > | os_distro        | > coreos                                                                           > | > > | os_hash_algo     | > sha512                                                                           > | > > | os_hash_value    | > e6c4ce2e3e9dac4606f0edf689erf8782f99e249cc07887f620db69c9b91631301b480086c0e > | > > |                  | > 8ef5f42f4909b3fc3ef110e0erwff2922c0ca6a665dd11c57a                                 > | > > | os_hidden        | False                                         >                                    | > > | owner            | > 0c1abff4e920448ba86638bd0d78f7ca                                                 > | > > | protected        | > False                                                                            > | > > | size             | > 1068171264                                                                       > | > > | status           | > active                                                                           > | > > | tags             | > []                                                                               > | > > | updated_at       | > 2020-01-14T14:53:39Z                                                             > | > > | virtual_size     | None                       >                                                       | > > | visibility       | > public                                                                           > | > > 
+------------------+----------------------------------------------------------------------------------+ > >   > > --RamaK > >   > > *From:*Donny Davis [mailto:donny at fortnebula.com] > *Sent:* Tuesday, January 14, 2020 4:05 PM > *To:* Bhupathi, Ramakrishna > *Cc:* OpenStack Discuss > *Subject:* Re: [magnum]: K8s cluster creation times out. OpenStack > Train : [ERROR]: Unable to render networking. Network config is likely > broken > >   > > Did you update your cluster distro? Can you share your current cluster > template? > > Donny Davis > c: 805 814 6800 > >   > > On Tue, Jan 14, 2020, 3:56 PM Bhupathi, Ramakrishna > > wrote: > > I just moved to Fedora core OS image (fedora-coreos-31)  to build > my K8s Magnum cluster and  cluster creation fails with  > > ERROR: The Parameter (octavia_ingress_controller_tag) was not > defined in template. > >   > > I wonder why I need that tag. Any help please? > >   > > --RamaK > >   > > *From:*Feilong Wang [mailto:feilong at catalyst.net.nz > ] > *Sent:* Monday, January 13, 2020 4:25 PM > *To:* Donny Davis > > *Cc:* OpenStack Discuss > > *Subject:* Re: [magnum]: K8s cluster creation times out. OpenStack > Train : [ERROR]: Unable to render networking. Network config is > likely broken > >   > > OK, if you're happy to stay on CoreOS, all good. If you're > interested in migrating to Fedora CoreOS and have questions, then > you're welcome to popup in #openstack-containers. Cheers. > >   > > On 14/01/20 10:21 AM, Donny Davis wrote: > > Just Coreos - I tried them all and it was the only one that > worked oob.  > >   > > On Mon, Jan 13, 2020 at 4:10 PM Feilong Wang > > wrote: > > Hi Donny, > > Do you mean Fedore CoreOS or just CoreOS? The current > CoreOS driver is not actively maintained, I would suggest > migrating to Fedora CoreOS and I'm happy to help if you > have any question. Thanks. > >   > > On 14/01/20 9:57 AM, Donny Davis wrote: > > FWIW I was only able to get the coreos image working > with magnum oob.. 
the rest just didn't work.  > >   > > On Mon, Jan 13, 2020 at 2:31 PM feilong > > wrote: > > Hi Bhupathi, > > Firstly, I would suggest setting the > use_podman=False when using fedora atomic image. > And it would be nice to set the "kube_tag", e.g. > v1.15.6 explicitly. Then please trigger a new > cluster creation. Then if you still run into > error. Here is the debug steps: > > 1. ssh into the master node, check log > /var/log/cloud-init-output.log > > 2. If there is no error in above log file, then > run journalctl -u heat-container-agent to check > the heat-container-agent log. If above step is > correct, then you must be able to see something > useful here. > >   > > On 11/01/20 12:15 AM, Bhupathi, Ramakrishna wrote: > > Wang, > > Here it is  . I added the labels subsequently. > My nova and neutron are working all right as I > installed various systems there working with > no issues.. > >   > >   > > *From:* Feilong Wang > [mailto:feilong at catalyst.net.nz] > *Sent:* Thursday, January 9, 2020 6:12 PM > *To:* openstack-discuss at lists.openstack.org > > *Subject:* Re: [magnum]: K8s cluster creation > times out. OpenStack Train : [ERROR]: Unable > to render networking. Network config is likely > broken > >   > > Hi Bhupathi, > > Could you please share your cluster template? > And please make sure your Nova/Neutron works. > >   > > On 10/01/20 2:45 AM, Bhupathi, Ramakrishna wrote: > > Folks, > > I am building a Kubernetes Cluster( > Openstack Train) and using fedora > atomic-29 image . The nodes come up  fine > ( I have a simple 1 master and 1 node) , > but the cluster creation times out,  and > when I access the cloud-init logs I see > this error .  Wondering what I am missing > as this used to work before.  I wonder if > this is image related . > >   > > [ERROR]: Unable to render networking. > Network config is likely broken: No > available network renderers found. 
> Searched through list: ['eni', > 'sysconfig', 'netplan'] > >   > > Essentially the stack creation fails in > “kube_cluster_deploy” > >   > > Can somebody help me debug this ? Any help > is appreciated. > >   > > --RamaK > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. > > -- > > Cheers & Best regards, > > Feilong Wang (王飞龙) > > Head of R&D > > Catalyst Cloud - Cloud Native New Zealand > > -------------------------------------------------------------------------- > > Tel: +64-48032246 > > Email: flwang at catalyst.net.nz > > Level 6, Catalyst House, 150 Willis Street, Wellington > > -------------------------------------------------------------------------- > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. 
> > -- > > Cheers & Best regards, > > Feilong Wang (王飞龙) > > ------------------------------------------------------ > > Senior Cloud Software Engineer > > Tel: +64-48032246 > > Email: flwang at catalyst.net.nz > > Catalyst IT Limited > > Level 6, Catalyst House, 150 Willis Street, Wellington > > ------------------------------------------------------ > >   > > -- > > ~/DonnyD > > C: 805 814 6800 > > "No mission too difficult. No sacrifice too great. Duty First"
> > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. -- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Jan 15 23:01:01 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 15 Jan 2020 17:01:01 -0600 Subject: [tc][all] Updates on Ussuri cycle community-wide goals In-Reply-To: <16e860b4a50.10ee607ea44010.6012230745285412048@ghanshyammann.com> References: <16e181adbc4.1191b0166302215.2291880664205036921@ghanshyammann.com> <16e860b4a50.10ee607ea44010.6012230745285412048@ghanshyammann.com> Message-ID: <16fab704bc9.11f04c9ec93844.5066595270036750282@ghanshyammann.com> ---- On Tue, 19 Nov 2019 17:41:57 -0600 Ghanshyam Mann wrote ---- > ---- On Tue, 29 Oct 2019 10:20:43 -0500 Ghanshyam Mann wrote ---- > > Hello Everyone, > > > > We have two goals with their champions ready for review. Please review and provide your feedback on Gerrit. > > > > > > 1. Add goal for project specific PTL and contributor guides - Kendall Nelson > > - https://review.opendev.org/#/c/691737/ > > > > 2. 
Propose a new goal to migrate all legacy zuul jobs - Luigi Toscano > > - https://review.opendev.org/#/c/691278/ > > > > Hello Everyone, > > From the Forum and PTG discussions[1], we agreed to proceed with the two goals below for the Ussuri cycle. > > 1. Drop Python 2.7 Support - Already Accepted. > Patches on almost all services are up for review and merge[2]. Merge them quickly to avoid your project's gate > breaking when other projects drop py2. > > 2. Project Specific New Contributor & PTL Docs - Under Review > The goal patch is under review. Feel free to provide your feedback on https://review.opendev.org/#/c/691737/ > > 'migrate all legacy zuul jobs' is pre-selected as the V cycle goal and is under review in > https://review.opendev.org/#/c/691278/ This is the final update on the Ussuri cycle community-wide goals selection. The 2nd community-wide goal for the Ussuri cycle has been merged today. The two below are the final goals for this cycle[1]: 1. Drop Python 2.7 Support - In-progress - https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html 2. Project Specific New Contributor & PTL Docs - Selected for Ussuri. - https://governance.openstack.org/tc/goals/selected/ussuri/project-ptl-and-contrib-docs.html [1] https://governance.openstack.org/tc/goals/selected/ussuri/index.html -gmann > > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010943.html > [2] https://review.opendev.org/#/q/topic:drop-py27-support+(status:open+OR+status:merged) > > -gmann > > > We are still looking for a Champion volunteer for the RBAC goal[1].
If you have any new ideas for a goal, do not hesitate to add them to the etherpad[2] > > > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010291.html > > [2] https://etherpad.openstack.org/p/PVG-u-series-goals > > > > -gmann & diablo_rojo > > > > > > > > > From tony.pearce at cinglevue.com Thu Jan 16 07:36:45 2020 From: tony.pearce at cinglevue.com (Tony Pearce) Date: Thu, 16 Jan 2020 15:36:45 +0800 Subject: DR options with openstack Message-ID: <5e201295.1c69fb81.a69b.d77d@mx.google.com> An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Thu Jan 16 09:23:26 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 16 Jan 2020 09:23:26 +0000 Subject: [blazar] No IRC meeting today Message-ID: Hello, Similarly to Tuesday, I have to cancel today's Blazar IRC meeting. Sorry for the late notice. Thanks, Pierre Riteau (priteau) From amy at demarco.com Thu Jan 16 12:53:38 2020 From: amy at demarco.com (Amy Marrich) Date: Thu, 16 Jan 2020 06:53:38 -0600 Subject: Rails Girls Summer of Code Message-ID: Hi All, I was contacted about this program to see if OpenStack might be interested in participating, and despite the name it is language agnostic. More information on the program can be found at Rails Girls Summer of Code. I'm willing to help organize our efforts but would need to know the level of interest to participate and mentor. Thanks, Amy (spotz) Chair, Diversity and Inclusion WG Chair, User Committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From francois.scheurer at everyware.ch Thu Jan 16 12:56:49 2020 From: francois.scheurer at everyware.ch (Francois Scheurer) Date: Thu, 16 Jan 2020 13:56:49 +0100 Subject: [cinder] consistency group not working In-Reply-To: <20191111140016.qyftq5iy27ekmdtj@localhost> References: <7adf0a5d-43b3-c606-2ba8-00d97b96cbdc@everyware.ch> <20191111140016.qyftq5iy27ekmdtj@localhost> Message-ID: Dear Gorka Many thanks for your answer.
Cheers Francois On 11/11/19 3:00 PM, Gorka Eguileor wrote: > On 27/09, Francois Scheurer wrote: >> Dear Cinder Experts >> >> >> We are running the rocky release. >> >> |We can create a consistency group: openstack consistency group create >> --volume-type b9f67298-cf68-4cb2-bed2-c806c5f83487 fsc-consgroup Bug 1: but >> adding volumes is not working: openstack consistency group add volume >> c3f49ef0-601e-4558-a75a-9b758304ce3b b48752e3-641f-4a49-a892-6cb54ab6b74d >> c0022411-59a4-4c7c-9474-c7ea8ccc7691 0f4c6493-dbe2-4f75-8e37-5541a267e3f2 => >> Invalid volume: Volume is not local to this node. (HTTP 400) (Request-ID: >> req-7f67934a-5835-40ef-b25c-12591fd79f85) Bug 2: deleting consistency group >> is also not working (silently failing): openstack consistency group delete >> c3f49ef0-601e-4558-a75a-9b758304ce3b |||=> AttributeError: 'RBDDriver' >> object has no attribute 'delete_consistencygroup'| See details below. Using >> the --force option makes no difference and the consistency group is not >> deleted. Do you think this is a bug or a configuration issue? Thank you in >> advance. | >> >> Cheers >> >> Francois > Hi, > > It seems you are trying to use consistency groups with the RBD driver, > which doesn't currently support consistency groups. > > Cheers, > Gorka. 
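As an aside for readers on newer releases: the consistencygroup API was superseded by Cinder's generic volume groups, so the modern equivalent of the workflow above looks roughly like the sketch below. The group-type name and placeholders are invented for illustration, and whether consistent group snapshots actually work still depends on the driver and release:

```shell
# Create a group type that requests consistent group snapshots
cinder group-type-create consistent-type
cinder group-type-key consistent-type set consistent_group_snapshot_enabled="<is> True"

# Create a group for an existing volume type, then add a volume to it
cinder group-create consistent-type <volume-type> --name my-group
cinder group-update <group-uuid> --add-volumes <volume-uuid>
```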
> >> |Details: ==> >> /var/lib/docker/volumes/kolla_logs/_data/cinder/cinder-api-access.log <== >> 10.0.129.17 - - [27/Sep/2019:12:16:24 +0200] "POST /v3/f099965b37ac41489e9cac8c9d208711/consistencygroups/3706bbab-e2df-4507-9168-08ef811e452c/delete >> HTTP/1.1" 202 - 109720 "-" "python-cinderclient" ==> >> /var/lib/docker/volumes/kolla_logs/_data/cinder/cinder-volume.log <== >> 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server >> [req-9010336e-d569-47ad-84e2-8dd8b729939c b141574ee71f49a0b53a05ae968576c5 >> f099965b37ac41489e9cac8c9d208711 - default default] Exception during message >> handling: AttributeError: 'RBDDriver' object has no attribute >> 'delete_consistencygroup' 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server Traceback (most recent call last): 2019-09-27 >> 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", >> line 163, in _process_incoming 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2019-09-27 >> 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >> line 265, in dispatch 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, >> args) 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", >> line 194, in _do_dispatch 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server result = func(ctxt, **new_args) 2019-09-27 >> 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/osprofiler/profiler.py", >> line 159, in wrapper 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server result = f(*args, **kwargs) 2019-09-27 >> 12:16:24.491 30 ERROR oslo_messaging.rpc.server File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/cinder/volume/manager.py", >> line 3397, in delete_group 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server vol_obj.save() 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", >> line 220, in __exit__ 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server self.force_reraise() 2019-09-27 12:16:24.491 30 >> ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", >> line 196, in force_reraise 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) >> 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/cinder/volume/manager.py", >> line 3362, in delete_group 2019-09-27 12:16:24.491 30 ERROR >> oslo_messaging.rpc.server self.driver.delete_consistencygroup(context, cg, >> 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server AttributeError: >> 'RBDDriver' object has no attribute 'delete_consistencygroup' 2019-09-27 >> 12:16:24.491 30 ERROR oslo_messaging.rpc.server| >> >> >> >> >> -- >> >> >> EveryWare AG >> François Scheurer >> Senior Systems Engineer >> Zurlindenstrasse 52a >> CH-8003 Zürich >> >> tel: +41 44 466 60 00 >> fax: +41 44 466 60 10 >> mail: francois.scheurer at everyware.ch >> web: http://www.everyware.ch >> > -- EveryWare AG François Scheurer Senior Systems Engineer Zurlindenstrasse 52a CH-8003 Zürich tel: +41 44 466 60 00 fax: +41 44 466 60 10 mail: francois.scheurer at everyware.ch web: http://www.everyware.ch -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5978 bytes Desc: not available URL: From radoslaw.piliszek at gmail.com Thu Jan 16 13:30:51 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 16 Jan 2020 14:30:51 +0100 Subject: [all] cirros-cloud.net is down Message-ID: Hi, Folks! If your CI jobs depend on a downloaded cirros image, then be aware that the cirros links are seemingly permanently down at the moment. I reported [1] due to lack of a better place (that I know of). [1] https://github.com/cirros-dev/cirros/issues/12 -yoctozepto From Martin.Gehrke at twosigma.com Thu Jan 16 13:37:04 2020 From: Martin.Gehrke at twosigma.com (Martin Gehrke) Date: Thu, 16 Jan 2020 13:37:04 +0000 Subject: [ops] live-migration progress Message-ID: Hi, Last week at the OpenStack Operators meetup in London, someone mentioned that there was an issue with the progress updates during a live migration and that by turning them off you could increase your success rate. Does anyone know more? TIA Martin Gehrke DevOps Manager & OpenStack Tech Lead Two Sigma Investments, LP New York, NY -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcin.juszkiewicz at linaro.org Thu Jan 16 13:52:28 2020 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Thu, 16 Jan 2020 14:52:28 +0100 Subject: [all] cirros-cloud.net is down In-Reply-To: References: Message-ID: <6764c27d-f7c1-42d1-2f4b-4dcedde2b5d7@linaro.org> On 16.01.2020 at 14:30, Radosław Piliszek wrote: > Hi, Folks! > > If your CI jobs depend on a downloaded cirros image, then be aware that > the cirros links are seemingly permanently down at the moment. > > I reported [1] due to lack of a better place (that I know of). > > [1] https://github.com/cirros-dev/cirros/issues/12 This is the official place now. We moved Cirros from Launchpad to GitHub in December.
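Until the mirror situation settles, a defensive pattern for CI jobs is to prefer a locally cached image and only fall back to a remote download. A rough sketch — the cache path comes from later in this thread, and the GitHub release URL is an assumption that should be verified against the new repository:

```shell
#!/bin/sh
IMG=cirros-0.4.0-x86_64-disk.img
if [ -f "/opt/cache/files/$IMG" ]; then
    cp "/opt/cache/files/$IMG" .          # use the CI node's local cache
else
    # Fallback; verify this URL against the cirros GitHub releases page
    wget "https://github.com/cirros-dev/cirros/releases/download/0.4.0/$IMG"
fi
```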
From jean-philippe at evrard.me Thu Jan 16 15:05:29 2020 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Thu, 16 Jan 2020 16:05:29 +0100 Subject: [tc] January meeting agenda In-Reply-To: <1d35cdc723dbd4d50ab6a933b6a6a2c8a8ee4153.camel@evrard.me> References: <1d35cdc723dbd4d50ab6a933b6a6a2c8a8ee4153.camel@evrard.me> Message-ID: <05c9dde499c4ac9577e99f59071c85b0b5029a91.camel@evrard.me> Hello, The meeting logs are available here: http://eavesdrop.openstack.org/meetings/tc/2020/tc.2020-01-16-14.00.html Thank you everyone! Regards, JP From emiller at genesishosting.com Thu Jan 16 17:00:20 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Thu, 16 Jan 2020 11:00:20 -0600 Subject: [magnum][kolla] etcd wal sync duration issue In-Reply-To: <279cedf1-8bf4-fcf1-cfc2-990c97685531@catalyst.net.nz> References: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> <3f3fe0d1-7b61-d2f9-da65-d126ea5ed336@catalyst.net.nz> <046E9C0290DD9149B106B72FC9156BEA04771716@gmsxchsvr01.thecreation.com> <279cedf1-8bf4-fcf1-cfc2-990c97685531@catalyst.net.nz> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04771749@gmsxchsvr01.thecreation.com> Hi Feilong, Before I was able to use the benchmark tool you mentioned, we saw some other slowdowns with Ceph (all flash). It appears that something must have crashed somewhere since we had to restart a couple things, after which etcd has been performing fine and no more health issues being reported by Magnum. So, it looks like it wasn't etcd related afterall. However, while researching, I found that etcd's fsync on every write (so it guarantees a write cache flush for each write) apparently creates some havoc with some SSDs, where the SSD performs a full cache flush of multiple caches. 
This article explains it a LOT better: https://yourcmc.ru/wiki/Ceph_performance (scroll to the "Drive cache is slowing you down" section) It seems that the optimal configuration for etcd would be to use local drives in each node and be sure that the write cache is disabled in the SSDs - as opposed to using Ceph volumes, which already add network latency, but can create even more latency for synchronizations due to Ceph's replication. Eric From: feilong [mailto:feilong at catalyst.net.nz] Sent: Wednesday, January 15, 2020 2:36 PM To: Eric K. Miller; openstack-discuss at lists.openstack.org Cc: Spyros Trigazis Subject: Re: [magnum][kolla] etcd wal sync duration issue Hi Eric, If you're using SSDs, then I think the IO performance should be OK. You can use https://github.com/etcd-io/etcd/tree/master/tools/benchmark to verify and confirm that's the root cause. Meanwhile, you can review the config of the etcd cluster deployed by Magnum. I'm not an expert on etcd, so TBH I can't see anything wrong with the config. Most of it is just the default configuration. As for the etcd image, it's built from https://github.com/projectatomic/atomic-system-containers/tree/master/etcd or you can refer to CERN's repo https://gitlab.cern.ch/cloud/atomic-system-containers/blob/cern-qa/etcd/ Spyros, any comments? On 14/01/20 10:52 AM, Eric K. Miller wrote: Hi Feilong, Thanks for responding! I am, indeed, using the default v3.2.7 version for etcd, which is the only available image. I did not try to reproduce with any other driver (we have never used DevStack, honestly, only Kolla-Ansible deployments). I did see a number of people reporting similar issues with etcd versions in the 3.3.x range, so I didn't think of it as an etcd issue, but then again most of those issues seem to be a result of people using HDDs and not SSDs, which makes sense. Interesting that you saw the same issue, though. We haven't tried Fedora CoreOS, but I think we would need Train for this.
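On the HDD-vs-SSD point: besides etcd's own benchmark tool, a common way to approximate etcd's WAL write pattern is an fio job that issues fdatasync after every write. This is only a sketch — the sizes are typical values and the target directory is a placeholder; run it on the disk that would hold the etcd data dir:

```shell
fio --name=etcd-wal-probe --directory=/var/lib/etcd-test \
    --rw=write --ioengine=sync --fdatasync=1 \
    --size=22m --bs=2300
# Check the fsync/fdatasync latency percentiles in the output; common
# guidance is that the 99th percentile should stay below roughly 10ms.
```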
Everything I read about etcd indicates that it is extremely latency sensitive, due to the fact that it replicates all changes to all nodes and sends an fsync to Linux each time, so data is always guaranteed to be stored. I can see this becoming an issue quickly without super-low-latency network and storage. We are using Ceph-based SSD volumes for the Kubernetes Master node disks, which is extremely fast (likely 10x or better than anything people recommend for etcd), but network latency is always going to be higher with VMs on OpenStack with DVR than bare metal with VLANs due to all of the abstractions. Do you know who maintains the etcd images for Magnum here? Is there an easy way to create a newer image? https://hub.docker.com/r/openstackmagnum/etcd/tags/ Eric From: Feilong Wang [mailto:feilong at catalyst.net.nz] Sent: Monday, January 13, 2020 3:39 PM To: openstack-discuss at lists.openstack.org Subject: Re: [magnum][kolla] etcd wal sync duration issue Hi Eric, That issue looks familiar for me. There are some questions I'd like to check before answering if you should upgrade to train. 1. Are using the default v3.2.7 version for etcd? 2. Did you try to reproduce this with devstack, using Fedora CoreOS driver? The etcd version could be 3.2.26 I asked above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7 and I can't reproduce it with Fedora CoreOS + etcd 3.2.26 -- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From radoslaw.piliszek at gmail.com Thu Jan 16 17:55:06 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 16 Jan 2020 18:55:06 +0100 Subject: [all] cirros-cloud.net is down In-Reply-To: References: Message-ID: We investigated the issue further with Clark (@clarkb). It seems infra provides cached cirros 0.4.0 but some jobs use older cirros versions. Please verify your jobs use cirros 0.4.0 and use the cache in /opt/cache/files to avoid failures due to mirror flakiness. Default DevStack already uses it. -yoctozepto czw., 16 sty 2020 o 14:30 Radosław Piliszek napisał(a): > > Hi, Folks! > > If your CI jobs depend on downloaded cirros image, then be aware that > cirros links are seemingly permanently down atm. > > I reported [1] due to lack of a better place (that I know of). > > [1] https://github.com/cirros-dev/cirros/issues/12 > > -yoctozepto From Albert.Braden at synopsys.com Thu Jan 16 19:49:08 2020 From: Albert.Braden at synopsys.com (Albert Braden) Date: Thu, 16 Jan 2020 19:49:08 +0000 Subject: DR options with openstack In-Reply-To: <5e201295.1c69fb81.a69b.d77d@mx.google.com> References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> Message-ID: Hi Tony, It looks like Cheesecake didn’t survive but apparently some components of it did; details in https://docs.openstack.org/cinder/pike/contributor/replication.html I’m not using Cinder now; we used it at eBay with Ceph and Netapp backends. Netapp makes it easy but is expensive; Ceph is free but you have to figure out how to make it work. You’re right about forking; we did it and then upgrading turned from an incredibly difficult ordeal to an impossible one. It’s better to stay with the “official” code so that upgrading remains an option. I’m just an operator; hopefully someone more expert will reply with more useful info. It’s true that our community lacks participation. 
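For concreteness, the replication that document describes is configured per backend in cinder.conf and promoted with the cinder client — roughly as below. The backend name, backend_id, and secondary config path are placeholders, and the exact replication_device keys differ per driver, so treat this as a sketch rather than a recipe:

```
# cinder.conf (backend section)
[ceph-backend]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
replication_device = backend_id:secondary,conf:/etc/ceph/secondary.conf

# When the primary site is lost, promote the replication target:
# cinder failover-host <controller-host>@ceph-backend --backend_id secondary
```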
It’s very difficult for a new operator to start using openstack and get help with the issues that they encounter. So far this mailing list has been the best resource for me. IRC and Ask Openstack are mostly unattended. I try to help out in #openstack when I can, but I don’t know a lot so I mostly end up telling people to ask on the list. On IRC sometimes I find help by asking in other openstack-* channels. Sometimes people complain that I’m asking in a developer channel, but sometimes I get help. Persistence is the key. If I keep asking long enough in enough places, eventually someone will answer. If all else fails, I open a bug. Good luck and welcome to the Openstack community! From: Tony Pearce Sent: Wednesday, January 15, 2020 11:37 PM To: openstack-discuss at lists.openstack.org Subject: DR options with openstack Hi all My questions are; 1. How are people using iSCSI Cinder storage with Openstack to-date? For example a Nimble Storage array backend. I mean to say, are people using backend integration drivers for other hardware (like netapp)? Or are they using backend iscsi for example? 2. How are people managing DR with Openstack in terms of backend storage replication to another array in another location and continuing to use Openstack? The environment which I am currently using; 1 x Nimble Storage array (iSCSI) with nimble.py Cinder driver 1 x virtualised Controller node 2 x physical compute nodes This is Openstack Pike. In addition, I have a 2nd Nimble Storage array in another location. To explain the questions I’d like to put forward my thoughts for question 2 first: For point 2 above, I have been searching for a way to utilise replicated volumes on the 2nd array from Openstack with existing instances. For example, if site 1 goes down how would I bring up openstack in the 2nd location and boot up the instances where their volumes are stored on the 2nd array. 
I found a proposal for something called “cheesecake” ref: https://specs.openstack.org/openstack/cinder-specs/specs/rocky/cheesecake-promote-backend.html But I could not find whether it had been approved or implemented. So I return to square 1. I have some thoughts about failing over the controller VM and compute node but I don’t think there’s any need to go into that here because of the above blocker, and for brevity anyway. The nimble.py driver which I am using came with Openstack Pike and it appears Nimble / HPE are not maintaining it any longer. I saw a commit to remove nimble.py in the Openstack Train release. The driver uses the REST API to perform actions on the array, such as creating a volume, downloading the image, mounting the volume to the instance, snapshots, clones etc. This is great for me because to date I have around 10TB of openstack storage data allocated and the Nimble array shows the amount of data being consumed is <900GB. This is due to the compression and zero-byte snapshots and clones. So coming back to question 2 – is it possible? Can you drop me some keywords that I can search for, such as an Openstack component like Cheesecake? I think basically what I am looking for is a supported way of telling Openstack that the instance volumes are now located at the new / second array. This means a new cinder backend: for example, a new iqn, IP address, volume serial number. I think I could probably hack the cinder db but I really want to avoid that. So failing the above, it brings me to question 1, which I asked before: how are people using Cinder volumes? Maybe I am going about this the wrong way and need to take a few steps backwards to go forwards? I need storage to be able to deploy instances onto. Snapshots and clones are desired. At the moment these operations take less time than the horizon dashboard takes to load because of the waiting API responses. When searching for information about the above as an end-user / consumer I get a bit concerned.
Is it right that Openstack usage is dropping? There’s no web forum to post questions. The chatroom on freenode is filled with ~300 ghosts. Ask Openstack questions go without response. Earlier this week (before I found this mail list) I had to use facebook to report that the Openstack.org website had been hacked. Basically it seems that if you’re a developer that can write code then you’re in but that’s it. I have never been a coder and so I am somewhat stuck. Thanks in advance Sent from Mail for Windows 10 -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Jan 16 21:24:15 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 16 Jan 2020 21:24:15 +0000 Subject: DR options with openstack In-Reply-To: References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> Message-ID: <20200116212414.ugbths4zeilnylxc@yuggoth.org> On 2020-01-16 19:49:08 +0000 (+0000), Albert Braden wrote: [...] > On IRC sometimes I find help by asking in other openstack-* > channels. Sometimes people complain that I’m asking in a developer > channel, but sometimes I get help [...] I hope we don't have "developer[-only] channels" in OpenStack. The way of free/libre open source software is that users often become developers once they gain an increased familiarity with a project, so telling them to go away when they have a question is absolutely the wrong approach if we want this to be a sustainable effort longer term. I'm a developer on a number of projects where I still regularly have questions as a user, so even for selfish reasons I don't think that sort of discussion should be off-topic. If software developers get annoyed by users asking them too many questions or the same questions over and over, they should see that as a clear sign that they need to improve the documentation they maintain. 
So just to reassure you, you are absolutely doing the right thing by asking folks in project-specific IRC channels (or on this mailing list) when documentation about something is unclear or you encounter an undocumented behavior you'd like help investigating. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From Burak.Hoban at iag.com.au Thu Jan 16 22:41:56 2020 From: Burak.Hoban at iag.com.au (Burak Hoban) Date: Thu, 16 Jan 2020 22:41:56 +0000 Subject: DR options with openstack Message-ID: Hey Tony, Keep in mind that if you're looking to run OpenStack, but you're not feeling comfortable with the community support then there's always the option to go with a vendor backed version. These are usually a good option for those a little more risk adverse, or who don't have the time/or skills to maintain upstream releases - however going down that path usually means you can do less with OpenStack (depending on the vendor), but you have a large pool of resources to help troubleshoot and answer questions. We do both approaches internally for different clusters, so both approaches have their pro and cons. You touched on a few points in your original email... > If you had two OpenStack clusters, one in "site 1" and another in "site 2", then you could look at below for backup/restore of instances cross-cluster: - Freezer -> https://wiki.openstack.org/wiki/Freezer - Trillio (basically just a series of nova snapshots under the cover) -> https://www.trilio.io/ You could then over the top roll out a file level based backup tool on each instance, this would pretty much offer you replication functionality without having to do block-level tinkering. > Failover of OpenStack controller/computes If you have two sites, you can always go for 3x Controller deployment spanning cross site. 
Depending on latency obviously, however all you really need is a good enough link for RabbitMQ/Galera to talk reliably etc. Failing that, I'd recommend backing up your Controller with ReaR. From there you can also schedule frequent automated jobs to do a OpenStack DB backups. Recovering should be a case of ReaR restore, load latest OpenStack DB and start everything up... You'll probably want to ensure your VLANs are spanned cross-site so you can reuse same IP addresses. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/backup_and_restore/05_rear.html https://superuser.openstack.org/articles/tutorial-rear-openstack-deployment/ In reality, the best solution would be to have two isolated clusters, and your workloads spanned across both sites. Obviously that isn't always possible (from personal experience), but pushing people down the Kubernetes path and then for the rest automation/backup utilities may cater for your needs. Having said that, Albert's link does look promising -> https://docs.openstack.org/cinder/pike/contributor/replication.html Date: Thu, 16 Jan 2020 19:49:08 +0000 From: Albert Braden To: Tony Pearce , "openstack-discuss at lists.openstack.org" Subject: RE: DR options with openstack Message-ID: Content-Type: text/plain; charset="utf-8" Hi Tony, It looks like Cheesecake didn’t survive but apparently some components of it did; details in https://docs.openstack.org/cinder/pike/contributor/replication.html I’m not using Cinder now; we used it at eBay with Ceph and Netapp backends. Netapp makes it easy but is expensive; Ceph is free but you have to figure out how to make it work. You’re right about forking; we did it and then upgrading turned from an incredibly difficult ordeal to an impossible one. It’s better to stay with the “official” code so that upgrading remains an option. I’m just an operator; hopefully someone more expert will reply with more useful info. It’s true that our community lacks participation. 
It’s very difficult for a new operator to start using openstack and get help with the issues that they encounter. So far this mailing list has been the best resource for me. IRC and Ask Openstack are mostly unattended. I try to help out in #openstack when I can, but I don’t know a lot so I mostly end up telling people to ask on the list. On IRC sometimes I find help by asking in other openstack-* channels. Sometimes people complain that I’m asking in a developer channel, but sometimes I get help. Persistence is the key. If I keep asking long enough in enough places, eventually someone will answer. If all else fails, I open a bug. Good luck and welcome to the Openstack community! From: Tony Pearce Sent: Wednesday, January 15, 2020 11:37 PM To: openstack-discuss at lists.openstack.org Subject: DR options with openstack Hi all My questions are; 1. How are people using iSCSI Cinder storage with Openstack to-date? For example a Nimble Storage array backend. I mean to say, are people using backend integration drivers for other hardware (like netapp)? Or are they using backend iscsi for example? 2. How are people managing DR with Openstack in terms of backend storage replication to another array in another location and continuing to use Openstack? The environment which I am currently using; 1 x Nimble Storage array (iSCSI) with nimble.py Cinder driver 1 x virtualised Controller node 2 x physical compute nodes This is Openstack Pike. In addition, I have a 2nd Nimble Storage array in another location. To explain the questions I’d like to put forward my thoughts for question 2 first: For point 2 above, I have been searching for a way to utilise replicated volumes on the 2nd array from Openstack with existing instances. For example, if site 1 goes down how would I bring up openstack in the 2nd location and boot up the instances where their volumes are stored on the 2nd array. 
I found a proposal for something called “cheesecake” ref: https://specs.openstack.org/openstack/cinder-specs/specs/rocky/cheesecake-promote-backend.html

But I could not find whether it had been approved or implemented, so I return to square 1. I have some thoughts about failing over the controller VM and compute node, but I don’t think there’s any need to go into that here, because of the above blocker and for brevity anyway.

The nimble.py driver which I am using came with Openstack Pike, and it appears Nimble / HPE are not maintaining it any longer; I saw a commit to remove nimble.py in the Openstack Train release. The driver uses the REST API to perform actions on the array, such as creating a volume, downloading the image, mounting the volume to the instance, snapshots, clones etc. This is great for me because to date I have around 10TB of openstack storage data allocated, and the Nimble array shows the amount of data being consumed is <900GB. This is due to the compression and zero-byte snapshots and clones.

So coming back to question 2: is it possible? Can you drop me some keywords that I can search for, such as an Openstack component like Cheesecake? I think basically what I am looking for is a supported way of telling Openstack that the instance volumes are now located at the new / second array. This means a new cinder backend: for example, a new iqn, IP address, volume serial number. I think I could probably hack the cinder db, but I really want to avoid that.

So, failing the above, it brings me to question 1 that I asked before. How are people using Cinder volumes? Maybe I am going about this the wrong way and need to take a few steps backwards to go forwards? I need storage to be able to deploy instances onto. Snapshots and clones are desired. At the moment these operations take less time than the horizon dashboard takes to load, because of the waiting API responses.

When searching for information about the above as an end-user / consumer, I get a bit concerned.
Is it right that Openstack usage is dropping? There’s no web forum to post questions. The chatroom on freenode is filled with ~300 ghosts. Ask Openstack questions go without response. Earlier this week (before I found this mailing list) I had to use facebook to report that the Openstack.org website had been hacked. Basically, it seems that if you’re a developer who can write code then you’re in, but that’s it. I have never been a coder and so I am somewhat stuck.

Thanks in advance

Sent from Mail for Windows 10

_____________________________________________________________________

The information transmitted in this message and its attachments (if any) is intended only for the person or entity to which it is addressed. The message may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information, by persons or entities other than the intended recipient is prohibited.

If you have received this in error, please contact the sender and delete this e-mail and associated material from any computer.

The intended recipient of this e-mail may only use, reproduce, disclose or distribute the information contained in this e-mail and any attached files, with the permission of the sender.

This message has been scanned for viruses.
_____________________________________________________________________

From ignaziocassano at gmail.com Thu Jan 16 23:00:37 2020
From: ignaziocassano at gmail.com (Ignazio Cassano)
Date: Fri, 17 Jan 2020 00:00:37 +0100
Subject: DR options with openstack
In-Reply-To:
References:
Message-ID:

Hello, I suggest hystax for openstack failover and failback between two openstack sites. It works with openstack upstream as well.
Ignazio

On Thu, 16 Jan 2020 at 23:46, Burak Hoban wrote:

> Hey Tony,
>
> Keep in mind that if you're looking to run OpenStack, but you're not
> feeling comfortable with the community support, then there's always the
> option to go with a vendor-backed version.
> [...]

From tony.pearce at cinglevue.com Fri Jan 17 03:17:40 2020
From: tony.pearce at cinglevue.com (Tony Pearce)
Date: Fri, 17 Jan 2020 11:17:40 +0800
Subject: DR options with openstack
In-Reply-To:
References:
Message-ID:

Hi all. Thanks to all who replied; lots of helpful information there. I apologise for not making this point earlier, but I am not looking for a 3rd-party tool to achieve this. What I am looking for at this time are components already existing within openstack, and open source is desired. I currently run Pike, so I expect I may need to upgrade to get the components I need. I did come across Freezer, but not that wiki page. I'll work on setting up a test for this :)

> It’s true that our community lacks participation. It’s very difficult for
> a new operator to start using openstack and get help with the issues that
> they encounter.

> If I keep asking long enough in enough places, eventually someone will
> answer.

Yes, it was difficult for me to learn. I managed to find a way through which worked for me. I started with Packstack.

With regards to IRC - in my experience, once you get past the authentication problems and the frequent session timeouts/kick-outs, you see the chat room with 300 people but no one chatting or answering. Kind of reduces the worth of the chatroom this way, in my opinion. Although, I am in Australia, so the timezone I am in could be a contributing factor.

Thanks again - I have enough hints from you guys to go away and do some research.
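As a starting point for that Freezer test, my understanding from the Freezer docs is that the agent can push a backup into a Swift container with something like the below. The paths and container name are placeholders I've made up, and I still need to verify the exact flags against the version I end up installing:

```console
# Back up a directory into a Swift container (all names are placeholders)
freezer-agent --action backup \
    --path-to-backup /var/lib/mysql \
    --container freezer_backups \
    --backup-name controller-db

# Later, restore it at the second site
freezer-agent --action restore \
    --restore-abs-path /var/lib/mysql \
    --container freezer_backups \
    --backup-name controller-db
```

If that pans out, scheduling the same job via freezer-scheduler would presumably cover the periodic DB backups mentioned earlier in the thread.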
Best regards,

*Tony Pearce* | *Senior Network Engineer / Infrastructure Lead*
*Cinglevue International*
Email: tony.pearce at cinglevue.com
Web: http://www.cinglevue.com

*Australia*
1 Walsh Loop, Joondalup, WA 6027 Australia.
Direct: +61 8 6202 0036 | Main: +61 8 6202 0024

Note: This email and all attachments are the sole property of Cinglevue International Pty Ltd. (or any of its subsidiary entities), and the information contained herein must be considered confidential, unless specified otherwise. If you are not the intended recipient, you must not use or forward the information contained in these documents. If you have received this message in error, please delete the email and notify the sender.

On Fri, 17 Jan 2020 at 07:00, Ignazio Cassano wrote:

> Hello, I suggest hystax for openstack failover and failback between two
> openstack sites.
> It works with openstack upstream as well.
> Ignazio
> [...]
From mike.carden at gmail.com Fri Jan 17 03:35:50 2020
From: mike.carden at gmail.com (Mike Carden)
Date: Fri, 17 Jan 2020 14:35:50 +1100
Subject: DR options with openstack
In-Reply-To:
References:
Message-ID:

On Fri, Jan 17, 2020 at 2:25 PM Tony Pearce wrote:

> With regards to IRC - in my experience, once you get past the
> authentication problems and the frequent session timeouts/kick-outs, you
> see the chat room with 300 people but no one chatting or answering. Kind
> of reduces the worth of the chatroom this way, in my opinion. Although, I
> am in Australia, so the timezone I am in could be a contributing factor.

I'm also in Australia, but my IRC experience has been different from yours. I find that the individual, project-specific OpenStack IRC channels are a great resource, often attended by really helpful experts: 'openstack-ansible', 'openstack-ironic', 'openstack-qa' etc.

Also, I keep a teeny tiny VM running in Google's cloud (AU 20 cents a month) to run quassel core, so I have a 24/7 IRC connection to the channels I watch, so that people can reply to me while I'm asleep and I can catch up the next day.

--
MC

From fungi at yuggoth.org Fri Jan 17 03:51:15 2020
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Fri, 17 Jan 2020 03:51:15 +0000
Subject: DR options with openstack
In-Reply-To:
References:
Message-ID: <20200117035115.l5zahi4apmjogpuf@yuggoth.org>

On 2020-01-17 11:17:40 +0800 (+0800), Tony Pearce wrote:
[...]
> With regards to IRC - in my experience, once you get past the
> authentication problems

These are a relatively recent and unfortunate addition to our channels, necessitated by spammers randomly popping in and generally being nuisances for everyone. We keep testing the waters by lifting the identification requirement here and there, but the coast is not yet clear.
We'd really rather people were able to freely join and ask questions without setting up accounts; it's just a bit hard to keep our channels usable that way at the moment.

> and often session timeout/kick out, you see the chat room with 300
> people but no one chatting or answering. Kind of reduces the worth
> of the chatroom this way in my opinion. Although, I am in
> Australia so the timezone I am in could be a contributing factor.

Certainly the bulk of discussion for most projects happens when Europe and the Americas are awake, so likely less in the middle of your day and a lot more overnight for you. There may be some increased activity in your mornings or evenings at least. But if this is the #openstack channel, the bigger problem is that it's just not got a lot of people with answers to user questions paying attention in there (I too am guilty of forgetting to keep tabs on it).

The fundamental truth is that whenever you balkanize communications into topic areas for "users" and "developers," the end result is that the user forum is all questions nobody's answering, because most of the folks with answers are all conversing somewhere else, in places where such questions are discouraged. We used to have separate mailing lists for user questions, sharing between operators, and development topics; those suffered precisely the same problem, and I'm quite happy we agreed as a community to merge the lists into one where users' questions *are* getting seen by people who already possess the necessary knowledge to provide accurate answers and guidance.

--
Jeremy Stanley
From tony.pearce at cinglevue.com Fri Jan 17 03:54:19 2020
From: tony.pearce at cinglevue.com (Tony Pearce)
Date: Fri, 17 Jan 2020 11:54:19 +0800
Subject: DR options with openstack
In-Reply-To:
References:
Message-ID:

I had not discovered channels like openstack-ansible; I just googled for the channel list. "openstack" is described as being meant for general questions. When I need to, I'll try to use a more specific topic channel. Thanks for the advice.

For reference, the channel list is https://wiki.openstack.org/wiki/IRC

*Tony Pearce* | *Senior Network Engineer / Infrastructure Lead*
*Cinglevue International*

On Fri, 17 Jan 2020 at 11:41, Mike Carden wrote:

> On Fri, Jan 17, 2020 at 2:25 PM Tony Pearce wrote:
>
>> With regards to IRC - in my experience, once you get past the
>> authentication problems [...]
>
> I'm also in Australia, but my IRC experience has been different from
> yours.
> [...]

From gmann at ghanshyammann.com Fri Jan 17 04:02:05 2020
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Thu, 16 Jan 2020 22:02:05 -0600
Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama)
Message-ID: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com>

Hello Everyone,

This is regarding bug https://bugs.launchpad.net/tempest/+bug/1860033, using Radosław's fancy phrase 'EOLing python2 drama' in the subject :).

The neutron tempest plugin job on stable/rocky started failing because neutron-lib dropped py2 support: neutron-lib 2.0.0 is py3-only, and u-c on master has been updated to 2.0.0. Tempest and its plugins use the master u-c for stable branch testing, which is the valid approach because the master Tempest & plugins are what test the stable branches, so they need the u-c from master itself. These failed jobs also used the master u-c [1], which tries to install the latest neutron-lib and fails.

This is not just a neutron tempest plugin issue but one for all Tempest plugin jobs: any lib used by Tempest or its plugins can drop py2 now and lead to this failure. It's just that neutron-lib raised the flag first, before I could start on dropping py2 from the Tempest & plugins jobs on master while keeping py2 testing on the stable branches.

We have two ways to fix this:
Separate out the testing of python2 jobs with a python2-supported version of the Tempest plugins and the respective u-c. For example, test all python2 jobs with the tempest plugin train version (or the latest version, if any, that supports py2) and use u-c from stable/train. This will cap Tempest & plugins with the respective u-c for stable branch testing. 2. Second option is to install the tempest and plugins in py3 env on py2 jobs also. This should be an easy and preferred way. I am trying this first[2] and testing[3]. [1] https://zuul.opendev.org/t/openstack/build/fb8a928ed3614e09a9a3cf4637f2f6c2/log/job-output.txt#33040 [2] https://review.opendev.org/#/c/703011/ [3] https://review.opendev.org/#/c/703012/ -gmann From fungi at yuggoth.org Fri Jan 17 04:10:05 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jan 2020 04:10:05 +0000 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> Message-ID: <20200117041005.cgxggu5wrv3amheh@yuggoth.org> On 2020-01-16 22:02:05 -0600 (-0600), Ghanshyam Mann wrote: [...] > Second option is to install the tempest and plugins in py3 env on > py2 jobs also. This should be an easy and preferred way. [...] This makes more sense anyway. Tempest and its plug-ins are already segregated from the system with a virtualenv due to conflicts with stable branch requirements, so hopefully switching that virtualenv to Python 3.x for all jobs is trivial (but I won't be surprised to learn there are subtle challenges hidden just beneath the surface). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From agarwalvishakha18 at gmail.com Fri Jan 17 06:21:01 2020 From: agarwalvishakha18 at gmail.com (Vishakha Agarwal) Date: Fri, 17 Jan 2020 11:51:01 +0530 Subject: [keystone] Keystone Team Update - Week of 13 January 2020 Message-ID: # Keystone Team Update - Week of 13 January 2020 ## News ### Roadmap Review The Team has decided to review the roadmap every other week so as to keep up the development momentum of the ussuri cycle [1]. [1] https://tree.taiga.io/project/keystone-ussuri-roadmap/kanban ### User Support and Bug Duty Every week the duty is being rotated between the members. The person-in-charge for bug duty for current and upcoming week can be seen on the etherpad [2] [2] https://etherpad.openstack.org/p/keystone-l1-duty ## Open Specs Ussuri specs: https://bit.ly/2XDdpkU Ongoing specs: https://bit.ly/2OyDLTh ## Recently Merged Changes Search query: https://bit.ly/2pquOwT We merged 7 changes this week. ## Changes that need Attention Search query: https://bit.ly/2tymTje There are 36 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. ### Priority Reviews * Community Goals https://review.opendev.org/#/c/699127/ [ussuri][goal] Drop python 2.7 support and testing keystone-tempest-plugin https://review.opendev.org/#/c/699119/ [ussuri][goal] Drop python 2.7 support and testing python-keystoneclient * Special Requests https://review.opendev.org/#/c/662734/ Change the default Identity endpoint to internal https://review.opendev.org/#/c/699013/ Always have username in CADF initiator https://review.opendev.org/#/c/700826/ Fix role_assignments role.id filter https://review.opendev.org/#/c/697444/ Adding options to user cli https://review.opendev.org/#/c/702374/ Cleanup doc/requirements.txt ## Bugs This week we opened 2 new bugs and closed 4. 
Bugs opened (2) Bug #1859759 (keystone:Undecided): Keystone is unable to remove role-assignment for deleted LDAP users - Opened by Eigil Obrestad https://bugs.launchpad.net/keystone/+bug/1859759 Bug #1859844 (keystone:Undecided): Impossible to rename the Default domain id to the string 'default.' - Opened by Marcelo Subtil Marcal https://bugs.launchpad.net/keystone/+bug/1859844 Bugs closed (4) Bug #1833207 (keystoneauth:Undecided) https://bugs.launchpad.net/keystoneauth/+bug/1833207 Bug #1858189 (keystoneauth:Undecided) https://bugs.launchpad.net/keystoneauth/+bug/1858189 Bug #1857086 (keystone:Won't Fix) https://bugs.launchpad.net/keystone/+bug/1857086 Bug #1859844 (keystone:Invalid) https://bugs.launchpad.net/keystone/+bug/1859844 ## Milestone Outlook https://releases.openstack.org/ussuri/schedule.html Reminder: Spec freeze is the week of 10 February. ## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From radoslaw.piliszek at gmail.com Fri Jan 17 07:49:32 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 17 Jan 2020 08:49:32 +0100 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: <20200117041005.cgxggu5wrv3amheh@yuggoth.org> References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> Message-ID: +1 for py3 in tempest venv. Makes most sense.
Though the test is failing now: 2020-01-17 04:30:06.975801 | controller | ERROR: Could not find a version that satisfies the requirement neutron-lib===2.0.0 (from -c u-c-m.txt (line 79)) (from versions: 0.0.1, 0.0.2, 0.0.3, 0.1.0, 0.2.0, 0.3.0, 0.4.0, 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.9.1, 1.9.2, 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0, 1.21.0, 1.22.0, 1.23.0, 1.24.0, 1.25.0, 1.26.0, 1.27.0, 1.28.0, 1.29.0, 1.29.1, 1.30.0, 1.31.0) 2020-01-17 04:30:06.993738 | controller | ERROR: No matching distribution found for neutron-lib===2.0.0 (from -c u-c-m.txt (line 79)) and the reason is: pypi: data-requires-python=">=3.6" 3.5 < 3.6 Need some newer python in there. -yoctozepto pt., 17 sty 2020 o 05:15 Jeremy Stanley napisał(a): > > On 2020-01-16 22:02:05 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > Second option is to install the tempest and plugins in py3 env on > > py2 jobs also. This should be an easy and preferred way. > [...] > > This makes more sense anyway. Tempest and its plug-ins are already > segregated from the system with a virtualenv due to conflicts with > stable branch requirements, so hopefully switching that virtualenv > to Python 3.x for all jobs is trivial (but I won't be surprised to > learn there are subtle challenges hidden just beneath the surface). > -- > Jeremy Stanley From radoslaw.piliszek at gmail.com Fri Jan 17 08:34:09 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 17 Jan 2020 09:34:09 +0100 Subject: DR options with openstack In-Reply-To: References: Message-ID: On #openstack-kolla we, kolla cores, help users (being or becoming OpenStack operators) with kolla-based deployments. I observed a nice trend that users give us positive feedback about our support efforts, stick to the channel and help other users as well. That's the spirit. As for ask, I don't have any positive experience with it. 
Most support efforts that I saw end like the one below: https://ask.openstack.org/en/question/124531/how-to-install-kolla-ansible-with-5-mon/ so it's a bit discouraging. Personally I would vote +1 for archiving the ask, seemingly doing more bad than good these days. -yoctozepto pt., 17 sty 2020 o 05:00 Tony Pearce napisał(a): > > I had not discovered the channels like openstack-ansible. I just googled for the channel list. "openstack" is described as being meant for general questions. When I need to I'll try and use a more specific topic channel. Thanks for the advice. > > For reference, the channel list is https://wiki.openstack.org/wiki/IRC > > > Tony Pearce | Senior Network Engineer / Infrastructure Lead > Cinglevue International > > Email: tony.pearce at cinglevue.com > Web: http://www.cinglevue.com > > Australia > 1 Walsh Loop, Joondalup, WA 6027 Australia. > > Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 > > Note: This email and all attachments are the sole property of Cinglevue International Pty Ltd. (or any of its subsidiary entities), and the information contained herein must be considered confidential, unless specified otherwise. If you are not the intended recipient, you must not use or forward the information contained in these documents. If you have received this message in error, please delete the email and notify the sender. > > > > > > On Fri, 17 Jan 2020 at 11:41, Mike Carden wrote: >> >> >> >> On Fri, Jan 17, 2020 at 2:25 PM Tony Pearce wrote: >>> >>> With regards to IRC - in my experience, once you get passed the authentication problems and often session timeout/kick out, you see the chat room with 300 people but no one chatting or answering. Kind of reduces the worth of the chatroom this way in my opinion. Although, I am in Australia so the timezone I am in could be a contributor. >> >> >> I'm also in Australia, but my IRC experience has been different from yours. 
I find that the individual, project-specific OpenStack IRC channels are a great resource, often attended by really helpful experts. 'openstack-ansible' 'openstack-ironic' 'openstack-qa' etc. >> >> Also, I keep a teeny tiny VM running in Google's cloud (AU 20 cents a month) to run quassel core so I have a 24/7 IRC connection to channels I watch so that people can reply to me while I'm asleep and I can catch up the next day. >> >> -- >> MC >> >> >> From tony.pearce at cinglevue.com Fri Jan 17 09:56:25 2020 From: tony.pearce at cinglevue.com (Tony Pearce) Date: Fri, 17 Jan 2020 17:56:25 +0800 Subject: Cinder snapshot delete successful when expected to fail Message-ID: Could anyone help by pointing me where to go to be able to dig into this issue further? I have installed a test Openstack environment using RDO Packstack. I wanted to install the same version that I have in Production (Pike) but it's not listed in the CentOS repo via yum search. So I installed Queens. I am using nimble.py Cinder driver. Nimble Storage is a storage array accessed via iscsi from the Openstack host, and is controlled from Openstack by the driver and API. *What I expected to happen:* 1. create an instance with volume (the volume is created on the storage array successfully and instance boots from it) 2. take a snapshot (snapshot taken on the volume on the array successfully) 3. create a new instance from the snapshot (the api tells the array to clone the snapshot into a new volume on the array and use that volume for the instance) 4. try and delete the snapshot Expected Result - Openstack gives the user a message like "you're not allowed to do that". Note: Step 3 above creates a child volume from the parent snapshot. It's impossible to delete the parent snapshot because IO READ is sent to that part of the original volume (as I understand it). *My production problem is this: * 1. create an instance with volume (the volume is created on the storage array successfully) 2. 
take a snapshot (snapshot taken on the volume on the array successfully) 3. create a new instance from the snapshot (the api tells the array to clone the snapshot into a new volume on the array and use that volume for the instance) 4. try and delete the snapshot Result - the snapshot goes into error state and later, all Cinder operations fail (such as new instance/create volume etc.) until the correct service is restarted. Then everything works once again. To troubleshoot the above, I installed RDO Packstack Queens (because I couldn't get Pike). I tested the above and now the result is that the snapshot is successfully deleted from openstack but not deleted on the array. The log is below for reference. But I can see in the log that the array sends back info to openstack saying the snapshot has a clone and the delete cannot be done because of that. Also response code 409. *Some info about why the problem with Pike started in the first place* 1. Vendor is Nimble Storage, which HPE purchased 2. HPE/Nimble have dropped support for openstack. The latest supported version is Queens with Nimble array version v4.x. The current array version is v5.x. Nimble say there are no guarantees with openstack, the driver and array version v5.x 3. I was previously advised by Nimble that array version v5.x would work fine and so left our DR array on v5.x, with a pending upgrade that had a blocker due to an issue. This issue was resolved in December, and the pending upgrade to match the DR array completed around 30 days ago. With regards to the production issue, I assumed that the array API has some changes between v4.x and v5.x and it's causing an issue with Cinder due to the API response. Although I have not been able to find out if or what changes there are that may have occurred after the array upgrade, as the documentation for this is Nimble internal-only.
*So with that - some questions if I may:* When Openstack got the 409 error response from the API (as seen in the log below), why would Openstack then proceed to delete the snapshot on the Openstack side? How could I debug this further? I'm not sure what Openstack Cinder is acting on in terms of the response as yet. Maybe Openstack is not specifically looking for the error code in the response? The snapshot that got deleted on the openstack side is a problem. Would this be related to the driver? Could it be possible that the driver did not pass the error response to Cinder? Thanks in advance. Just for reference, the log snippet is below. ==> volume.log <== > 2020-01-17 16:53:23.718 24723 WARNING py.warnings > [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] > /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: > InsecureRequestWarning: Unverified HTTPS request is being made. Adding > certificate verification is strongly advised. See: > https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings > InsecureRequestWarning) > : NimbleAPIException: Failed to execute api > snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 > Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume > volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.
> ==> api.log <== > 2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi > [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] > http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail > returned with HTTP 200 > 2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server > [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET > /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200 > len: 4657 time: 0.1152730 > ==> volume.log <== > 2020-01-17 16:53:23.811 24723 WARNING py.warnings > [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] > /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: > InsecureRequestWarning: Unverified HTTPS request is being made. Adding > certificate verification is strongly advised. See: > https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings > InsecureRequestWarning) > : NimbleAPIException: Failed to execute api > snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 > Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume > volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. 
> 2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble > [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception > Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: > Error Code: 409 Message: Snapshot > snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume > volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.: > NimbleAPIException: Failed to execute api > snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 > Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume > volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. > 2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble > [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot > snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone: > NimbleAPIException: Failed to execute api > snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 > Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume > volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. > 2020-01-17 16:53:23.964 24723 WARNING cinder.quota > [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default > quota for resource: snapshots_Nimble-DR is set by the default quota flag: > quota_snapshots_Nimble-DR, it is now deprecated. Please use the default > quota class for default quota. > 2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager > [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 > 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot > completed successfully. 
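[A general illustration of the failure mode asked about above - a hypothetical sketch, not the actual nimble.py code: if a driver catches the array's 409 error and only logs a warning instead of re-raising, the volume manager never sees a failure and removes the snapshot record anyway, which matches the final "Delete snapshot completed successfully" line in the log.]

```python
class NimbleAPIException(Exception):
    """Stand-in for the driver exception raised on an array REST error."""


def array_delete_snapshot(snap_id):
    # Stand-in for the REST call: the array answers HTTP 409 because the
    # snapshot still has a clone, and the driver wraps that in an exception.
    raise NimbleAPIException(
        "snapshots/%s: Error Code: 409 Message: snapshot has a clone." % snap_id)


def delete_snapshot_swallowing(snap_id):
    # Anti-pattern: the error is logged and discarded, so the caller
    # (cinder.volume.manager) believes the delete succeeded.
    try:
        array_delete_snapshot(snap_id)
    except NimbleAPIException as exc:
        print("WARNING: %s" % exc)


def delete_snapshot_propagating(snap_id):
    # Re-raising lets the manager put the snapshot into error_deleting
    # instead of silently removing its database record.
    try:
        array_delete_snapshot(snap_id)
    except NimbleAPIException:
        raise
```

[Whether the Queens-era driver actually behaves like the first function is something delete_snapshot in nimble.py would have to confirm; the sketch only shows why a 409 from the array can still end in a "successful" delete on the OpenStack side.]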
Regards, *Tony Pearce* | *Senior Network Engineer / Infrastructure Lead**Cinglevue International * Email: tony.pearce at cinglevue.com Web: http://www.cinglevue.com *Australia* 1 Walsh Loop, Joondalup, WA 6027 Australia. Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 Note: This email and all attachments are the sole property of Cinglevue International Pty Ltd. (or any of its subsidiary entities), and the information contained herein must be considered confidential, unless specified otherwise. If you are not the intended recipient, you must not use or forward the information contained in these documents. If you have received this message in error, please delete the email and notify the sender. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bcafarel at redhat.com Fri Jan 17 10:14:49 2020 From: bcafarel at redhat.com (Bernard Cafarelli) Date: Fri, 17 Jan 2020 11:14:49 +0100 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: <20200117041005.cgxggu5wrv3amheh@yuggoth.org> References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> Message-ID: On Fri, 17 Jan 2020 at 05:11, Jeremy Stanley wrote: > On 2020-01-16 22:02:05 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > Second option is to install the tempest and plugins in py3 env on > > py2 jobs also. This should be an easy and preferred way. > [...] > > This makes more sense anyway. Tempest and its plug-ins are already > segregated from the system with a virtualenv due to conflicts with > stable branch requirements, so hopefully switching that virtualenv > to Python 3.x for all jobs is trivial (but I won't be surprised to > learn there are subtle challenges hidden just beneath the surface). > That sounds good for supported releases. Once we have them back in working order, I wonder how it will turn out for queens. 
In neutron, there is a recent failure [1] as this EM branch now uses a pinned version of the plugin. The fix there is most likely to also pin tempest - to queens-em [2] but then will also require some fix for the EOLing python2 drama. As tempest is branchless, it looks like if we want to keep neutron-tempest-plugin tests for queens we will rather need solution 1 for this branch? (but let's focus first on getting the supported branches back in working order) [1] https://bugs.launchpad.net/neutron/+bug/1859988 [2] https://review.opendev.org/702868 -- Bernard Cafarelli -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Fri Jan 17 11:01:37 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 17 Jan 2020 12:01:37 +0100 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> Message-ID: <3DCDAE2D-4368-4A0B-BF8B-7AF4BA729055@redhat.com> Hi, > On 17 Jan 2020, at 11:14, Bernard Cafarelli wrote: > > On Fri, 17 Jan 2020 at 05:11, Jeremy Stanley wrote: > On 2020-01-16 22:02:05 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > Second option is to install the tempest and plugins in py3 env on > > py2 jobs also. This should be an easy and preferred way. > [...] > > This makes more sense anyway. Tempest and its plug-ins are already > segregated from the system with a virtualenv due to conflicts with > stable branch requirements, so hopefully switching that virtualenv > to Python 3.x for all jobs is trivial (but I won't be surprised to > learn there are subtle challenges hidden just beneath the surface). > > That sounds good for supported releases. Once we have them back in working order, I wonder how it will turn out for queens. 
> In neutron, there is a recent failure [1] as this EM branch now uses a > pinned version of the plugin. The fix there is most likely to also pin > tempest - to queens-em [2] but then will also require some fix for the > EOLing python2 drama. But if we will use for queens branch tempest pinned to queens-em tag, we shouldn’t have any such problems there as all requirements will be also used from queens branch, or am I missing something here? > > As tempest is branchless, it looks like if we want to keep neutron-tempest-plugin tests for queens we will rather need solution 1 for this branch? (but let's focus first on getting the supported branches back in working order) > > [1] https://bugs.launchpad.net/neutron/+bug/1859988 > [2] https://review.opendev.org/702868 > > > -- > Bernard Cafarelli — Slawek Kaplonski Senior software engineer Red Hat From victoria at vmartinezdelacruz.com Fri Jan 17 11:43:49 2020 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Fri, 17 Jan 2020 08:43:49 -0300 Subject: Rails Girls Summer of Code In-Reply-To: References: Message-ID: Hi Amy, This is great! How is that agnostic? IIRC it was all related to Ruby on Rails projects? How can OpenStack join this effort? Thanks, V On Thu, Jan 16, 2020 at 9:55 AM Amy Marrich wrote: > Hi All, > > I was contacted about this program to see if OpenStack might be interested > in participating and despite the name it is language agnostic. More > information on the program can be found at Rails Girls Summer of Code, > > > I'm willing to help organize our efforts but would need to know level of > interest to participate and mentor. > > Thanks, > > Amy (spotz) > Chair, Diversity and Inclusion WG > Chair, User Committee > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bcafarel at redhat.com Fri Jan 17 11:51:28 2020 From: bcafarel at redhat.com (Bernard Cafarelli) Date: Fri, 17 Jan 2020 12:51:28 +0100 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: <3DCDAE2D-4368-4A0B-BF8B-7AF4BA729055@redhat.com> References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> <3DCDAE2D-4368-4A0B-BF8B-7AF4BA729055@redhat.com> Message-ID: On Fri, 17 Jan 2020 at 12:01, Slawek Kaplonski wrote: > Hi, > > > On 17 Jan 2020, at 11:14, Bernard Cafarelli wrote: > > > > On Fri, 17 Jan 2020 at 05:11, Jeremy Stanley wrote: > > On 2020-01-16 22:02:05 -0600 (-0600), Ghanshyam Mann wrote: > > [...] > > > Second option is to install the tempest and plugins in py3 env on > > > py2 jobs also. This should be an easy and preferred way. > > [...] > > > > This makes more sense anyway. Tempest and its plug-ins are already > > segregated from the system with a virtualenv due to conflicts with > > stable branch requirements, so hopefully switching that virtualenv > > to Python 3.x for all jobs is trivial (but I won't be surprised to > > learn there are subtle challenges hidden just beneath the surface). > > > > That sounds good for supported releases. Once we have them back in > working order, I wonder how it will turn out for queens. > > In neutron, there is a recent failure [1] as this EM branch now uses a > pinned version of the plugin. The fix there is most likely to also pin > tempest - to queens-em [2] but then will also require some fix for the > EOLing python2 drama. > > But if we will use for queens branch tempest pinned to queens-em tag, we > shouldn’t have any such problems there as all requirements will be also > used from queens branch, or am I missing something here? > Sadly not, from what I read in attempt [1] to limit neutron-lib to "old" version. 
And I see the same error in a test run with pinned tempest [2]: 2020-01-16 14:44:18.741517 | controller | 2020-01-16 14:44:18.741 | Collecting neutron-lib===2.0.0 (from -c u-c-m.txt (line 79)) 2020-01-16 14:44:19.023699 | controller | 2020-01-16 14:44:19.023 | Could not find a version that satisfies the requirement neutron-lib===2.0.0 (from -c u-c-m.txt (line 79)) (from versions: 0.0.1, 0.0.2, 0.0.3, 0.1.0, 0.2.0, 0.3.0, 0.4.0, 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.9.1, 1.9.2, 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0, 1.21.0, 1.22.0, 1.23.0, 1.24.0, 1.25.0, 1.26.0, 1.27.0, 1.28.0, 1.29.0, 1.29.1, 1.30.0, 1.31.0) 2020-01-16 14:44:19.042505 | controller | 2020-01-16 14:44:19.042 | No matching distribution found for neutron-lib===2.0.0 (from -c u-c-m.txt (line 79)) [1] https://review.opendev.org/702986/ [2] https://review.opendev.org/#/c/701900/ https://zuul.opendev.org/t/openstack/build/ee8021c1470a4fb88f55d64cc16ed15e > > > > > As tempest is branchless, it looks like if we want to keep > neutron-tempest-plugin tests for queens we will rather need solution 1 for > this branch? (but let's focus first on getting the supported branches back > in working order) > > > > [1] https://bugs.launchpad.net/neutron/+bug/1859988 > > [2] https://review.opendev.org/702868 > -- Bernard Cafarelli -------------- next part -------------- An HTML attachment was scrubbed... URL: From waboring at hemna.com Fri Jan 17 13:10:34 2020 From: waboring at hemna.com (Walter Boring) Date: Fri, 17 Jan 2020 08:10:34 -0500 Subject: DR options with openstack In-Reply-To: <5e201295.1c69fb81.a69b.d77d@mx.google.com> References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> Message-ID: Hi Tony, Looking at the nimble driver, it has been removed from Cinder due to lack of support and maintenance from the vendor. Also, Looking at the code prior to it's removal, it didn't have any support for replication and failover. 
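[For context on what replication support would have involved: drivers that do implement Cinder's replication v2.1 (the "cheesecake" failover model mentioned below) are enabled via a replication_device entry on the backend section in cinder.conf, roughly like this sketch - the section name and device keys are hypothetical examples, not Nimble options, since the Nimble driver never implemented this:]

```ini
[myvendor-backend]
volume_driver = cinder.volume.drivers.myvendor.MyVendorISCSIDriver
volume_backend_name = myvendor-backend
# Secondary array used as the failover target; the keys after backend_id
# are driver-specific.
replication_device = backend_id:secondary-array,san_ip:10.0.0.2,san_login:admin
```

[Failover to the secondary array is then triggered with something like `cinder failover-host <host>@myvendor-backend --backend_id secondary-array`.]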
Cinder is a community-based open source project that relies on vendors, operators and users to contribute and support the codebase. As a core member of the Cinder team, we do our best to provide support for folks using Cinder, and this mailing list and the #openstack-cinder channel are the best mechanisms to get in touch with us. The #openstack-cinder irc channel is not a developer-only channel. We help when we can, but also remember we have our day jobs as well. Unfortunately Nimble stopped providing support for their driver quite a while ago now, and part of the Cinder policy to have a driver in tree is to have CI (Continuous Integration) tests in place to ensure that cinder patches don't break a driver. If the CI isn't in place, then the Cinder team marks the driver as unsupported in a release, and the following release the driver gets removed. All that being said, the nimble driver never supported the cheesecake replication/DR capabilities that were added in Cinder. Walt (hemna in irc) On Thu, Jan 16, 2020 at 2:49 AM Tony Pearce wrote: > Hi all > > > > My questions are; > > > > 1. How are people using iSCSI Cinder storage with Openstack to-date? > For example a Nimble Storage array backend. I mean to say, are people using > backend integration drivers for other hardware (like netapp)? Or are they > using backend iscsi for example? > 2. How are people managing DR with Openstack in terms of backend > storage replication to another array in another location and continuing to > use Openstack? > > > > The environment which I am currently using; > > 1 x Nimble Storage array (iSCSI) with nimble.py Cinder driver > > 1 x virtualised Controller node > > 2 x physical compute nodes > > This is Openstack Pike. > > > > In addition, I have a 2nd Nimble Storage array in another location.
> > > > To explain the questions I’d like to put forward my thoughts for question > 2 first: > > For point 2 above, I have been searching for a way to utilise replicated > volumes on the 2nd array from Openstack with existing instances. For > example, if site 1 goes down how would I bring up openstack in the 2nd > location and boot up the instances where their volumes are stored on the 2 > nd array. I found a proposal for something called “cheesecake” ref: > https://specs.openstack.org/openstack/cinder-specs/specs/rocky/cheesecake-promote-backend.html > But I could not find if it had been approved or implemented. So I return > to square 1. I have some thoughts about failing over the controller VM and > compute node but I don’t think there’s any need to go into here because of > the above blocker and for brevity anyway. > > > > The nimble.py driver which I am using came with Openstack Pike and it > appears Nimble / HPE are not maintaining it any longer. I saw a commit to > remove nimble.py in Openstack Train release. The driver uses the REST API > to perform actions on the array. Such as creating a volume, downloading the > image, mounting the volume to the instance, snapshots, clones etc. This is > great for me because to date I have around 10TB of openstack storage data > allocated and the Nimble array shows the amount of data being consumed is > <900GB. This is due to the compression and zero-byte snapshots and clones. > > > > So coming back to question 2 – is it possible? Can you drop me some > keywords that I can search for such as an Openstack component like > Cheesecake? I think basically what I am looking for is a supported way of > telling Openstack that the instance volumes are now located at the new / > second array. This means a new cinder backend. Example, new iqn, IP > address, volume serial number. I think I could probably hack the cinder db > but I really want to avoid that. > > > > So failing the above, it brings me to the question 1 I asked before. 
How > are people using Cinder volumes? May be I am going about this the wrong way > and need to take a few steps backwards to go forwards? I need storage to be > able to deploy instances onto. Snapshots and clones are desired. At the > moment these operations take less time than the horizon dashboard takes to > load because of the waiting API responses. > > > > When searching for information about the above as an end-user / consumer I > get a bit concerned. Is it right that Openstack usage is dropping? There’s > no web forum to post questions. The chatroom on freenode is filled with > ~300 ghosts. Ask Openstack questions go without response. Earlier this week > (before I found this mail list) I had to use facebook to report that the > Openstack.org website had been hacked. Basically it seems that if you’re a > developer that can write code then you’re in but that’s it. I have never > been a coder and so I am somewhat stuck. > > > > Thanks in advance > > > > Sent from Mail for > Windows 10 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Fri Jan 17 13:13:46 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 17 Jan 2020 14:13:46 +0100 Subject: [largescale-sig] Meeting summary and next actions Message-ID: Hi everyone, The Large Scale SIG held a meeting earlier this week. Thanks to belmiro for chairing it! You can access the summary and logs of the meeting at: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-01-15-09.00.html For the "Scaling within one cluster, and instrumentation of the bottlenecks" goal, I created a ML thread and etherpad to collect user stories, so far without much success. masahito is still working on the draft for oslo.metrics, hopefully will be ready by end of January. 
[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011925.html [2] https://etherpad.openstack.org/p/scaling-stories Standing TODOs: - all post short descriptions of what happens (what breaks first) when scaling up a single cluster to https://etherpad.openstack.org/p/scaling-stories - masahito to produce first draft for the oslo.metric blueprint - all learn more about golden signals concept as described in https://landing.google.com/sre/book.html For the "Document large scale configuration and tips &tricks" goal, amorin started a thread[3] and etherpad[4] on documenting configuration defaults for large scale, to which slaweq contributed for Neutron. [3] http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html [4] https://etherpad.openstack.org/p/large-scale-sig-documentation Standing TODOs: - oneswig to follow up with Scientific community to find articles around large scale openstack The next meeting will happen on January 29, at 9:00 UTC on #openstack-meeting. They will happen from now on every two weeks. Cheers, -- Thierry Carrez (ttx) From gmann at ghanshyammann.com Fri Jan 17 13:19:54 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 17 Jan 2020 07:19:54 -0600 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> Message-ID: <16fb3a8fd5e.12771fa0546403.7013538671378206138@ghanshyammann.com> ---- On Fri, 17 Jan 2020 04:14:49 -0600 Bernard Cafarelli wrote ---- > On Fri, 17 Jan 2020 at 05:11, Jeremy Stanley wrote: > On 2020-01-16 22:02:05 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > Second option is to install the tempest and plugins in py3 env on > > py2 jobs also. This should be an easy and preferred way. > [...] > > This makes more sense anyway. 
Tempest and its plug-ins are already
> segregated from the system with a virtualenv due to conflicts with
> stable branch requirements, so hopefully switching that virtualenv
> to Python 3.x for all jobs is trivial (but I won't be surprised to
> learn there are subtle challenges hidden just beneath the surface).
>
> That sounds good for supported releases. Once we have them back in working
order, I wonder how it will turn out for queens. In neutron, there is a recent
failure [1] as this EM branch now uses a pinned version of the plugin. The fix
there is most likely to also pin tempest - to queens-em [2] but then will also
require some fix for the EOLing python2 drama.
> As tempest is branchless, it looks like if we want to keep
neutron-tempest-plugin tests for queens we will rather need solution 1 for
this branch? (but let's focus first on getting the supported branches back in
working order)

Yes, for EM branches we need to apply option #1. Tempest does not support EM
branches and we will keep using Tempest master as long as it keeps passing. If
it fails due to test incompatibility or any code behaviour change, then we
need to pin Tempest. We did this for Ocata[1] and Pike[2]. But for the python
2.7 drop case, we will use a py3 env if possible to test the stable branch
until it fails for other reasons, and then cap it.

Currently we support the Tempest pin via TEMPEST_BRANCH, but there is no way
to pin Tempest plugins; that needs some logic on the devstack side to pick up
the plugin tag from the job.
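To illustrate the "switch the Tempest virtualenv to Python 3" option quoted above, a minimal sketch follows. This is illustrative code only, not devstack's actual implementation (which shells out to virtualenv); the path and function name are assumptions.

```python
# Sketch of option #2 from the thread: build the Tempest virtualenv with
# python3 even when the job's services still run python2. Illustrative
# only; devstack's real install logic differs.
import venv  # stdlib module, python3-only, which is the point here

def make_tempest_venv(path, with_pip=True):
    # venv uses the interpreter running this script, so invoking the
    # installer step under python3 yields a python3 venv for Tempest
    # and its plugins, regardless of what the rest of the job uses.
    venv.EnvBuilder(with_pip=with_pip).create(path)
    return path
```

Running just the venv-creation step under python3 is the whole trick; the services under test can stay on python2.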
[1] https://review.opendev.org/#/c/681950/
[2] https://review.opendev.org/#/c/684769/

-gmann

> [1] https://bugs.launchpad.net/neutron/+bug/1859988
> [2] https://review.opendev.org/702868
>
> --
> Bernard Cafarelli
>

From amy at demarco.com  Fri Jan 17 13:23:50 2020
From: amy at demarco.com (Amy Marrich)
Date: Fri, 17 Jan 2020 07:23:50 -0600
Subject: Rails Girls Summer of Code
In-Reply-To: 
References: 
Message-ID: 

Victoria,

I thought it was related to Ruby on Rails as well until I found the following
on their site:

Rails Girls Summer of Code is programming language agnostic, and students have
contributed to an overall of 76 unique Open Source projects such as Bundler,
Rails, Discourse, Tessel, NextCloud, Processing, Babel, impress.js, Lektor
CMS, Hoodie, Speakerinnen, Lotus (now Hanami) and Servo.

Maybe they've changed, as the name is misleading when compared to that
statement. So if OpenStack wanted to get involved we would submit an
application and have some mentors/projects lined up, similar to Outreachy and
Google Summer of Code.

Thanks,

Amy (spotz)

On Fri, Jan 17, 2020 at 5:44 AM Victoria Martínez de la Cruz <
victoria at vmartinezdelacruz.com> wrote:

> Hi Amy,
>
> This is great!
>
> How is that agnostic? IIRC it was all related to Ruby on Rails projects?
> How can OpenStack join this effort?
>
> Thanks,
>
> V
>
> On Thu, Jan 16, 2020 at 9:55 AM Amy Marrich wrote:
>
>> Hi All,
>>
>> I was contacted about this program to see if OpenStack might be
>> interested in participating and despite the name it is language agnostic.
>> More information on the program can be found at Rails Girls Summer of Code
>> ,
>>
>> I'm willing to help organize our efforts but would need to know the level
>> of interest to participate and mentor.
>>
>> Thanks,
>>
>> Amy (spotz)
>> Chair, Diversity and Inclusion WG
>> Chair, User Committee
>>
> -------------- next part --------------
An HTML attachment was scrubbed...
URL: From gmann at ghanshyammann.com Fri Jan 17 13:31:24 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 17 Jan 2020 07:31:24 -0600 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> <3DCDAE2D-4368-4A0B-BF8B-7AF4BA729055@redhat.com> Message-ID: <16fb3b385b4.cbec982347026.4712409073911949240@ghanshyammann.com> ---- On Fri, 17 Jan 2020 05:51:28 -0600 Bernard Cafarelli wrote ---- > On Fri, 17 Jan 2020 at 12:01, Slawek Kaplonski wrote: > Hi, > > > On 17 Jan 2020, at 11:14, Bernard Cafarelli wrote: > > > > On Fri, 17 Jan 2020 at 05:11, Jeremy Stanley wrote: > > On 2020-01-16 22:02:05 -0600 (-0600), Ghanshyam Mann wrote: > > [...] > > > Second option is to install the tempest and plugins in py3 env on > > > py2 jobs also. This should be an easy and preferred way. > > [...] > > > > This makes more sense anyway. Tempest and its plug-ins are already > > segregated from the system with a virtualenv due to conflicts with > > stable branch requirements, so hopefully switching that virtualenv > > to Python 3.x for all jobs is trivial (but I won't be surprised to > > learn there are subtle challenges hidden just beneath the surface). > > > > That sounds good for supported releases. Once we have them back in working order, I wonder how it will turn out for queens. > > In neutron, there is a recent failure [1] as this EM branch now uses a pinned version of the plugin. The fix there is most likely to also pin tempest - to queens-em [2] but then will also require some fix for the EOLing python2 drama. > > But if we will use for queens branch tempest pinned to queens-em tag, we shouldn’t have any such problems there as all requirements will be also used from queens branch, or am I missing something here? 
> Sadly not, from what I read in attempt [1] to limit neutron-lib to the "old"
version. And I see the same error in a test run with pinned tempest [2]:
> 2020-01-16 14:44:18.741517 | controller | 2020-01-16 14:44:18.741 | Collecting neutron-lib===2.0.0 (from -c u-c-m.txt (line 79))
> 2020-01-16 14:44:19.023699 | controller | 2020-01-16 14:44:19.023 | Could not find a version that satisfies the requirement neutron-lib===2.0.0 (from -c u-c-m.txt (line 79)) (from versions: 0.0.1, 0.0.2, 0.0.3, 0.1.0, 0.2.0, 0.3.0, 0.4.0, 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.9.1, 1.9.2, 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0, 1.21.0, 1.22.0, 1.23.0, 1.24.0, 1.25.0, 1.26.0, 1.27.0, 1.28.0, 1.29.0, 1.29.1, 1.30.0, 1.31.0)
> 2020-01-16 14:44:19.042505 | controller | 2020-01-16 14:44:19.042 | No matching distribution found for neutron-lib===2.0.0 (from -c u-c-m.txt (line 79))

Yes, the Tempest venv always uses the upper constraints from master. We need
to cap the u-c accordingly as well. I did not do this for the Ocata/Pike case,
which I will fix as those can start failing at any time.

For Tempest itself it is straightforward: use the u-c of the branch
corresponding to the Tempest pin. But for plugins it is complex. All tempest
plugins are installed one by one in a single piece of logic in devstack, so
using different constraints for different plugins might not be possible (there
should not be many cases where a job runs tests from more than one plugin, but
there are a few).

The best solution I can think of is to cap all the Tempest plugins together
with Tempest and use the corresponding stable branch u-c. Or we modify the
devstack logic with an if-else condition so that plugins requiring a cap are
pinned while the rest stay on master. Any other thoughts?
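The if-else idea for plugin pinning could look roughly like this. A hedged sketch only: the plugin name and tag below are invented examples, and this is not actual devstack code (devstack's install loop is written in shell).

```python
# Hedged sketch of the if-else approach: plugins that need a cap get a
# pinned ref, everything else keeps tracking master. The pin below is an
# assumed example, not a real pin.
PLUGIN_PINS = {"neutron-tempest-plugin": "0.9.0"}  # illustrative tag

def ref_for_plugin(name):
    """Return the git ref a devstack-style install loop would check out."""
    return PLUGIN_PINS.get(name, "master")

print(ref_for_plugin("neutron-tempest-plugin"))  # pinned
print(ref_for_plugin("cinder-tempest-plugin"))   # tracks master
```

The same lookup would also need to select the matching stable-branch upper-constraints file for any pinned plugin.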
-gmann > > [1] https://review.opendev.org/702986/[2] https://review.opendev.org/#/c/701900/ https://zuul.opendev.org/t/openstack/build/ee8021c1470a4fb88f55d64cc16ed15e > > > > As tempest is branchless, it looks like if we want to keep neutron-tempest-plugin tests for queens we will rather need solution 1 for this branch? (but let's focus first on getting the supported branches back in working order) > > > > [1] https://bugs.launchpad.net/neutron/+bug/1859988 > > [2] https://review.opendev.org/702868 > > > -- > Bernard Cafarelli > From tony.pearce at cinglevue.com Fri Jan 17 13:44:37 2020 From: tony.pearce at cinglevue.com (Tony Pearce) Date: Fri, 17 Jan 2020 21:44:37 +0800 Subject: DR options with openstack In-Reply-To: References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> Message-ID: Hi Walter Thank you for the information. It's unfortunate about the lack of support from nimble. With regards to replication, nimble has their own software implementation that I'm currently using. The problem I face is that the replicated volumes have a different iqn, serial number and are accessed via a different array IP. I didn't get time to read up on freezer today but I'm hopeful that I can use something there. 🙂 On Fri, 17 Jan 2020, 21:10 Walter Boring, wrote: > Hi Tony, > Looking at the nimble driver, it has been removed from Cinder due to > lack of support and maintenance from the vendor. Also, > Looking at the code prior to it's removal, it didn't have any support for > replication and failover. Cinder is a community based opensource project > that relies on vendors, operators and users to contribute and support the > codebase. As a core member of the Cinder team, we do our best to provide > support for folks using Cinder and this mailing list and the > #openstack-cinder channel is the best mechanism to get in touch with us. > The #openstack-cinder irc channel is not a developer only channel. We > help when we can, but also remember we have our day jobs as well. 
> > Unfortunately Nimble stopped providing support for their driver quite a > while ago now and part of the Cinder policy to have a driver in tree is to > have CI (Continuous Integration) tests in place to ensure that cinder > patches don't break a driver. If the CI isn't in place, then the Cinder > team marks the driver as unsupported in a release, and the following > release the driver gets removed. > > All that being said, the nimbe driver never supported the cheesecake > replication/DR capabilities that were added in Cinder. > > Walt (hemna in irc) > > On Thu, Jan 16, 2020 at 2:49 AM Tony Pearce > wrote: > >> Hi all >> >> >> >> My questions are; >> >> >> >> 1. How are people using iSCSI Cinder storage with Openstack to-date? >> For example a Nimble Storage array backend. I mean to say, are people using >> backend integration drivers for other hardware (like netapp)? Or are they >> using backend iscsi for example? >> 2. How are people managing DR with Openstack in terms of backend >> storage replication to another array in another location and continuing to >> use Openstack? >> >> >> >> The environment which I am currently using; >> >> 1 x Nimble Storage array (iSCSI) with nimble.py Cinder driver >> >> 1 x virtualised Controller node >> >> 2 x physical compute nodes >> >> This is Openstack Pike. >> >> >> >> In addition, I have a 2nd Nimble Storage array in another location. >> >> >> >> To explain the questions I’d like to put forward my thoughts for question >> 2 first: >> >> For point 2 above, I have been searching for a way to utilise replicated >> volumes on the 2nd array from Openstack with existing instances. For >> example, if site 1 goes down how would I bring up openstack in the 2nd >> location and boot up the instances where their volumes are stored on the 2 >> nd array. 
I found a proposal for something called “cheesecake” ref: >> https://specs.openstack.org/openstack/cinder-specs/specs/rocky/cheesecake-promote-backend.html >> But I could not find if it had been approved or implemented. So I return >> to square 1. I have some thoughts about failing over the controller VM and >> compute node but I don’t think there’s any need to go into here because of >> the above blocker and for brevity anyway. >> >> >> >> The nimble.py driver which I am using came with Openstack Pike and it >> appears Nimble / HPE are not maintaining it any longer. I saw a commit to >> remove nimble.py in Openstack Train release. The driver uses the REST API >> to perform actions on the array. Such as creating a volume, downloading the >> image, mounting the volume to the instance, snapshots, clones etc. This is >> great for me because to date I have around 10TB of openstack storage data >> allocated and the Nimble array shows the amount of data being consumed is >> <900GB. This is due to the compression and zero-byte snapshots and clones. >> >> >> >> So coming back to question 2 – is it possible? Can you drop me some >> keywords that I can search for such as an Openstack component like >> Cheesecake? I think basically what I am looking for is a supported way of >> telling Openstack that the instance volumes are now located at the new / >> second array. This means a new cinder backend. Example, new iqn, IP >> address, volume serial number. I think I could probably hack the cinder db >> but I really want to avoid that. >> >> >> >> So failing the above, it brings me to the question 1 I asked before. How >> are people using Cinder volumes? May be I am going about this the wrong way >> and need to take a few steps backwards to go forwards? I need storage to be >> able to deploy instances onto. Snapshots and clones are desired. At the >> moment these operations take less time than the horizon dashboard takes to >> load because of the waiting API responses. 
>> >> >> >> When searching for information about the above as an end-user / consumer >> I get a bit concerned. Is it right that Openstack usage is dropping? >> There’s no web forum to post questions. The chatroom on freenode is filled >> with ~300 ghosts. Ask Openstack questions go without response. Earlier this >> week (before I found this mail list) I had to use facebook to report that >> the Openstack.org website had been hacked. Basically it seems that if >> you’re a developer that can write code then you’re in but that’s it. I have >> never been a coder and so I am somewhat stuck. >> >> >> >> Thanks in advance >> >> >> >> Sent from Mail for >> Windows 10 >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ionut at fleio.com Fri Jan 17 13:54:12 2020 From: ionut at fleio.com (Ionut Biru) Date: Fri, 17 Jan 2020 15:54:12 +0200 Subject: [magnum] subnet created in public network? Message-ID: Hello, I'm using magnum 9.2.0 and while trying to experiment with this version, i was finding out that while deploying the cluster, heat creates the subnet into the public network. In the past, on rocky and stein, magnum/heat was creating a new network, with a router and an port within the public network for connectivity. I was wondering, if this is the expected behavior (subnet in public network). How do I revert to the old way of having new network? -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosmaita.fossdev at gmail.com Fri Jan 17 14:12:27 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 17 Jan 2020 09:12:27 -0500 Subject: [cinder][ops] new driver/new target driver merge deadline Message-ID: <18aa16da-10bc-a9db-1caf-96be65530f9c@gmail.com> Greetings to anyone developing a new Cinder driver (or target driver), or anyone trying to get someone to develop such a driver, This is a reminder that the deadline for merging a new backend driver or a new target driver to Cinder for the Ussuri release is the Ussuri-2 milestone on 13 February 2020 (23:59 UTC). New drivers must be (a) code complete including unit tests, (b) merged into the code repository, and (c) must have a 3rd Party CI running reliably. (The idea is that new drivers will be included in a release at the second milestone and thus be easily available for downstream testing, documentation feedback, etc.) You can find more information about Cinder drivers here: https://docs.openstack.org/cinder/latest/drivers-all-about.html and you can ask questions in #openstack-cinder on IRC or here on the mailing list. cheers, brian From abishop at redhat.com Fri Jan 17 14:17:30 2020 From: abishop at redhat.com (Alan Bishop) Date: Fri, 17 Jan 2020 06:17:30 -0800 Subject: Cinder snapshot delete successful when expected to fail In-Reply-To: References: Message-ID: On Fri, Jan 17, 2020 at 2:01 AM Tony Pearce wrote: > Could anyone help by pointing me where to go to be able to dig into this > issue further? > > I have installed a test Openstack environment using RDO Packstack. I > wanted to install the same version that I have in Production (Pike) but > it's not listed in the CentOS repo via yum search. So I installed Queens. I > am using nimble.py Cinder driver. Nimble Storage is a storage array > accessed via iscsi from the Openstack host, and is controlled from > Openstack by the driver and API. > > *What I expected to happen:* > 1. 
create an instance with volume (the volume is created on the storage
> array successfully and instance boots from it)
> 2. take a snapshot (snapshot taken on the volume on the array successfully)
> 3. create a new instance from the snapshot (the api tells the array to
> clone the snapshot into a new volume on the array and use that volume for
> the instance)
> 4. try and delete the snapshot
> Expected Result - Openstack gives the user a message like "you're not
> allowed to do that".
>
> Note: Step 3 above creates a child volume from the parent snapshot. It's
> impossible to delete the parent snapshot because IO READ is sent to that
> part of the original volume (as I understand it).
>
> *My production problem is this: *
> 1. create an instance with volume (the volume is created on the storage
> array successfully)
> 2. take a snapshot (snapshot taken on the volume on the array successfully)
> 3. create a new instance from the snapshot (the api tells the array to
> clone the snapshot into a new volume on the array and use that volume for
> the instance)
> 4. try and delete the snapshot
> Result - the snapshot goes into an error state and later, all Cinder
> operations fail, such as new instance/create volume etc., until the correct
> service is restarted. Then everything works once again.
>
> To troubleshoot the above, I installed the RDO Packstack Queens (because I
> couldn't get Pike). I tested the above and now the result is that the
> snapshot is successfully deleted from openstack but not deleted on the
> array. The log is below for reference. But I can see in the log that the
> array sends back info to openstack saying the snapshot has a clone and the
> delete cannot be done because of that. Also response code 409.
>
> *Some info about why the problem with Pike started in the first place*
> 1. Vendor is Nimble Storage, which HPE purchased
> 2. HPE/Nimble have dropped support for openstack. Latest supported version
> is Queens and Nimble array version v4.x.
The current Array version is v5.x. > Nimble say there are no guarantees with openstack, the driver and the array > version v5.x > 3. I was previously advised by Nimble that the array version v5.x will > work fine and so left our DR array on v5.x with a pending upgrade that had > a blocker due to an issue. This issue was resolved in December and the > pending upgrade completed to match the DR array took place around 30 days > ago. > > > With regards to the production issue, I assumed that the array API has > some changes between v4.x and v5.x and it's causing an issue with Cinder > due to the API response. Although I have not been able to find out if or > what changes there are that may have occurred after the array upgrade, as > the documentation for this is Nimble internal-only. > > > *So with that - some questions if I may:* > When Openstack got the 409 error response from the API (as seen in the > log below), why would Openstack then proceed to delete the snapshot on the > Openstack side? How could I debug this further? I'm not sure what Openstack > Cinder is acting on in terns of the response as yet. Maybe Openstack is not > specifically looking for the error code in the response? > > The snapshot that got deleted on the openstack side is a problem. Would > this be related to the driver? Could it be possible that the driver did not > pass the error response to Cinder? > Hi Tony, This is exactly what happened, and it appears to be a driver bug introduced in queens by [1]. The code in question [2] logs the error, but fails to propagate the exception. As far as the volume manager is concerned, the snapshot deletion was successful. [1] https://review.opendev.org/601492 [2] https://opendev.org/openstack/cinder/src/branch/stable/queens/cinder/volume/drivers/nimble.py#L1815 Alan Thanks in advance. Just for reference, the log snippet is below. 
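The failure mode Alan describes, an API exception that is logged but not re-raised, can be shown with a minimal sketch. This is simplified stand-in code, not the actual nimble.py driver; names are illustrative.

```python
# Simplified sketch (not the real nimble.py code) of the bug pattern:
# the except block logs the 409 API error but does not re-raise, so the
# caller, cinder's volume manager, believes the delete succeeded.
import logging

LOG = logging.getLogger("nimble-sketch")

class NimbleAPIException(Exception):
    """Stand-in for the driver's API error (the 409 'has a clone')."""

def _api_delete_snapshot():
    raise NimbleAPIException("Error Code: 409: snapshot has a clone")

def delete_snapshot_buggy():
    try:
        _api_delete_snapshot()
    except NimbleAPIException as e:
        LOG.warning("Snapshot has a clone: %s", e)
        # bug: no raise here, so the manager marks the snapshot deleted

def delete_snapshot_fixed():
    try:
        _api_delete_snapshot()
    except NimbleAPIException as e:
        LOG.warning("Snapshot has a clone: %s", e)
        raise  # propagate, so the snapshot goes to an error state instead

delete_snapshot_buggy()  # returns quietly: "Delete snapshot completed successfully"
```

With the missing `raise` added, the manager would set the snapshot to an error state rather than deleting its record while the array still holds the snapshot.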
> > > ==> volume.log <== >> 2020-01-17 16:53:23.718 24723 WARNING py.warnings >> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] >> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >> certificate verification is strongly advised. See: >> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >> InsecureRequestWarning) >> : NimbleAPIException: Failed to execute api >> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >> ==> api.log <== >> 2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi >> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] >> http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail >> returned with HTTP 200 >> 2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server >> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET >> /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200 >> len: 4657 time: 0.1152730 >> ==> volume.log <== >> 2020-01-17 16:53:23.811 24723 WARNING py.warnings >> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] >> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >> certificate verification is strongly advised. 
See: >> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >> InsecureRequestWarning) >> : NimbleAPIException: Failed to execute api >> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >> 2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble >> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception >> Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: >> Error Code: 409 Message: Snapshot >> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.: >> NimbleAPIException: Failed to execute api >> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >> 2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble >> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot >> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone: >> NimbleAPIException: Failed to execute api >> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >> 2020-01-17 16:53:23.964 24723 WARNING cinder.quota >> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default >> quota for resource: snapshots_Nimble-DR is set by the default quota flag: >> quota_snapshots_Nimble-DR, it is now deprecated. 
Please use the default >> quota class for default quota. >> 2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager >> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >> 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot >> completed successfully. > > > > Regards, > > *Tony Pearce* | > *Senior Network Engineer / Infrastructure Lead**Cinglevue International > * > > Email: tony.pearce at cinglevue.com > Web: http://www.cinglevue.com > > *Australia* > 1 Walsh Loop, Joondalup, WA 6027 Australia. > > Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 > > Note: This email and all attachments are the sole property of Cinglevue > International Pty Ltd. (or any of its subsidiary entities), and the > information contained herein must be considered confidential, unless > specified otherwise. If you are not the intended recipient, you must not > use or forward the information contained in these documents. If you have > received this message in error, please delete the email and notify the > sender. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jan 17 14:50:32 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jan 2020 14:50:32 +0000 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> Message-ID: <20200117145031.j5tr6avxr3v7hdeg@yuggoth.org> On 2020-01-17 08:49:32 +0100 (+0100), Radosław Piliszek wrote: [...] > ERROR: No matching distribution found for neutron-lib===2.0.0 (from -c > u-c-m.txt (line 79)) > > and the reason is: > pypi: data-requires-python=">=3.6" > > 3.5 < 3.6 > > Need some newer python in there. [...] Or older neutron-lib? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From thierry at openstack.org Fri Jan 17 14:58:08 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 17 Jan 2020 15:58:08 +0100 Subject: [Release-job-failures] release-post job for openstack/releases for ref refs/heads/master failed In-Reply-To: References: Message-ID: zuul at openstack.org wrote: > Build failed. > > - tag-releases https://zuul.opendev.org/t/openstack/build/a414023508294a65abe9715546757e41 : POST_FAILURE in 5m 13s > - publish-tox-docs-static https://zuul.opendev.org/t/openstack/build/None : SKIPPED There was an error running the post-job tasks on the tag job on https://review.opendev.org/702925. While trying to collect log output: ssh: connect to host 38.108.68.119 port 22: No route to host This looks like a transient error, and it can be ignored (the job itself had run and was a NOOP anyway). -- Thierry Carrez (ttx) From fungi at yuggoth.org Fri Jan 17 15:10:03 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jan 2020 15:10:03 +0000 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: <16fb3b385b4.cbec982347026.4712409073911949240@ghanshyammann.com> References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> <3DCDAE2D-4368-4A0B-BF8B-7AF4BA729055@redhat.com> <16fb3b385b4.cbec982347026.4712409073911949240@ghanshyammann.com> Message-ID: <20200117151003.hpho3gjmnldk6bdd@yuggoth.org> On 2020-01-17 07:31:24 -0600 (-0600), Ghanshyam Mann wrote: [...] > Best possible solution I can think of is to cap all the Tempest > plugins together with Tempest and use corresponding stable branch > u-c. Or we modify devstack logic with if-else condition for > plugins require cap and rest else will be master. Any other > thought? [...] 
Constraints is going to be at odds with PEP 503 data-requires-python signaling. If we didn't include neutron-lib in the constraints list for Tempest's virtualenv (maybe filter it out with the edit-constraints tool) then pip should select the highest possible version which matches the versionspec in the requirements list and supports the Python interpreter with which that virtualenv was built. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From i at liuyulong.me Fri Jan 17 15:11:09 2020 From: i at liuyulong.me (=?utf-8?B?TElVIFl1bG9uZw==?=) Date: Fri, 17 Jan 2020 23:11:09 +0800 Subject: [Neutron] cancel neutron L3 meeting Message-ID: Hi all, Hi guys, due to the Chinese Spring Festival I will be offline in next two weeks. So I will not be available to chair the L3 meeting. Let's cancel the next two meetings. Then the L3 meeting will be rescheduled on 5th Feb, 2020. OK, see you guys then.  And happy Chinese New Year! 春节快乐! Regards, LIU Yulong -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Fri Jan 17 15:22:25 2020 From: openstack at fried.cc (Eric Fried) Date: Fri, 17 Jan 2020 09:22:25 -0600 Subject: [nova] Nova CI busted, please hold rechecks Message-ID: <4f03483c-3702-b71f-baca-43585096ca10@fried.cc> The nova-live-migration job is failing 100% since yesterday morning [1]. Your rechecks won't work until that's resolved. I'll send an all-clear message when we're green again. 
Thanks,
efried

[1] https://bugs.launchpad.net/nova/+bug/1860021

From madhuri.kumari at intel.com  Fri Jan 17 15:31:17 2020
From: madhuri.kumari at intel.com (Kumari, Madhuri)
Date: Fri, 17 Jan 2020 15:31:17 +0000
Subject: [ironic][nova][neutron][cloud-init] Infiniband Support in OpenStack
Message-ID: <0512CBBECA36994BAA14C7FEDE986CA61A5528B0@BGSMSX102.gar.corp.intel.com>

Hi,

I am trying to deploy a node with infiniband in Ironic without any success.
The node has two interfaces, eth0 and ib0. The deployment is successful and
the node becomes active, but it is not reachable.

I debugged and found that the issue is with cloud-init. cloud-init fails to
configure the network interfaces on the node, complaining that the MAC address
of the infiniband port (ib0) is not known to the node. Ironic provides a fake
MAC address for infiniband ports, and cloud-init is supposed to generate the
actual MAC address of infiniband ports[1]. But it fails[2] before reaching
there. I have posted the issue in cloud-init[3] as well.

Can someone please help me with this issue? How do we specify
"TYPE=InfiniBand" from OpenStack? Currently the type sent is "phy" only.

[1] https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L686
[2] https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L677
[3] https://bugs.launchpad.net/cloud-init/+bug/1857031

Regards,
Madhuri

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alawson at aqorn.com  Fri Jan 17 15:54:36 2020
From: alawson at aqorn.com (Adam Peacock)
Date: Fri, 17 Jan 2020 21:24:36 +0530
Subject: DR options with openstack
In-Reply-To: 
References: <5e201295.1c69fb81.a69b.d77d@mx.google.com>
Message-ID: 

I'm traveling in India right now and will reply later. I've architected
several large OpenStack clouds from Cisco to Juniper to SAP to AT&T to HPE to
Wells Fargo to -- you name it.
Will share some things we've done regarding DR and more specifically how we handled replication and dividing the cloud up so it made sense from a design and operational perspective. Also, we need to be clear not everyone leans towards being a developer or even *wants* to go in that direction when using OpenStack. In fact, most don't, and if there is that expectation by those entrenched with the OpenStack product, the OpenStack option gets dropped in favor of something else. It's developer-friendly but we need to be mega-mega-careful, as a community, to ensure development isn't the baseline or assumption for adequate support or to get questions answered. Especially since we've converged our communication channels. /soapbox More later. //adam On Fri, Jan 17, 2020, 7:19 PM Tony Pearce wrote: > Hi Walter > > Thank you for the information. > It's unfortunate about the lack of support from nimble. > > With regards to replication, nimble has their own software implementation > that I'm currently using. The problem I face is that the replicated volumes > have a different iqn, serial number and are accessed via a different array > IP. > > I didn't get time to read up on freezer today but I'm hopeful that I can > use something there. 🙂 > > > On Fri, 17 Jan 2020, 21:10 Walter Boring, wrote: > >> Hi Tony, >> Looking at the nimble driver, it has been removed from Cinder due to >> lack of support and maintenance from the vendor. Also, >> looking at the code prior to its removal, it didn't have any support for >> replication and failover. Cinder is a community-based open source project >> that relies on vendors, operators and users to contribute and support the >> codebase. As a core member of the Cinder team, we do our best to provide >> support for folks using Cinder, and this mailing list and the >> #openstack-cinder channel are the best mechanisms to get in touch with us. >> The #openstack-cinder irc channel is not a developer-only channel.
We >> help when we can, but also remember we have our day jobs as well. >> >> Unfortunately Nimble stopped providing support for their driver quite a >> while ago now and part of the Cinder policy to have a driver in tree is to >> have CI (Continuous Integration) tests in place to ensure that cinder >> patches don't break a driver. If the CI isn't in place, then the Cinder >> team marks the driver as unsupported in a release, and the following >> release the driver gets removed. >> >> All that being said, the nimble driver never supported the cheesecake >> replication/DR capabilities that were added in Cinder. >> >> Walt (hemna in irc) >> >> On Thu, Jan 16, 2020 at 2:49 AM Tony Pearce >> wrote: >> >>> Hi all >>> >>> >>> >>> My questions are; >>> >>> >>> >>> 1. How are people using iSCSI Cinder storage with Openstack >>> to-date? For example a Nimble Storage array backend. I mean to say, are >>> people using backend integration drivers for other hardware (like netapp)? >>> Or are they using backend iscsi for example? >>> 2. How are people managing DR with Openstack in terms of backend >>> storage replication to another array in another location and continuing to >>> use Openstack? >>> >>> >>> >>> The environment which I am currently using; >>> >>> 1 x Nimble Storage array (iSCSI) with nimble.py Cinder driver >>> >>> 1 x virtualised Controller node >>> >>> 2 x physical compute nodes >>> >>> This is Openstack Pike. >>> >>> >>> >>> In addition, I have a 2nd Nimble Storage array in another location. >>> >>> >>> >>> To explain the questions I’d like to put forward my thoughts for >>> question 2 first: >>> >>> For point 2 above, I have been searching for a way to utilise replicated >>> volumes on the 2nd array from Openstack with existing instances. For >>> example, if site 1 goes down how would I bring up openstack in the 2nd >>> location and boot up the instances where their volumes are stored on the 2nd >>> array.
I found a proposal for something called “cheesecake” ref: >>> https://specs.openstack.org/openstack/cinder-specs/specs/rocky/cheesecake-promote-backend.html >>> But I could not find if it had been approved or implemented. So I return >>> to square 1. I have some thoughts about failing over the controller VM and >>> compute node but I don’t think there’s any need to go into it here because of >>> the above blocker and for brevity anyway. >>> >>> >>> >>> The nimble.py driver which I am using came with Openstack Pike and it >>> appears Nimble / HPE are not maintaining it any longer. I saw a commit to >>> remove nimble.py in the Openstack Train release. The driver uses the REST API >>> to perform actions on the array, such as creating a volume, downloading the >>> image, mounting the volume to the instance, snapshots, clones etc. This is >>> great for me because to date I have around 10TB of openstack storage data >>> allocated and the Nimble array shows the amount of data being consumed is >>> <900GB. This is due to the compression and zero-byte snapshots and clones. >>> >>> >>> >>> So coming back to question 2 – is it possible? Can you drop me some >>> keywords that I can search for such as an Openstack component like >>> Cheesecake? I think basically what I am looking for is a supported way of >>> telling Openstack that the instance volumes are now located at the new / >>> second array. This means a new cinder backend. Example, new iqn, IP >>> address, volume serial number. I think I could probably hack the cinder db >>> but I really want to avoid that. >>> >>> >>> >>> So failing the above, it brings me back to question 1 I asked before. How >>> are people using Cinder volumes? Maybe I am going about this the wrong way >>> and need to take a few steps backwards to go forwards? I need storage to be >>> able to deploy instances onto. Snapshots and clones are desired.
At the >>> moment these operations take less time than the horizon dashboard takes to >>> load because of the waiting API responses. >>> >>> >>> >>> When searching for information about the above as an end-user / consumer >>> I get a bit concerned. Is it right that Openstack usage is dropping? >>> There’s no web forum to post questions. The chatroom on freenode is filled >>> with ~300 ghosts. Ask Openstack questions go without response. Earlier this >>> week (before I found this mail list) I had to use facebook to report that >>> the Openstack.org website had been hacked. Basically it seems that if >>> you’re a developer that can write code then you’re in but that’s it. I have >>> never been a coder and so I am somewhat stuck. >>> >>> >>> >>> Thanks in advance >>> >>> >>> >>> Sent from Mail for >>> Windows 10 >>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jan 17 16:30:32 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 17 Jan 2020 10:30:32 -0600 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: <20200117151003.hpho3gjmnldk6bdd@yuggoth.org> References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> <3DCDAE2D-4368-4A0B-BF8B-7AF4BA729055@redhat.com> <16fb3b385b4.cbec982347026.4712409073911949240@ghanshyammann.com> <20200117151003.hpho3gjmnldk6bdd@yuggoth.org> Message-ID: <16fb45784f9.f37c163a56011.2179039106164150297@ghanshyammann.com> ---- On Fri, 17 Jan 2020 09:10:03 -0600 Jeremy Stanley wrote ---- > On 2020-01-17 07:31:24 -0600 (-0600), Ghanshyam Mann wrote: > [...] > > Best possible solution I can think of is to cap all the Tempest > > plugins together with Tempest and use corresponding stable branch > > u-c. 
Or we modify devstack logic with if-else condition for > plugins require cap and rest else will be master. Any other > thought? > [...] > > Constraints is going to be at odds with PEP 503 data-requires-python > signaling. If we didn't include neutron-lib in the constraints list > for Tempest's virtualenv (maybe filter it out with the > edit-constraints tool) then pip should select the highest possible > version which matches the versionspec in the requirements list and > supports the Python interpreter with which that virtualenv was > built. There will be more libraries like neutron-lib; basically every dependency of Tempest or its plugins will become py2-incompatible over time. -gmann > -- > Jeremy Stanley > From cboylan at sapwetik.org Fri Jan 17 17:35:37 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 17 Jan 2020 09:35:37 -0800 Subject: [tc][infra] Splitting OpenDev out of OpenStack Governance Message-ID: Hello, About six weeks ago I kicked off a discussion on what the future of OpenDev's governance looks like [0]. I think we had expected part of this process would be to split OpenDev out of OpenStack's governance, but wanted to be sure we had a bit more of a plan before we made that official. After some discussion in this thread [0] I believe we now have enough of a plan to move forward on splitting out. I've pushed a change [1] to the openstack/governance repo to make this official in git. I also wanted to make sure this change had some visibility so am sending this email too. Now for some background. The OpenDev effort intends to make our software development tools and processes available to projects outside of OpenStack itself. We have actually made these resources available since Stackforge, but one of the major concerns we hear over and over is the implication that a project hosted on our platforms is still "OpenStack".
The next step to avoiding this confusion and better reflecting our goals is to formally remove OpenDev from OpenStack's governance. As mentioned in the original thread [0], OpenDev would still incorporate input from the OpenStack project as one of its users. We aren't going away and will continue to work closely together to meet OpenStack's needs. But now we'll formalize doing that with other projects as well. Feedback is still welcome, though I ask people to read through the original thread [0] first. Please let me know if there is anything else I can do to help with this process. [0] http://lists.openstack.org/pipermail/openstack-infra/2019-December/006537.html [1] https://review.opendev.org/703134 Clark From fungi at yuggoth.org Fri Jan 17 17:57:13 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jan 2020 17:57:13 +0000 Subject: [qa][stable][tempest-plugins]: Tempest & plugins py2 jobs failure for stable branches (1860033: the EOLing python2 drama) In-Reply-To: <16fb45784f9.f37c163a56011.2179039106164150297@ghanshyammann.com> References: <16fb1aa4aae.10e957b6324515.5822370422740200537@ghanshyammann.com> <20200117041005.cgxggu5wrv3amheh@yuggoth.org> <3DCDAE2D-4368-4A0B-BF8B-7AF4BA729055@redhat.com> <16fb3b385b4.cbec982347026.4712409073911949240@ghanshyammann.com> <20200117151003.hpho3gjmnldk6bdd@yuggoth.org> <16fb45784f9.f37c163a56011.2179039106164150297@ghanshyammann.com> Message-ID: <20200117175713.bz63guojhxa6raa3@yuggoth.org> On 2020-01-17 10:30:32 -0600 (-0600), Ghanshyam Mann wrote: > ---- On Fri, 17 Jan 2020 09:10:03 -0600 Jeremy Stanley wrote ---- > > On 2020-01-17 07:31:24 -0600 (-0600), Ghanshyam Mann wrote: > > [...] > > > Best possible solution I can think of is to cap all the Tempest > > > plugins together with Tempest and use corresponding stable branch > > > u-c. Or we modify devstack logic with if-else condition for > > > plugins require cap and rest else will be master. Any other > > > thought? > > [...] 
> > > > Constraints is going to be at odds with PEP 503 data-requires-python > > signaling. If we didn't include neutron-lib in the constraints list > > for Tempest's virtualenv (maybe filter it out with the > > edit-constraints tool) then pip should select the highest possible > > version which matches the versionspec in the requirements list and > > supports the Python interpreter with which that virtualenv was > > built. > > There will be more lib like neutron-lib, basically all dependency of Tempest > or plugins that will become py2 incompatible day by day. Yes, but the problem here isn't Python 2.7 incompatibility; it's Python 3.5 incompatibility. We can't run current Tempest on Ubuntu 16.04 LTS without installing a custom Python 3.6 interpreter. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ignaziocassano at gmail.com Fri Jan 17 18:30:13 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jan 2020 19:30:13 +0100 Subject: [queens][nova] iscsi issue Message-ID: Hello all we are testing openstack queens cinder driver for Unity iscsi (driver cinder.volume.drivers.dell_emc.unity.Driver). 
The unity storage is a Unity600 Version 4.5.10.5.001 We are facing an issue when we try to detach volume from a virtual machine with two or more volumes attached (this happens often but not always): The following is reported nova-compute.log: 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server self.connector.disconnect_volume(connection_info['data'], None) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/os_brick/utils.py", line 137, in trace_logging_wrapper 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", line 848, in disconnect_volume 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server device_info=device_info) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", line 892, in _cleanup_connection 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server path_used, was_multipath) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py", line 271, in remove_connection 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server self.flush_multipath_device(multipath_name) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py", line 329, in flush_multipath_device 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server root_helper=self._root_helper) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File 
"/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in _execute 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server result = self.__execute(*args, **kwargs) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return execute_root(*cmd, **kwargs) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return self.channel.remote_call(name, args, kwargs) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server raise exc_type(*result[2]) 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server ProcessExecutionError: Unexpected error while running command. 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Command: multipath -f 36006016006e04400d0c4215e3ec55757 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Exit code: 1 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Stdout: u'Jan 17 16:04:30 | 36006016006e04400d0c4215e3ec55757p1: map in use\nJan 17 16:04:31 | failed to remove multipath map 36006016006e04400d0c4215e3ec55757\n' 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Stderr: u'' 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Best Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fungi at yuggoth.org Fri Jan 17 18:39:15 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jan 2020 18:39:15 +0000 Subject: DR options with openstack In-Reply-To: References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> Message-ID: <20200117183915.gugkawaqx42z6uvs@yuggoth.org> On 2020-01-17 21:24:36 +0530 (+0530), Adam Peacock wrote: [...] > Also, we need to be clear not everyone leans towards being a > developer or even *wants* to go in that direction when using > OpenStack. In fact, most don't and if there is that expectation by > those entrenched with the OpenStack product, the OpenStack option > gets dropped in favor of something else. It's developer-friendly > but we need to be mega-mega-careful, as a community, to ensure > development isn't the baseline or assumption for adequate support > or to get questions answered. Especially since we've converged our > communication channels. [...] Most users probably won't become developers on OpenStack, but some will, and I believe its long-term survival depends on that so we should do everything we can to encourage it. Users may also contribute in a variety of other ways like bug reporting and triage, outreach, revising or translating documentation, and so on. OpenStack isn't a "product," it's a community software collaboration on which many companies have built products (either by running it as a service or selling support for it). Treating the community the way you might treat a paid vendor is where all of this goes to a bad place very quickly. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From radoslaw.piliszek at gmail.com Fri Jan 17 18:42:00 2020 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Fri, 17 Jan 2020 19:42:00 +0100 Subject: [api][sdk][dev][oslo] using uWSGI breaks CORS config Message-ID: Fellow Devs, as you might have noticed I started taking care of openstack/js-openstack-lib, now under the openstacksdk umbrella [1]. The first goal is to modernize the CI to use Zuul v3, current devstack and nodejs, still WIP [2]. As part of the original suite of tests, the unit and functional tests are run from browsers as well as from node. And, as you may know, browsers care about CORS [3]. js-openstack-lib connects to various OpenStack APIs (currently limited to keystone, glance, neutron and nova) to act on behalf of the user (just like openstacksdk/client does). oslo.middleware, as used by those APIs, provides a way to configure CORS by setting params in the [cors] group, but uWSGI seemingly ignores that completely [4]. I had to switch to mod_wsgi+apache instead of uwsgi+apache to get past that issue. I could not reproduce locally because kolla (thankfully) uses mostly mod_wsgi atm. The issue I see is that uWSGI is proposed as the future and mod_wsgi is termed deprecated. However, this means the future is broken w.r.t. CORS, and so is any modern web interface using it unless that interface sits on the exact same host and port (which is usually different between OpenStack APIs and any UI).
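For anyone hitting the same wall: the [cors] group Radosław refers to is oslo.middleware's per-service CORS configuration. A minimal sketch (the origin URL is a placeholder, not something from this thread):

```ini
[cors]
# Browser origin(s) allowed to call this API; must match the UI's
# scheme://host:port exactly.
allowed_origin = https://dashboard.example.com
allow_credentials = true
allow_methods = GET,PUT,POST,DELETE,PATCH
allow_headers = Content-Type,X-Auth-Token
expose_headers = X-Subject-Token,X-OpenStack-Request-ID
```

As the thread observes, these options only take effect when the CORS middleware is actually in the WSGI pipeline and the server honors the service's oslo.config file; the report here is that the uwsgi-based deployment ignored them while mod_wsgi+apache did not.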
[1] https://review.opendev.org/701854 [2] https://review.opendev.org/702132 [3] https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS [4] https://github.com/unbit/uwsgi/issues/1550 -yoctozepto From cboylan at sapwetik.org Fri Jan 17 22:11:23 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 17 Jan 2020 14:11:23 -0800 Subject: Re: [ironic][nova][neutron][cloud-init] Infiniband Support in OpenStack In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA61A5528B0@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA61A5528B0@BGSMSX102.gar.corp.intel.com> Message-ID: On Fri, Jan 17, 2020, at 7:31 AM, Kumari, Madhuri wrote: > > Hi, > > > I am trying to deploy a node with infiniband in Ironic without any success. > > > The node has two interfaces, eth0 and ib0. The deployment is > successful, node becomes active but is not reachable. I debugged and > checked that the issue is with cloud-init. The cloud-init fails to > configure the network interfaces on the node complaining that the MAC > address of infiniband port(ib0) is not known to the node. Ironic > provides a fake MAC address for infiniband ports and cloud-init is > supposed to generate the actual MAC address of infiniband ports[1]. But > it fails[2] before reaching there. Reading the cloud-init code [4][5] it appears that the ethernet-format MAC should match bytes 13-15 + 18-20 of the infiniband address. Is the problem here that the fake MAC supplied is unrelated to the actual infiniband address? If so I think you'll either need cloud-init to ignore unknown interfaces (as proposed in the cloud-init bug), or have Ironic supply the MAC address as bytes 13-15 + 18-20 of the actual infiniband address. > > I have posted the issue in cloud-init[3] as well. > > > Can someone please help me with this issue? How do we specify > “TYPE=InfiniBand” from OpenStack? Currently the type sent is “phy” only.
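The byte-offset mapping Clark describes can be sketched in Python (illustrative only; the authoritative logic lives in the cloud-init code he cites, and the example address below is hypothetical):

```python
def infiniband_to_ethernet_mac(ib_hwaddr):
    """Derive the ethernet-style MAC cloud-init expects from a 20-octet
    InfiniBand hardware address: octets 13-15 and 18-20 (1-indexed)."""
    octets = ib_hwaddr.lower().split(":")
    if len(octets) != 20:
        raise ValueError("expected a 20-octet InfiniBand address")
    # Octets 13-15 are list indices 12-14; octets 18-20 are indices 17-19.
    return ":".join(octets[12:15] + octets[17:20])

# Hypothetical ib0 address; only the six selected octets survive.
print(infiniband_to_ethernet_mac(
    "a0:00:02:20:fe:80:00:00:00:00:00:00:00:11:22:33:44:55:66:77"))
# -> 00:11:22:55:66:77
```

This is why a fake MAC unrelated to the real InfiniBand address trips cloud-init up: the derived value never matches any interface the node actually has.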
> > > [1] > https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L686 > > [2] > https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L677 > > [3] https://bugs.launchpad.net/cloud-init/+bug/1857031 [4] https://github.com/canonical/cloud-init/blob/9bfb2ba7268e2c3c932023fc3d3020cdc6d6cc18/cloudinit/net/__init__.py#L793-L795 [5] https://github.com/canonical/cloud-init/blob/9bfb2ba7268e2c3c932023fc3d3020cdc6d6cc18/cloudinit/net/__init__.py#L844-L846 From Albert.Braden at synopsys.com Fri Jan 17 22:17:25 2020 From: Albert.Braden at synopsys.com (Albert Braden) Date: Fri, 17 Jan 2020 22:17:25 +0000 Subject: Galera config values Message-ID: I'm experimenting with Galera in my Rocky openstack-ansible dev cluster, and I'm finding that the default haproxy config values don't seem to work. Finding the correct values is a lot of work. For example, I spent this morning experimenting with different values for "timeout client" in /etc/haproxy/haproxy.cfg. The default is 1m, and with the default set I see this error in /var/log/nova/nova-scheduler.log on the controllers:

2020-01-17 13:54:26.059 443358 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)

There are several timeout values in /etc/haproxy/haproxy.cfg. These are the values we started with:

stats timeout 30s
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s

At first I changed them all to 30m. This stopped the "Lost connection" error in nova-scheduler.log. Then, one at a time, I changed them back to the default. When I got to "timeout client" I found that setting it back to 1m caused the errors to start again. I changed it back and forth and found that 4 minutes causes errors, and 6m stops them, so I left it at 6m.
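Written out as a haproxy.cfg fragment, the result of that experiment would look like the sketch below (assuming these directives live in the defaults section, as in stock openstack-ansible; values are from this thread, not recommended defaults):

```
defaults
    timeout http-request 10s
    timeout queue        1m
    timeout connect      10s
    # 1m (the default) and 4m both caused haproxy to drop idle MySQL
    # client connections; 6m was the first tested value with no errors.
    timeout client       6m
    timeout server       1m
    timeout check        10s
```

One caveat: MySQL's wait_timeout here is 3600s (60m), so the fact that 6m already suffices suggests the binding constraint may be the client pool's idle interval (cf. oslo.db's connection_recycle_time) rather than wait_timeout itself; worth verifying before treating "match or exceed wait_timeout" as the rule.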
These are my active variables:

root at us01odc-dev2-ctrl1:/etc/mysql# mysql -e 'show variables;'|grep timeout
connect_timeout 20
deadlock_timeout_long 50000000
deadlock_timeout_short 10000
delayed_insert_timeout 300
idle_readonly_transaction_timeout 0
idle_transaction_timeout 0
idle_write_transaction_timeout 0
innodb_flush_log_at_timeout 1
innodb_lock_wait_timeout 50
innodb_rollback_on_timeout OFF
interactive_timeout 28800
lock_wait_timeout 86400
net_read_timeout 30
net_write_timeout 60
rpl_semi_sync_master_timeout 10000
rpl_semi_sync_slave_kill_conn_timeout 5
slave_net_timeout 60
thread_pool_idle_timeout 60
wait_timeout 3600

So it looks like the value of "timeout client" in haproxy.cfg needs to match or exceed the value of "wait_timeout" in mysql. Also in nova.conf I see "#connection_recycle_time = 3600" - I need to experiment to see how that value interacts with the timeouts in the other config files. Is this the best way to find the correct config values? It seems like there should be a document that talks about these timeouts and how to set them (or maybe more generally how the different timeout settings in the various config files interact). Does that document exist? If not, maybe I could write one, since I have to figure out the correct values anyway. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Fri Jan 17 22:34:58 2020 From: eblock at nde.ag (Eugen Block) Date: Fri, 17 Jan 2020 22:34:58 +0000 Subject: Galera config values In-Reply-To: Message-ID: <20200117223458.Horde.JLSaQRGPwIoHALX8zGRcgmW@webmail.nde.ag> Hi, I'm pretty sure you'll have to figure it out yourself. I always found the deployment guides quite good, I got my cloud running without major issues. But when it comes to HA configuration the guide lacks a lot of information. I had to figure out many details on my own, though haproxy is currently not in use here.
> So it looks like the value of "timeout client" in haproxy.cfg needs > to match or exceed the value of "wait_timeout" in mysql. Although I'm not entirely sure I tend to agree with you. Dealing with a Ceph RGW deployment I encountered a similar issue and had to increase some timeout values to get it working. I'm convinced that many people would appreciate if you created a doc for haproxy. Regards, Eugen Zitat von Albert Braden : > I'm experimenting with Galera in my Rocky openstack-ansible dev > cluster, and I'm finding that the default haproxy config values > don't seem to work. Finding the correct values is a lot of work. For > example, I spent this morning experimenting with different values > for "timeout client" in /etc/haproxy/haproxy.cfg. The default is > 1m, and with the default set I see this error in > /var/log/nova/nova-scheduler.log on the controllers: > > 2020-01-17 13:54:26.059 443358 ERROR oslo_db.sqlalchemy.engines > DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost > connection to MySQL server during query') [SQL: u'SELECT 1'] > (Background on this error at: http://sqlalche.me/e/e3q8) > > There are several timeout values in /etc/haproxy/haproxy.cfg. These > are the values we started with: > > stats timeout 30s > timeout http-request 10s > timeout queue 1m > timeout connect 10s > timeout client 1m > timeout server 1m > timeout check 10s > > At first I changed them all to 30m. This stopped the "Lost > connection" error in nova-scheduler.log. Then, one at a time, I > changed them back to the default. When I got to "timeout client" I > found that setting it back to 1m caused the errors to start again. I > changed it back and forth and found that 4 minutes causes errors, > and 6m stops them, so I left it at 6m. 
> > These are my active variables: > > root at us01odc-dev2-ctrl1:/etc/mysql# mysql -e 'show variables;'|grep timeout > connect_timeout 20 > deadlock_timeout_long 50000000 > deadlock_timeout_short 10000 > delayed_insert_timeout 300 > idle_readonly_transaction_timeout 0 > idle_transaction_timeout 0 > idle_write_transaction_timeout 0 > innodb_flush_log_at_timeout 1 > innodb_lock_wait_timeout 50 > innodb_rollback_on_timeout OFF > interactive_timeout 28800 > lock_wait_timeout 86400 > net_read_timeout 30 > net_write_timeout 60 > rpl_semi_sync_master_timeout 10000 > rpl_semi_sync_slave_kill_conn_timeout 5 > slave_net_timeout 60 > thread_pool_idle_timeout 60 > wait_timeout 3600 > > So it looks like the value of "timeout client" in haproxy.cfg needs > to match or exceed the value of "wait_timeout" in mysql. Also in > nova.conf I see "#connection_recycle_time = 3600" - I need to > experiment to see how that value interacts with the timeouts in the > other config files. > > Is this the best way to find the correct config values? It seems > like there should be a document that talks about these timeouts and > how to set them (or maybe more generally how the different timeout > settings in the various config files interact). Does that document > exist? If not, maybe I could write one, since I have to figure out > the correct values anyway. From alawson at aqorn.com Fri Jan 17 22:44:28 2020 From: alawson at aqorn.com (Adam Peacock) Date: Sat, 18 Jan 2020 04:14:28 +0530 Subject: DR options with openstack In-Reply-To: <20200117183915.gugkawaqx42z6uvs@yuggoth.org> References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> <20200117183915.gugkawaqx42z6uvs@yuggoth.org> Message-ID: How we view OpenStack within our community here is usually vastly different than the majority of enterprises and how they view it. 
Side note: My biggest gripe with OpenStack leadership is actually that everything is viewed from the lens of a developer which, I feel, is contributing to the plateau/decline in its adoption. That's a topic for another day. Most organizations (as I've seen anyway) view OpenStack as a product that is compared to other cloud products like vCloud Director/similar. And after 8 years architecting clouds with it, I see it the same way. So I'm not exactly inclined to split hairs with how it is characterized. Bottom line though, ensuring that non-developers are easily able to get their questions answered will, in my personal opinion, either promote OpenStack or promote the perception that it requires a team of developers to understand and run, which kills any serious consideration in the boardroom. Sorry to the OP, didn't mean to hijack your thread here. :) It just raises an important topic that I see come up over and over. //adam On Sat, Jan 18, 2020, 2:43 AM Jeremy Stanley wrote: > On 2020-01-17 21:24:36 +0530 (+0530), Adam Peacock wrote: > [...] > > Also, we need to be clear not everyone leans towards being a > > developer or even *wants* to go in that direction when using > > OpenStack. In fact, most don't and if there is that expectation by > > those entrenched with the OpenStack product, the OpenStack option > > gets dropped in favor of something else. It's developer-friendly > > but we need to be mega-mega-careful, as a community, to ensure > > development isn't the baseline or assumption for adequate support > > or to get questions answered. Especially since we've converged our > > communication channels. > [...] > > Most users probably won't become developers on OpenStack, but some > will, and I believe its long-term survival depends on that so we > should do everything we can to encourage it. Users may also > contribute in a variety of other ways like bug reporting and triage, > outreach, revising or translating documentation, and so on.
> > OpenStack isn't a "product," it's a community software collaboration > on which many companies have built products (either by running it as > a service or selling support for it). Treating the community the way > you might treat a paid vendor is where all of this goes to a bad > place very quickly. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jan 17 23:27:25 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jan 2020 23:27:25 +0000 Subject: DR options with openstack In-Reply-To: References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> <20200117183915.gugkawaqx42z6uvs@yuggoth.org> Message-ID: <20200117232724.iufxvq7owuvhoyoo@yuggoth.org> On 2020-01-18 04:14:28 +0530 (+0530), Adam Peacock wrote: > How we view OpenStack within our community here is usually vastly > different than the majority of enterprises and how they view it. > Side note: My biggest gripe with OpenStack leadership is actually > that everything is viewed from the lens of a developer which, I > feel, is contributing to the plateau/decline in its adoption. That > is but that's a topic for another day. I don't know whether you consider me part of OpenStack leadership, but if it helps, my background is ~30 years as a Unix/Linux sysadmin, data center engineer, security analyst and network architect. I don't have any formal education in software development (or even a University degree). This is the lens with which I view OpenStack. > Most organizations ( as I've seen anyway) view OpenStack as a > product that is compared to other cloud products like vCloud > Director/similar. And after 8 years architecting clouds with it, I > see it the same way. So I'm not exactly inclined to split hairs > with how it is characterized. I used vCloud Director for years, and I don't recall getting it for free nor being provided with access to its source outside an NDA. 
There also wasn't any way to reach out to the developers for it without a paid service contract (or really even with one most of the time). Sounds like VMware has become a bit more progressive recently? ;) > Bottom line though, ensuring that non-developers are able to > easily able to get their questions answered will, in my personal > opinion, either promote OpenStack or promote the conception that > it requires a team of developers to understand and run which kills > any serious consideration in the boardroom. [...] I wholeheartedly agree with this, and it's basically the point I've been trying to make as well. We need to welcome users and let them ask questions wherever we're all having conversations. Free/libre open source software thrives or withers based on the strength of its user base, not on its technical superiority or novelty. If we don't take every opportunity to accommodate users who engage with us, we're going to have fewer and fewer users... until the day comes when we have none at all. Also as the hype subsides, companies aren't going to throw developer hours at OpenStack just because it looks good in advertisements. We're going to need to learn how to shore up our ranks of developers and maintainers from the only other source available to us: our users. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From tony.pearce at cinglevue.com Sat Jan 18 01:44:16 2020 From: tony.pearce at cinglevue.com (Tony Pearce) Date: Sat, 18 Jan 2020 09:44:16 +0800 Subject: Cinder snapshot delete successful when expected to fail In-Reply-To: References: Message-ID: Thank you. That really helps. I am going to diff the nimble.py files between Pike and Queens and see what's changed. 
On Fri, 17 Jan 2020, 22:18 Alan Bishop, wrote: > > > On Fri, Jan 17, 2020 at 2:01 AM Tony Pearce > wrote: > >> Could anyone help by pointing me where to go to be able to dig into this >> issue further? >> >> I have installed a test Openstack environment using RDO Packstack. I >> wanted to install the same version that I have in Production (Pike) but >> it's not listed in the CentOS repo via yum search. So I installed Queens. I >> am using nimble.py Cinder driver. Nimble Storage is a storage array >> accessed via iscsi from the Openstack host, and is controlled from >> Openstack by the driver and API. >> >> *What I expected to happen:* >> 1. create an instance with volume (the volume is created on the storage >> array successfully and instance boots from it) >> 2. take a snapshot (snapshot taken on the volume on the array >> successfully) >> 3. create a new instance from the snapshot (the api tells the array to >> clone the snapshot into a new volume on the array and use that volume for >> the instance) >> 4. try and delete the snapshot >> Expected Result - Openstack gives the user a message like "you're not >> allowed to do that". >> >> Note: Step 3 above creates a child volume from the parent snapshot. It's >> impossible to delete the parent snapshot because IO READ is sent to that >> part of the original volume (as I understand it). >> >> *My production problem is this: * >> 1. create an instance with volume (the volume is created on the storage >> array successfully) >> 2. take a snapshot (snapshot taken on the volume on the array >> successfully) >> 3. create a new instance from the snapshot (the api tells the array to >> clone the snapshot into a new volume on the array and use that volume for >> the instance) >> 4. try and delete the snapshot >> Result - snapshot goes into error state and later, all Cinder operations >> fail such as new instance/create volume etc. until the correct service is >> restarted. Then everything works once again. 
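[Editor's note: the four steps above map onto the standard openstack CLI roughly as follows. This is a sketch, not a tested transcript; the image, flavor and size values are placeholders.]

```shell
# Hypothetical CLI equivalent of steps 1-4; names/sizes are placeholders.
openstack volume create --image cirros --size 10 vol1        # step 1
openstack server create --volume vol1 --flavor m1.small inst1
openstack volume snapshot create --volume vol1 snap1         # step 2
openstack volume create --snapshot snap1 --size 10 vol2      # step 3
openstack server create --volume vol2 --flavor m1.small inst2
openstack volume snapshot delete snap1                       # step 4: should be refused
```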
>> >> To troubleshoot the above, I installed the RDO Packstack Queens (because >> I couldn't get Pike). I tested the above and now, the result is the snapshot >> is successfully deleted from openstack but not deleted on the array. The >> log is below for reference. But I can see in the log that the array >> sends back info to openstack saying the snapshot has a clone and the delete >> cannot be done because of that. Also response code 409. >> >> *Some info about why the problem with Pike started in the first place* >> 1. Vendor is Nimble Storage which HPE purchased >> 2. HPE/Nimble have dropped support for openstack. Latest supported >> version is Queens and Nimble array version v4.x. The current Array version >> is v5.x. Nimble say there are no guarantees with openstack, the driver and >> the array version v5.x >> 3. I was previously advised by Nimble that the array version v5.x will >> work fine and so left our DR array on v5.x with a pending upgrade that had >> a blocker due to an issue. This issue was resolved in December and the >> pending upgrade to match the DR array completed around 30 days >> ago. >> >> >> With regards to the production issue, I assumed that the array API has >> some changes between v4.x and v5.x and it's causing an issue with Cinder >> due to the API response. Although I have not been able to find out if or >> what changes there are that may have occurred after the array upgrade, as >> the documentation for this is Nimble internal-only. >> >> >> *So with that - some questions if I may:* >> When Openstack got the 409 error response from the API (as seen in the >> log below), why would Openstack then proceed to delete the snapshot on the >> Openstack side? How could I debug this further? I'm not sure what Openstack >> Cinder is acting on in terms of the response as yet. Maybe Openstack is not >> specifically looking for the error code in the response? >> >> The snapshot that got deleted on the openstack side is a problem. 
Would >> this be related to the driver? Could it be possible that the driver did not >> pass the error response to Cinder? >> > > Hi Tony, > > This is exactly what happened, and it appears to be a driver bug > introduced in queens by [1]. The code in question [2] logs the error, but > fails to propagate the exception. As far as the volume manager is > concerned, the snapshot deletion was successful. > > [1] https://review.opendev.org/601492 > [2] > https://opendev.org/openstack/cinder/src/branch/stable/queens/cinder/volume/drivers/nimble.py#L1815 > > Alan > > Thanks in advance. Just for reference, the log snippet is below. >> >> >> ==> volume.log <== >>> 2020-01-17 16:53:23.718 24723 WARNING py.warnings >>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >>> certificate verification is strongly advised. See: >>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >>> InsecureRequestWarning) >>> : NimbleAPIException: Failed to execute api >>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. 
>>> ==> api.log <== >>> 2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi >>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>> http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail >>> returned with HTTP 200 >>> 2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server >>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET >>> /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200 >>> len: 4657 time: 0.1152730 >>> ==> volume.log <== >>> 2020-01-17 16:53:23.811 24723 WARNING py.warnings >>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >>> certificate verification is strongly advised. See: >>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >>> InsecureRequestWarning) >>> : NimbleAPIException: Failed to execute api >>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. 
>>> 2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble >>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception >>> Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: >>> Error Code: 409 Message: Snapshot >>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.: >>> NimbleAPIException: Failed to execute api >>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >>> 2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble >>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot >>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone: >>> NimbleAPIException: Failed to execute api >>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >>> 2020-01-17 16:53:23.964 24723 WARNING cinder.quota >>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default >>> quota for resource: snapshots_Nimble-DR is set by the default quota flag: >>> quota_snapshots_Nimble-DR, it is now deprecated. Please use the default >>> quota class for default quota. >>> 2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager >>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>> 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot >>> completed successfully. 
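[Editor's note: the failure mode Alan describes, an except block that logs the backend error but does not re-raise it, so the volume manager believes the delete succeeded, can be illustrated with a minimal self-contained sketch. All names below are hypothetical stand-ins, not the actual nimble.py code.]

```python
class NimbleAPIException(Exception):
    """Stand-in for the driver's backend API error type."""

def backend_delete_snapshot(snap_id):
    # Stand-in for the array call that returns HTTP 409 "has a clone".
    raise NimbleAPIException(
        "Error Code: 409 Message: Snapshot %s has a clone." % snap_id)

def delete_snapshot_buggy(snap_id):
    try:
        backend_delete_snapshot(snap_id)
    except NimbleAPIException as exc:
        # Bug: the error is only logged, not re-raised, so this function
        # returns normally and the manager deletes the snapshot record.
        print("WARNING Snapshot %s: has a clone: %s" % (snap_id, exc))

def delete_snapshot_fixed(snap_id):
    try:
        backend_delete_snapshot(snap_id)
    except NimbleAPIException:
        # Fix: propagate the failure so the manager marks the snapshot
        # as errored instead of removing it from the database.
        raise
```

With the buggy variant the call returns normally despite the 409; the fixed variant raises, which is what lets the volume manager keep the snapshot record.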
>> >> >> >> Regards, >> >> *Tony Pearce* | >> *Senior Network Engineer / Infrastructure Lead**Cinglevue International >> * >> >> Email: tony.pearce at cinglevue.com >> Web: http://www.cinglevue.com >> >> *Australia* >> 1 Walsh Loop, Joondalup, WA 6027 Australia. >> >> Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 >> >> Note: This email and all attachments are the sole property of Cinglevue >> International Pty Ltd. (or any of its subsidiary entities), and the >> information contained herein must be considered confidential, unless >> specified otherwise. If you are not the intended recipient, you must not >> use or forward the information contained in these documents. If you have >> received this message in error, please delete the email and notify the >> sender. >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Sat Jan 18 03:21:37 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 17 Jan 2020 22:21:37 -0500 Subject: DR options with openstack In-Reply-To: <20200117183915.gugkawaqx42z6uvs@yuggoth.org> References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> <20200117183915.gugkawaqx42z6uvs@yuggoth.org> Message-ID: On Fri, Jan 17, 2020 at 1:42 PM Jeremy Stanley wrote: > > On 2020-01-17 21:24:36 +0530 (+0530), Adam Peacock wrote: > [...] > > Also, we need to be clear not everyone leans towards being a > > developer or even *wants* to go in that direction when using > > OpenStack. In fact, most don't and if there is that expectation by > > those entrenched with the OpenStack product, the OpenStack option > > gets dropped in favor of something else. It's developer-friendly > > but we need to be mega-mega-careful, as a community, to ensure > > development isn't the baseline or assumption for adequate support > > or to get questions answered. Especially since we've converged our > > communication channels. > [...] 
> > Most users probably won't become developers on OpenStack, but some > will, and I believe its long-term survival depends on that so we > should do everything we can to encourage it. Users may also > contribute in a variety of other ways like bug reporting and triage, > outreach, revising or translating documentation, and so on. > > OpenStack isn't a "product," it's a community software collaboration > on which many companies have built products (either by running it as > a service or selling support for it). Treating the community the way > you might treat a paid vendor is where all of this goes to a bad > place very quickly. We've probably strayed a bit far away from the original topic, but I echo this thought very much. OpenStack is a project. $your_favorite_vendor's OpenStack is a product. It's important for us to keep that distinction for the success of both the project and vendors IMHO. > -- > Jeremy Stanley -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From mnaser at vexxhost.com Sat Jan 18 03:22:25 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 17 Jan 2020 22:22:25 -0500 Subject: Galera config values In-Reply-To: References: Message-ID: On Fri, Jan 17, 2020 at 5:20 PM Albert Braden wrote: > > I’m experimenting with Galera in my Rocky openstack-ansible dev cluster, and I’m finding that the default haproxy config values don’t seem to work. Finding the correct values is a lot of work. For example, I spent this morning experimenting with different values for “timeout client” in /etc/haproxy/haproxy.cfg. 
The default is 1m, and with the default set I see this error in /var/log/nova/nova-scheduler.log on the controllers: > > > > 2020-01-17 13:54:26.059 443358 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8) > > > > There are several timeout values in /etc/haproxy/haproxy.cfg. These are the values we started with: > > > > stats timeout 30s > > timeout http-request 10s > > timeout queue 1m > > timeout connect 10s > > timeout client 1m > > timeout server 1m > > timeout check 10s > > > > At first I changed them all to 30m. This stopped the “Lost connection” error in nova-scheduler.log. Then, one at a time, I changed them back to the default. When I got to “timeout client” I found that setting it back to 1m caused the errors to start again. I changed it back and forth and found that 4 minutes causes errors, and 6m stops them, so I left it at 6m. > > > > These are my active variables: > > > > root at us01odc-dev2-ctrl1:/etc/mysql# mysql -e 'show variables;'|grep timeout > > connect_timeout 20 > > deadlock_timeout_long 50000000 > > deadlock_timeout_short 10000 > > delayed_insert_timeout 300 > > idle_readonly_transaction_timeout 0 > > idle_transaction_timeout 0 > > idle_write_transaction_timeout 0 > > innodb_flush_log_at_timeout 1 > > innodb_lock_wait_timeout 50 > > innodb_rollback_on_timeout OFF > > interactive_timeout 28800 > > lock_wait_timeout 86400 > > net_read_timeout 30 > > net_write_timeout 60 > > rpl_semi_sync_master_timeout 10000 > > rpl_semi_sync_slave_kill_conn_timeout 5 > > slave_net_timeout 60 > > thread_pool_idle_timeout 60 > > wait_timeout 3600 > > > > So it looks like the value of “timeout client” in haproxy.cfg needs to match or exceed the value of “wait_timeout” in mysql. 
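[Editor's note: that relationship can be sketched as a haproxy.cfg fragment. The values are illustrative, not a recommendation; the idea is that haproxy's idle timeouts should be at least as large as MySQL's wait_timeout and oslo.db's connection_recycle_time, both 3600s here, so haproxy is never the first to drop an idle pooled connection.]

```ini
# /etc/haproxy/haproxy.cfg (fragment, illustrative values)
# Keep client/server idle timeouts >= mysql wait_timeout (3600s) and
# >= oslo.db connection_recycle_time (3600s) so haproxy never closes
# a pooled connection that the services still consider usable.
defaults
    timeout connect 10s
    timeout client  5400s
    timeout server  5400s
```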
Also in nova.conf I see “#connection_recycle_time = 3600” – I need to experiment to see how that value interacts with the timeouts in the other config files. > > > > Is this the best way to find the correct config values? It seems like there should be a document that talks about these timeouts and how to set them (or maybe more generally how the different timeout settings in the various config files interact). Does that document exist? If not, maybe I could write one, since I have to figure out the correct values anyway. Is your cluster pretty idle? I've never seen that happen in any environments before... -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From eandersson at blizzard.com Sat Jan 18 03:36:55 2020 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Sat, 18 Jan 2020 03:36:55 +0000 Subject: Galera config values In-Reply-To: References: , Message-ID: <4E49E11B-83FA-4016-9FA1-30CDC377825C@blizzard.com> I can share our haproxy settings on Monday, but you need to make sure that haproxy at least matches the Oslo config, which I believe is 3600s, but I think in theory something like keepalived is better for Galera. BTW, pretty sure both client and server need 3600s. Basically, OpenStack recycles the connection every hour by default. So you need to make sure that haproxy does not close it before that if it’s idle. Sent from my iPhone > On Jan 17, 2020, at 7:24 PM, Mohammed Naser wrote: > > On Fri, Jan 17, 2020 at 5:20 PM Albert Braden > wrote: >> >> I’m experimenting with Galera in my Rocky openstack-ansible dev cluster, and I’m finding that the default haproxy config values don’t seem to work. Finding the correct values is a lot of work. For example, I spent this morning experimenting with different values for “timeout client” in /etc/haproxy/haproxy.cfg. 
The default is 1m, and with the default set I see this error in /var/log/nova/nova-scheduler.log on the controllers: >> >> >> >> 2020-01-17 13:54:26.059 443358 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1'] (Background on this error at: https://urldefense.com/v3/__http://sqlalche.me/e/e3q8__;!!Ci6f514n9QsL8ck!39gvi32Ldv9W8zhZ_P1JLvkOFM-PelyP_RrU_rT5_EuELR24fLO5P3ShvZ56jfcQ7g$ ) >> >> >> >> There are several timeout values in /etc/haproxy/haproxy.cfg. These are the values we started with: >> >> >> >> stats timeout 30s >> >> timeout http-request 10s >> >> timeout queue 1m >> >> timeout connect 10s >> >> timeout client 1m >> >> timeout server 1m >> >> timeout check 10s >> >> >> >> At first I changed them all to 30m. This stopped the “Lost connection” error in nova-scheduler.log. Then, one at a time, I changed them back to the default. When I got to “timeout client” I found that setting it back to 1m caused the errors to start again. I changed it back and forth and found that 4 minutes causes errors, and 6m stops them, so I left it at 6m. 
>> >> >> >> These are my active variables: >> >> >> >> root at us01odc-dev2-ctrl1:/etc/mysql# mysql -e 'show variables;'|grep timeout >> >> connect_timeout 20 >> >> deadlock_timeout_long 50000000 >> >> deadlock_timeout_short 10000 >> >> delayed_insert_timeout 300 >> >> idle_readonly_transaction_timeout 0 >> >> idle_transaction_timeout 0 >> >> idle_write_transaction_timeout 0 >> >> innodb_flush_log_at_timeout 1 >> >> innodb_lock_wait_timeout 50 >> >> innodb_rollback_on_timeout OFF >> >> interactive_timeout 28800 >> >> lock_wait_timeout 86400 >> >> net_read_timeout 30 >> >> net_write_timeout 60 >> >> rpl_semi_sync_master_timeout 10000 >> >> rpl_semi_sync_slave_kill_conn_timeout 5 >> >> slave_net_timeout 60 >> >> thread_pool_idle_timeout 60 >> >> wait_timeout 3600 >> >> >> >> So it looks like the value of “timeout client” in haproxy.cfg needs to match or exceed the value of “wait_timeout” in mysql. Also in nova.conf I see “#connection_recycle_time = 3600” – I need to experiment to see how that value interacts with the timeouts in the other config files. >> >> >> >> Is this the best way to find the correct config values? It seems like there should be a document that talks about these timeouts and how to set them (or maybe more generally how the different timeout settings in the various config files interact). Does that document exist? If not, maybe I could write one, since I have to figure out the correct values anyway. > > Is your cluster pretty idle? I've never seen that happen in any > environments before... > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. 
https://urldefense.com/v3/__https://vexxhost.com__;!!Ci6f514n9QsL8ck!39gvi32Ldv9W8zhZ_P1JLvkOFM-PelyP_RrU_rT5_EuELR24fLO5P3ShvZ4PDThJbg$ > From mnaser at vexxhost.com Sat Jan 18 03:40:01 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 17 Jan 2020 22:40:01 -0500 Subject: Galera config values In-Reply-To: <4E49E11B-83FA-4016-9FA1-30CDC377825C@blizzard.com> References: <4E49E11B-83FA-4016-9FA1-30CDC377825C@blizzard.com> Message-ID: On Fri, Jan 17, 2020 at 10:37 PM Erik Olof Gunnar Andersson wrote: > > I can share our haproxt settings on monday, but you need to make sure that haproxy to at least match the Oslo config which I believe is 3600s, but I think in theory something like keepalived is better for galerara. > > btw pretty sure both client and server needs 3600s. Basically openstack recycles the connection every hour by default. So you need to make sure that haproxy does not close it before that if it’s idle. Indeed, this adds up to what we do in OSA https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/haproxy/haproxy.yml#L48-L49 > Sent from my iPhone > > > On Jan 17, 2020, at 7:24 PM, Mohammed Naser wrote: > > > > On Fri, Jan 17, 2020 at 5:20 PM Albert Braden > > wrote: > >> > >> I’m experimenting with Galera in my Rocky openstack-ansible dev cluster, and I’m finding that the default haproxy config values don’t seem to work. Finding the correct values is a lot of work. For example, I spent this morning experimenting with different values for “timeout client” in /etc/haproxy/haproxy.cfg. 
The default is 1m, and with the default set I see this error in /var/log/nova/nova-scheduler.log on the controllers: > >> > >> > >> > >> 2020-01-17 13:54:26.059 443358 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1'] (Background on this error at: https://urldefense.com/v3/__http://sqlalche.me/e/e3q8__;!!Ci6f514n9QsL8ck!39gvi32Ldv9W8zhZ_P1JLvkOFM-PelyP_RrU_rT5_EuELR24fLO5P3ShvZ56jfcQ7g$ ) > >> > >> > >> > >> There are several timeout values in /etc/haproxy/haproxy.cfg. These are the values we started with: > >> > >> > >> > >> stats timeout 30s > >> > >> timeout http-request 10s > >> > >> timeout queue 1m > >> > >> timeout connect 10s > >> > >> timeout client 1m > >> > >> timeout server 1m > >> > >> timeout check 10s > >> > >> > >> > >> At first I changed them all to 30m. This stopped the “Lost connection” error in nova-scheduler.log. Then, one at a time, I changed them back to the default. When I got to “timeout client” I found that setting it back to 1m caused the errors to start again. I changed it back and forth and found that 4 minutes causes errors, and 6m stops them, so I left it at 6m. 
> >> > >> > >> > >> These are my active variables: > >> > >> > >> > >> root at us01odc-dev2-ctrl1:/etc/mysql# mysql -e 'show variables;'|grep timeout > >> > >> connect_timeout 20 > >> > >> deadlock_timeout_long 50000000 > >> > >> deadlock_timeout_short 10000 > >> > >> delayed_insert_timeout 300 > >> > >> idle_readonly_transaction_timeout 0 > >> > >> idle_transaction_timeout 0 > >> > >> idle_write_transaction_timeout 0 > >> > >> innodb_flush_log_at_timeout 1 > >> > >> innodb_lock_wait_timeout 50 > >> > >> innodb_rollback_on_timeout OFF > >> > >> interactive_timeout 28800 > >> > >> lock_wait_timeout 86400 > >> > >> net_read_timeout 30 > >> > >> net_write_timeout 60 > >> > >> rpl_semi_sync_master_timeout 10000 > >> > >> rpl_semi_sync_slave_kill_conn_timeout 5 > >> > >> slave_net_timeout 60 > >> > >> thread_pool_idle_timeout 60 > >> > >> wait_timeout 3600 > >> > >> > >> > >> So it looks like the value of “timeout client” in haproxy.cfg needs to match or exceed the value of “wait_timeout” in mysql. Also in nova.conf I see “#connection_recycle_time = 3600” – I need to experiment to see how that value interacts with the timeouts in the other config files. > >> > >> > >> > >> Is this the best way to find the correct config values? It seems like there should be a document that talks about these timeouts and how to set them (or maybe more generally how the different timeout settings in the various config files interact). Does that document exist? If not, maybe I could write one, since I have to figure out the correct values anyway. > > > > Is your cluster pretty idle? I've never seen that happen in any > > environments before... > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. 
https://urldefense.com/v3/__https://vexxhost.com__;!!Ci6f514n9QsL8ck!39gvi32Ldv9W8zhZ_P1JLvkOFM-PelyP_RrU_rT5_EuELR24fLO5P3ShvZ4PDThJbg$ > > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From tony.pearce at cinglevue.com Sat Jan 18 06:48:36 2020 From: tony.pearce at cinglevue.com (Tony Pearce) Date: Sat, 18 Jan 2020 14:48:36 +0800 Subject: DR options with openstack In-Reply-To: References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> <20200117183915.gugkawaqx42z6uvs@yuggoth.org> Message-ID: So if I understand correctly, this says to me that Openstack is never intended to be consumed by end users. Is this correct? Regards On Sat, 18 Jan 2020, 11:28 Mohammed Naser, wrote: > On Fri, Jan 17, 2020 at 1:42 PM Jeremy Stanley wrote: > > > > On 2020-01-17 21:24:36 +0530 (+0530), Adam Peacock wrote: > > [...] > > > Also, we need to be clear not everyone leans towards being a > > > developer or even *wants* to go in that direction when using > > > OpenStack. In fact, most don't and if there is that expectation by > > > those entrenched with the OpenStack product, the OpenStack option > > > gets dropped in favor of something else. It's developer-friendly > > > but we need to be mega-mega-careful, as a community, to ensure > > > development isn't the baseline or assumption for adequate support > > > or to get questions answered. Especially since we've converged our > > > communication channels. > > [...] > > > > Most users probably won't become developers on OpenStack, but some > > will, and I believe its long-term survival depends on that so we > > should do everything we can to encourage it. Users may also > > contribute in a variety of other ways like bug reporting and triage, > > outreach, revising or translating documentation, and so on. 
> > > > OpenStack isn't a "product," it's a community software collaboration > > on which many companies have built products (either by running it as > > a service or selling support for it). Treating the community the way > > you might treat a paid vendor is where all of this goes to a bad > > place very quickly. > > We've probably strayed a bit far away from the original topic, but I > echo this thought very much. > > OpenStack is a project. $your_favorite_vendor's OpenStack is a > product. It's important for us to keep that distinction for the success > of both the project and vendors IMHO. > > > -- > > Jeremy Stanley > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. https://vexxhost.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Sat Jan 18 15:02:27 2020 From: smooney at redhat.com (Sean Mooney) Date: Sat, 18 Jan 2020 15:02:27 +0000 Subject: DR options with openstack In-Reply-To: References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> <20200117183915.gugkawaqx42z6uvs@yuggoth.org> Message-ID: <7bb7149f55e643615eb0767b37d33408e14b41df.camel@redhat.com> On Sat, 2020-01-18 at 14:48 +0800, Tony Pearce wrote: > So if I understand correctly, this says to me that Openstack is never > intended to be consumed by end users. > > Is this correct? No, there are many end users that deploy their OpenStack directly from source or using community installer projects. As an open-source project, the developers of the project will try to help our end users fix their own problems by explaining how things work and pointing them in the right direction. If a user reports a legitimate bug we will try to fix it eventually, when our day jobs allow. 
But in the same way that the MySQL developers won't provide 1:1 support for tuning your DB deployment for your workload, the OpenStack community does not provide deployment planning or day-2 operation support to customers. We obviously try to make day-2 operations easier, and if users report pain points we take that on board, but if you choose to deploy OpenStack directly from source or from community distributions without a vendor, then you are relying on the goodwill of individuals. I and others do help end users when they ask genuine questions, providing links to the relevant docs or pointing them to config options or presentations on the topic they ask about. We will also sometimes help diagnose problems people are having, but if that goodwill is abused then I will go back to my day job. If you show up and demand that we drop everything and fix your problem now, you will likely have a negative experience. Unlike a vendor relationship, we don't have a professional/paid relationship with our end users; on the other hand, if you show up, get involved and help others where you can, then I think you will find most people in the community will try to help you too when you need it. That is one of the big cultural differences between an open-source project and a supported product. A project is a community of people working together to advance a common goal. A product, on the other hand, carries a business's promise of support, and with that an expectation that your vendor will go beyond goodwill if your business is impacted by an issue with their product. If you choose to deploy OpenStack as an end user, understand that while we try to make that easy, the learning curve is high and you need the right skills to make it a success, but you can certainly do that if you invest the time and people to do it. Kolla-Ansible and OpenStack-Ansible provide two of the simplest community installers for managing OpenStack. 
As installer projects, their focus is on day-1 and day-2 operations, and they tend to have more operators involved than the component projects. While you can roll your own, they have already centralised the knowledge of many operators in the solutions they have developed, so I would recommend reaching out to them to learn how you can deploy OpenStack yourself. If you want a product, as others said, then you should reach out to your vendor of choice and they will help you with both planning your deployment and keeping it running. > > Regards > > On Sat, 18 Jan 2020, 11:28 Mohammed Naser, wrote: > > > On Fri, Jan 17, 2020 at 1:42 PM Jeremy Stanley wrote: > > > > > > On 2020-01-17 21:24:36 +0530 (+0530), Adam Peacock wrote: > > > [...] > > > > Also, we need to be clear not everyone leans towards being a > > > > developer or even *wants* to go in that direction when using > > > > OpenStack. In fact, most don't and if there is that expectation by > > > > those entrenched with the OpenStack product, the OpenStack option > > > > gets dropped in favor of something else. It's developer-friendly > > > > but we need to be mega-mega-careful, as a community, to ensure > > > > development isn't the baseline or assumption for adequate support > > > > or to get questions answered. Especially since we've converged our > > > > communication channels. > > > > > > [...] > > > > > > Most users probably won't become developers on OpenStack, but some > > > will, and I believe its long-term survival depends on that so we > > > should do everything we can to encourage it. Users may also > > > contribute in a variety of other ways like bug reporting and triage, > > > outreach, revising or translating documentation, and so on. > > > > > > OpenStack isn't a "product," it's a community software collaboration > > > on which many companies have built products (either by running it as > > > a service or selling support for it). 
Treating the community the way > > > you might treat a paid vendor is where all of this goes to a bad > > > place very quickly. > > > > We've probably strayed a bit far away from the original topic, but I > > echo this thought very much. > > > > OpenStack is a project. $your_favorite_vendor's OpenStack is a > > product. It's important for us to keep that distinction for the success > > of both the project and vendors IMHO. > > > > > -- > > > Jeremy Stanley > > > > > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. https://vexxhost.com > > > > From fungi at yuggoth.org Sat Jan 18 17:09:28 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 18 Jan 2020 17:09:28 +0000 Subject: DR options with openstack In-Reply-To: References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> <20200117183915.gugkawaqx42z6uvs@yuggoth.org> Message-ID: <20200118170928.efyjrjsqmzsrbigz@yuggoth.org> On 2020-01-18 14:48:36 +0800 (+0800), Tony Pearce wrote: > So if I understand correctly, this says to me that Openstack is > never intended to be consumed by end users. [...] I have no idea how you got that from my message (elided since I can't be bothered to fix your top-posting[*] right now). It's also unclear to me which definition of "end users" you're applying. For these purposes I lump people who install/manage OpenStack deployments and people who interact with OpenStack deployments together, though the latter have an established relationship with the former and that's generally where their recommended support channels lie. End users consume the Linux kernel. Where do they go for support when they have a problem with it? End users consume the bash shell. Where do they go for support with that? You can totally build and install those things yourself from source. When you do that you 1. are assumed to be a somewhat advanced user and 2. 
might consider reaching out to their developers and other advanced users for them when you run into an issue. They may have time to help you, or they may not. You don't have a paid support contract with them, so while they're likely to try and help you out if they can, they're certainly under no obligation. You can also get those things ready-to-use from various places, and can even buy support for them from someone who *is* obligated to help you. Which path you choose is up to you. [*] https://wiki.openstack.org/wiki/MailingListEtiquette -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From tony.pearce at cinglevue.com Sun Jan 19 04:52:25 2020 From: tony.pearce at cinglevue.com (Tony Pearce) Date: Sun, 19 Jan 2020 12:52:25 +0800 Subject: DR options with openstack In-Reply-To: <20200118170928.efyjrjsqmzsrbigz@yuggoth.org> References: <5e201295.1c69fb81.a69b.d77d@mx.google.com> <20200117183915.gugkawaqx42z6uvs@yuggoth.org> <20200118170928.efyjrjsqmzsrbigz@yuggoth.org> Message-ID: Thanks guys for clarifying 🙂 My question was in reply to Mohammed Naser's email. Sorry for the confusion. On Sun, 19 Jan 2020, 01:17 Jeremy Stanley, wrote: > On 2020-01-18 14:48:36 +0800 (+0800), Tony Pearce wrote: > > So if I understand correctly, this says to me that Openstack is > > never intended to be consumed by end users. > [...] > > I have no idea how you got that from my message (elided since I > can't be bothered to fix your top-posting[*] right now). It's also > unclear to me which definition of "end users" you're applying. For > these purposes I lump people who install/manage OpenStack > deployments and people who interact with OpenStack deployments > together, though the latter have an established relationship with > the former and that's generally where their recommended support > channels lie. > > End users consume the Linux kernel. 
Where do they go for support > when they have a problem with it? End users consume the bash shell. > Where do they go for support with that? You can totally build and > install those things yourself from source. When you do that you 1. > are assumed to be a somewhat advanced user and 2. might consider > reaching out to their developers and other advanced users for them > when you run into an issue. They may have time to help you, or they > may not. You don't have a paid support contract with them, so while > they're likely to try and help you out if they can, they're > certainly under no obligation. > > You can also get those things ready-to-use from various places, and > can even buy support for them from someone who *is* obligated to > help you. Which path you choose is up to you. > > [*] https://wiki.openstack.org/wiki/MailingListEtiquette > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Sun Jan 19 10:48:34 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sun, 19 Jan 2020 11:48:34 +0100 Subject: [nova] Nova CI busted, please hold rechecks In-Reply-To: <4f03483c-3702-b71f-baca-43585096ca10@fried.cc> References: <4f03483c-3702-b71f-baca-43585096ca10@fried.cc> Message-ID: DevStack unblocked the gate by reverting recent changes. It's green now. Re-proposals of reverted changes will be tested against the could-be-faulty job. Seems we are hitting some odd situation with glance+swift when doing double cirros image upload. All details in bug report mentioned by Eric. -yoctozepto pt., 17 sty 2020 o 16:31 Eric Fried napisał(a): > > The nova-live-migration job is failing 100% since yesterday morning [1]. > Your rechecks won't work until that's resolved. I'll send an all-clear > message when we're green again. 
> > Thanks, > efried > > [1] https://bugs.launchpad.net/nova/+bug/1860021 > > From gmann at ghanshyammann.com Sun Jan 19 14:34:28 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sun, 19 Jan 2020 08:34:28 -0600 Subject: [nova] Nova CI busted, please hold rechecks In-Reply-To: References: <4f03483c-3702-b71f-baca-43585096ca10@fried.cc> Message-ID: <16fbe39fa9a.12bb88a2472227.6200009198294328118@ghanshyammann.com> now nova-next job is not so happy for multiple network case. Wait for the below patch to merge before recheck. - https://review.opendev.org/#/c/702553/ -gmann ---- On Sun, 19 Jan 2020 04:48:34 -0600 Radosław Piliszek wrote ---- > DevStack unblocked the gate by reverting recent changes. > It's green now. > > Re-proposals of reverted changes will be tested against the could-be-faulty job. > > Seems we are hitting some odd situation with glance+swift when doing > double cirros image upload. > All details in bug report mentioned by Eric. > > -yoctozepto > > pt., 17 sty 2020 o 16:31 Eric Fried napisał(a): > > > > The nova-live-migration job is failing 100% since yesterday morning [1]. > > Your rechecks won't work until that's resolved. I'll send an all-clear > > message when we're green again. > > > > Thanks, > > efried > > > > [1] https://bugs.launchpad.net/nova/+bug/1860021 > > > > > > From gmann at ghanshyammann.com Sun Jan 19 17:04:58 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sun, 19 Jan 2020 11:04:58 -0600 Subject: [nova] Nova CI busted, please hold rechecks In-Reply-To: <16fbe39fa9a.12bb88a2472227.6200009198294328118@ghanshyammann.com> References: <4f03483c-3702-b71f-baca-43585096ca10@fried.cc> <16fbe39fa9a.12bb88a2472227.6200009198294328118@ghanshyammann.com> Message-ID: <16fbec3c63d.1084d27af73514.1920681773096468265@ghanshyammann.com> ---- On Sun, 19 Jan 2020 08:34:28 -0600 Ghanshyam Mann wrote ---- > now nova-next job is not so happy for multiple network case. > > Wait for the below patch to merge before recheck. 
> - https://review.opendev.org/#/c/702553/ This is also merged and you are good to recheck. -gmann > > -gmann > > > ---- On Sun, 19 Jan 2020 04:48:34 -0600 Radosław Piliszek wrote ---- > > DevStack unblocked the gate by reverting recent changes. > > It's green now. > > > > Re-proposals of reverted changes will be tested against the could-be-faulty job. > > > > Seems we are hitting some odd situation with glance+swift when doing > > double cirros image upload. > > All details in bug report mentioned by Eric. > > > > -yoctozepto > > > > pt., 17 sty 2020 o 16:31 Eric Fried napisał(a): > > > > > > The nova-live-migration job is failing 100% since yesterday morning [1]. > > > Your rechecks won't work until that's resolved. I'll send an all-clear > > > message when we're green again. > > > > > > Thanks, > > > efried > > > > > > [1] https://bugs.launchpad.net/nova/+bug/1860021 > > > > > > > > > > > > > From madhuri.kumari at intel.com Mon Jan 20 05:58:36 2020 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Mon, 20 Jan 2020 05:58:36 +0000 Subject: [ironic][nova][neutron][cloud-init] Infiniband Support in OpenStack In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA61A5528B0@BGSMSX102.gar.corp.intel.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA61A554754@BGSMSX102.gar.corp.intel.com> Hi Clark, Thank you for your response. I think the infiniband MAC address should be mac[36:-14] + mac[51:] as suggested here[1]. I have specified the right MAC address as per this but it still fails. 
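As a quick sanity check, the `mac[36:-14] + mac[51:]` slicing can be applied to this node's ib0 hardware address (a standalone sketch, using the address and Ironic port values shown in the output below; the slice expression mirrors the one in the cloud-init helper):

```python
# Standalone sanity check of the Ethernet-from-InfiniBand MAC derivation:
# a 20-octet InfiniBand hardware address is a 59-character colon-separated
# string, and bytes 13-15 plus 18-20 form the Ethernet-style MAC.
ib0_addr = "80:00:00:02:fe:80:00:00:00:00:00:00:00:11:75:01:01:67:0f:b9"
assert len(ib0_addr) == 59  # 20 octets * 2 hex chars + 19 colons

# ib0_addr[36:-14] -> "00:11:75:" (octets 13-15 plus trailing colon)
# ib0_addr[51:]    -> "67:0f:b9"  (octets 18-20)
eth_mac = ib0_addr[36:-14] + ib0_addr[51:]
print(eth_mac)  # -> 00:11:75:67:0f:b9
```

Since this yields exactly the `address` registered on the Ironic port (00:11:75:67:0f:b9), the derivation itself checks out, which suggests the failure is elsewhere in cloud-init's interface matching rather than in the address value.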
Please see the following output.

ib0 interface details:

4: ib0: mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:00:02:fe:80:00:00:00:00:00:00:00:11:75:01:01:67:0f:b9 brd 00:ff:ff:ff:ff:12:40:1b:80:00:00:00:00:00:00:00:ff:ff:ff:ff

Ironic port details:

+-----------------------+----------------------------------------------------------------+
| Field                 | Value                                                          |
+-----------------------+----------------------------------------------------------------+
| address               | 00:11:75:67:0f:b9                                              |
| created_at            | 2020-01-16T08:24:15+00:00                                      |
| extra                 | {'client-id': '0xfe800000000000000011750101670fb9'}            |
| internal_info         | {'tenant_vif_port_id': 'c71b2b31-4231-423c-b512-962623220ddf'} |
| is_smartnic           | False                                                          |
| local_link_connection | {}                                                             |
| node_uuid             | 05cce921-931f-4755-ad87-fc41d79a8988                           |
| physical_network      | None                                                           |
| portgroup_uuid        | None                                                           |
| pxe_enabled           | False                                                          |
| updated_at            | 2020-01-16T08:59:47+00:00                                      |
| uuid                  | 9921139a-63cc-4456-8e85-f7673c5c2b3b                           |
+-----------------------+----------------------------------------------------------------+

[1] https://github.com/canonical/cloud-init/blob/9bfb2ba7268e2c3c932023fc3d3020cdc6d6cc18/cloudinit/net/__init__.py#L795

>>-----Original Message----- >>From: Clark Boylan >>Sent: Saturday, January 18, 2020 3:41 AM >>To: openstack-discuss at lists.openstack.org >>Subject: Re: [ironic][nova][neutron][cloud-init] Infiniband Support in >>OpenStack >> >>On Fri, Jan 17, 2020, at 7:31 AM, Kumari, Madhuri wrote: >>> >>> Hi, >>> >>> >>> I am trying to deploy a node with infiniband in Ironic without any success. >>> >>> >>> The node has two interfaces, eth0 and ib0. The deployment is >>> successful, node becomes active but is not reachable. I debugged and >>> checked that the issue is with cloud-init. The cloud-init fails to >>> configure the network interfaces on the node complaining that the MAC >>> address of infiniband port(ib0) is not known to the node. 
Ironic >>> provides a fake MAC address for infiniband ports and cloud-init is >>> supposed to generate the actual MAC address of infiband ports[1]. But >>> it fails[2] before reaching there. >> >>Reading the cloud-init code [4][5] it appears that the ethernet format MAC >>should match bytes 13-15 + 18-20 of the infiniband address. Is the problem >>here that the fake MAC supplied is unrelated to the actual infiniband >>address? If so I think you'll either need cloud-init to ignore unknown >>interfaces (as proposed in the cloud-init bug), or have Ironic supply the mac >>address as bytes 13-15 + 18-20 of the actual infiniband address. >> >>> >>> I have posted the issue in cloud-init[3] as well. >>> >>> >>> Can someone please help me with this issue? How do we specify >>> “TYPE=InfiniBand” from OpenStack? Currently the type sent is “phy” only. >>> >>> >>> [1] >>> https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/ >>> helpers/openstack.py#L686 >>> >>> [2] >>> https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/ >>> helpers/openstack.py#L677 >>> >>> [3] https://bugs.launchpad.net/cloud-init/+bug/1857031 >> >>[4] https://github.com/canonical/cloud- >>init/blob/9bfb2ba7268e2c3c932023fc3d3020cdc6d6cc18/cloudinit/net/__init >>__.py#L793-L795 >>[5] https://github.com/canonical/cloud- >>init/blob/9bfb2ba7268e2c3c932023fc3d3020cdc6d6cc18/cloudinit/net/__init >>__.py#L844-L846 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kiseok7 at gmail.com Mon Jan 20 06:47:49 2020 From: kiseok7 at gmail.com (Kim KS) Date: Mon, 20 Jan 2020 15:47:49 +0900 Subject: [nova] I would like to add another option for cross_az_attach Message-ID: Hello all, In nova with setting [cinder]/ cross_az_attach option to false, nova creates instance and volume in same AZ. but some of usecase (in my case), we need to attach new volume in different AZ to the instance. so I need two options. 
one is for nova block device mapping and attaching volume and another is for attaching volume in specified AZ. [cinder] cross_az_attach = False enable_az_attach_list = AZ1,AZ2 how do you all think of it? Best, Kiseok From mark at stackhpc.com Mon Jan 20 08:39:43 2020 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 20 Jan 2020 08:39:43 +0000 Subject: [ironic][nova][neutron][cloud-init] Infiniband Support in OpenStack In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA61A5528B0@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA61A5528B0@BGSMSX102.gar.corp.intel.com> Message-ID: On Fri, 17 Jan 2020 at 15:34, Kumari, Madhuri wrote: > > Hi, > > > > I am trying to deploy a node with infiniband in Ironic without any success. > > > > The node has two interfaces, eth0 and ib0. The deployment is successful, node becomes active but is not reachable. I debugged and checked that the issue is with cloud-init. The cloud-init fails to configure the network interfaces on the node complaining that the MAC address of infiniband port(ib0) is not known to the node. Ironic provides a fake MAC address for infiniband ports and cloud-init is supposed to generate the actual MAC address of infiband ports[1]. But it fails[2] before reaching there. > > I have posted the issue in cloud-init[3] as well. > > > > Can someone please help me with this issue? How do we specify “TYPE=InfiniBand” from OpenStack? Currently the type sent is “phy” only. > > > > [1] https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L686 > > [2] https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L677 > > [3] https://bugs.launchpad.net/cloud-init/+bug/1857031 > Hi Madhuri, Please see my blog post: https://www.stackhpc.com/bare-metal-infiniband.html. One major question to ask is whether you want shared IB network or multi-tenant isolation. The latter is significantly more challenging. 
It's probably best if you read that article and raise any further questions here or IRC. I'll be out of the office until Wednesday. Mark > > > > > Regards, > > Madhuri > > From tony.pearce at cinglevue.com Mon Jan 20 08:57:19 2020 From: tony.pearce at cinglevue.com (Tony Pearce) Date: Mon, 20 Jan 2020 16:57:19 +0800 Subject: Cinder snapshot delete successful when expected to fail In-Reply-To: References: Message-ID: Hi all, I made some progress on this, but I am unsure how to make it better. Long story short: - queens issue = snapshot is deleted from openstack but shouldnt be because the snapshot is unable to be deleted on the storage side - compared pike / queens / stein "nimble.py" and found all are different, with each newer version of openstack having additions in code - done some trial and error tests and decided to use the stein nimble.py and modify it - found the 'delete snapshot' section and found it is set to not halt on error - added 'raise' into the function and re-tested the 'delete snapshot' scenario - RESULT = now the snapshot is NOT deleted but goes "error deleting" instead :) So now after making that change, the snapshot is now in an unavailable status. I am looking as to how I can do something else other than make this snapshot go into an unavailable condition. Such as display a message while keeping the snapshot "available" because it can still be used Short story long: The "nimble.py" driver changes between pike,queens,stein versions (though within the file it has "driver version 4.0.1" on all). Pike has around 1700 lines. Queens has 1900 and Stein has 1910 approx. I confirmed the issue with the driver by copying the nimble.py driver (and the other 2 files named nimble.pyc and nimble.pyo) from Pike into the Queens test env. to test if the snapshot still gets deleted under Queens or shows an error instead. The snapshot was not deleted and it goes error status as expected. 
note: Initially, I only copied the text data from nimble.py and it appears as though the update to the text file was ignored. It looks to me like, openstack uses one of those .pyc or .pyo files instead. I googled on this and they are binaries that are used in some situations. If I make any changes to the nimble.py file then I need to re-generate those .pyc and .pyo files from the .py. So what is happening here is; I want to try and delete a snapshot that has a clone. The expected outcome is the snapshot is not deleted in Openstack. Current experience is that Openstack deletes the snapshot from the volume snapshots, leaving the snapshot behind on the array storage side. In the volume.log, I see the array sends back an error 409 with "has a clone" response. I managed to find which section is printing the error in the volume.log from the nimble.py driver file and so I edited the text section that gets printed and re-run the test. The volume.log now gets printed with the new text additions I added 'DELETE' and 'Response' words: : NimbleAPIException: DELETE Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011d00001b1d: Response Error Code: 409 Message: Snapshot snapshot-4ee076ad-2e14-4d0d-bc20-64c17c741f8c for volume volume-7dd64cf1-1d56-4f21-a153-a8137b68c557 has a clone. This is the python code where I made those changes: basically, because the error code is not 200 or 201 then it throws the error from what I understand. 
> def delete(self, api): > url = self.uri + api > r = requests.delete(url, headers=self.headers, verify=self.verify) > if r.status_code != 201 and r.status_code != 200: > base = "DELETE Failed to execute api %(api)s: Response Error > Code: %(code)s" % { > 'api': api, > 'code': r.status_code} > LOG.debug("Base error : %(base)s", {'base': base}) > try: > msg = _("%(base)s Message: %(msg)s") % { > 'base': base, > 'msg': r.json()['messages'][1]['text']} > except IndexError: > msg = _("%(base)s Message: %(msg)s") % { > 'base': base, > 'msg': six.text_type(r.json())} > raise NimbleAPIException(msg) > return r.json() However, slightly before the above within nimble.py I think I have found the function that is causing the initial problem: def delete_snap(self, volume_name, snap_name): > snap_info = self.get_snap_info(snap_name, volume_name) > api = "snapshots/" + six.text_type(snap_info['id']) > try: > self.delete(api) > except NimbleAPIException as ex: > LOG.debug("delete snapshot exception: %s", ex) > if SM_OBJ_HAS_CLONE in six.text_type(ex): > # if snap has a clone log the error and continue ahead > LOG.warning('Snapshot %(snap)s : %(state)s', > {'snap': snap_name, > 'state': SM_OBJ_HAS_CLONE}) > else: > raise SM_OBJ_HAS_CLONE is looking for "has a clone" and it's defined in the beginning of the file: "SM_OBJ_HAS_CLONE = "has a clone"" and I can see this in the log file "has a clone" as a response 409. My problem is literally " # if snap has a clone log the error and continue ahead" - it shouldnt be continuing, because by continuing it is deleting the snapshot on the Openstack side but is unable to do the same on the storage side because of the dependency issue. So what I did next was to look into the different "delete volume" section for some help - because a similar condition can occur there -> to explain; if volume2 is a clone of volume1 then we can't delete volume1 until we first delete volume2. 
What I notice is that there is a "raise" in that section at the end - I think I understand this to be throwing an exception to openstack. ie to cause an error condition. Here's the delete volume section from the driver: def delete_volume(self, volume): """Delete the specified volume.""" backup_snap_name, backup_vol_name = self .is_volume_backup_clone(volume) eventlet.sleep(DEFAULT_SLEEP) self.APIExecutor.online_vol(volume['name'], False) LOG.debug("Deleting volume %(vol)s", {'vol': volume['name']}) @utils.retry(NimbleAPIException, retries=3) def _retry_remove_vol(volume): self.APIExecutor.delete_vol(volume['name']) try: _retry_remove_vol(volume) except NimbleAPIException as ex: LOG.debug("delete volume exception: %s", ex) if SM_OBJ_HAS_CLONE in six.text_type(ex): LOG.warning('Volume %(vol)s : %(state)s', {'vol': volume['name'], 'state': SM_OBJ_HAS_CLONE}) # set the volume back to be online and raise busy exception self.APIExecutor.online_vol(volume['name'], True) raise exception.VolumeIsBusy(volume_name=volume['name']) raise So with the above, I modified the delete snapshot section and put in a simple "raise" like this (highlighted in yellow) > > def delete_snap(self, volume_name, snap_name): > snap_info = self.get_snap_info(snap_name, volume_name) > api = "snapshots/" + six.text_type(snap_info['id']) > try: > self.delete(api) > except NimbleAPIException as ex: > LOG.debug("delete snapshot exception: %s", ex) > if SM_OBJ_HAS_CLONE in six.text_type(ex): > # if snap has a clone log the error and continue ahead > LOG.warning('Snapshot %(snap)s : %(state)s', > {'snap': snap_name, > 'state': SM_OBJ_HAS_CLONE}) > raise > else: > raise And now when I test, the snapshot is not deleted but it instead goes into ERROR-DELETING. It's not perfect but at least I can make the snapshot back to "available" from the admin section within Openstack. Would anyone be able to if possible, give me some pointers how to accept this error but not cause the snapshot to go into "error" ? 
I think that I need to create a class? regards *Tony Pearce* | *Senior Network Engineer / Infrastructure Lead**Cinglevue International * Email: tony.pearce at cinglevue.com Web: http://www.cinglevue.com *Australia* 1 Walsh Loop, Joondalup, WA 6027 Australia. Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 Note: This email and all attachments are the sole property of Cinglevue International Pty Ltd. (or any of its subsidiary entities), and the information contained herein must be considered confidential, unless specified otherwise. If you are not the intended recipient, you must not use or forward the information contained in these documents. If you have received this message in error, please delete the email and notify the sender. On Sat, 18 Jan 2020 at 09:44, Tony Pearce wrote: > Thank you. That really helps. > > I am going to diff the nimble.py files between Pike and Queens and see > what's changed. > > On Fri, 17 Jan 2020, 22:18 Alan Bishop, wrote: > >> >> >> On Fri, Jan 17, 2020 at 2:01 AM Tony Pearce >> wrote: >> >>> Could anyone help by pointing me where to go to be able to dig into this >>> issue further? >>> >>> I have installed a test Openstack environment using RDO Packstack. I >>> wanted to install the same version that I have in Production (Pike) but >>> it's not listed in the CentOS repo via yum search. So I installed Queens. I >>> am using nimble.py Cinder driver. Nimble Storage is a storage array >>> accessed via iscsi from the Openstack host, and is controlled from >>> Openstack by the driver and API. >>> >>> *What I expected to happen:* >>> 1. create an instance with volume (the volume is created on the storage >>> array successfully and instance boots from it) >>> 2. take a snapshot (snapshot taken on the volume on the array >>> successfully) >>> 3. create a new instance from the snapshot (the api tells the array to >>> clone the snapshot into a new volume on the array and use that volume for >>> the instance) >>> 4. 
try and delete the snapshot >>> Expected Result - Openstack gives the user a message like "you're not >>> allowed to do that". >>> >>> Note: Step 3 above creates a child volume from the parent snapshot. >>> It's impossible to delete the parent snapshot because IO READ is sent to >>> that part of the original volume (as I understand it). >>> >>> *My production problem is this: * >>> 1. create an instance with volume (the volume is created on the storage >>> array successfully) >>> 2. take a snapshot (snapshot taken on the volume on the array >>> successfully) >>> 3. create a new instance from the snapshot (the api tells the array to >>> clone the snapshot into a new volume on the array and use that volume for >>> the instance) >>> 4. try and delete the snapshot >>> Result - snapshot goes into error state and later, all Cinder operations >>> fail such as new instance/create volume etc. until the correct service is >>> restarted. Then everything works once again. >>> >>> >>> To troubleshoot the above, I installed the RDP Packstack Queens (because >>> I couldnt get Pike). I tested the above and now, the result is the snapshot >>> is successfully deleted from openstack but not deleted on the array. The >>> log is below for reference. But I can see the in the log that the array >>> sends back info to openstack saying the snapshot has a clone and the delete >>> cannot be done because of that. Also response code 409. >>> >>> *Some info about why the problem with Pike started in the first place* >>> 1. Vendor is Nimble Storage which HPE purchased >>> 2. HPE/Nimble have dropped support for openstack. Latest supported >>> version is Queens and Nimble array version v4.x. The current Array version >>> is v5.x. Nimble say there are no guarantees with openstack, the driver and >>> the array version v5.x >>> 3. 
I was previously advised by Nimble that the array version v5.x will >>> work fine and so left our DR array on v5.x with a pending upgrade that had >>> a blocker due to an issue. This issue was resolved in December and the >>> pending upgrade completed to match the DR array took place around 30 days >>> ago. >>> >>> >>> With regards to the production issue, I assumed that the array API has >>> some changes between v4.x and v5.x and it's causing an issue with Cinder >>> due to the API response. Although I have not been able to find out if or >>> what changes there are that may have occurred after the array upgrade, as >>> the documentation for this is Nimble internal-only. >>> >>> >>> *So with that - some questions if I may:* >>> When Openstack got the 409 error response from the API (as seen in the >>> log below), why would Openstack then proceed to delete the snapshot on the >>> Openstack side? How could I debug this further? I'm not sure what Openstack >>> Cinder is acting on in terns of the response as yet. Maybe Openstack is not >>> specifically looking for the error code in the response? >>> >>> The snapshot that got deleted on the openstack side is a problem. Would >>> this be related to the driver? Could it be possible that the driver did not >>> pass the error response to Cinder? >>> >> >> Hi Tony, >> >> This is exactly what happened, and it appears to be a driver bug >> introduced in queens by [1]. The code in question [2] logs the error, but >> fails to propagate the exception. As far as the volume manager is >> concerned, the snapshot deletion was successful. >> >> [1] https://review.opendev.org/601492 >> [2] >> https://opendev.org/openstack/cinder/src/branch/stable/queens/cinder/volume/drivers/nimble.py#L1815 >> >> Alan >> >> Thanks in advance. Just for reference, the log snippet is below. 
>>> >>> >>> ==> volume.log <== >>>> 2020-01-17 16:53:23.718 24723 WARNING py.warnings >>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >>>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >>>> certificate verification is strongly advised. See: >>>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >>>> InsecureRequestWarning) >>>> : NimbleAPIException: Failed to execute api >>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >>>> ==> api.log <== >>>> 2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi >>>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>>> http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail >>>> returned with HTTP 200 >>>> 2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server >>>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET >>>> /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200 >>>> len: 4657 time: 0.1152730 >>>> ==> volume.log <== >>>> 2020-01-17 16:53:23.811 24723 WARNING py.warnings >>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >>>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >>>> certificate verification is strongly advised. 
See: >>>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >>>> InsecureRequestWarning) >>>> : NimbleAPIException: Failed to execute api >>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >>>> 2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble >>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception >>>> Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: >>>> Error Code: 409 Message: Snapshot >>>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.: >>>> NimbleAPIException: Failed to execute api >>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >>>> 2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble >>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot >>>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone: >>>> NimbleAPIException: Failed to execute api >>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. 
>>>> 2020-01-17 16:53:23.964 24723 WARNING cinder.quota >>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default >>>> quota for resource: snapshots_Nimble-DR is set by the default quota flag: >>>> quota_snapshots_Nimble-DR, it is now deprecated. Please use the default >>>> quota class for default quota. >>>> 2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager >>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot >>>> completed successfully. >>> >>> >>> >>> Regards, >>> >>> *Tony Pearce* | >>> *Senior Network Engineer / Infrastructure Lead**Cinglevue International >>> * >>> >>> Email: tony.pearce at cinglevue.com >>> Web: http://www.cinglevue.com >>> >>> *Australia* >>> 1 Walsh Loop, Joondalup, WA 6027 Australia. >>> >>> Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 >>> >>> Note: This email and all attachments are the sole property of Cinglevue >>> International Pty Ltd. (or any of its subsidiary entities), and the >>> information contained herein must be considered confidential, unless >>> specified otherwise. If you are not the intended recipient, you must not >>> use or forward the information contained in these documents. If you have >>> received this message in error, please delete the email and notify the >>> sender. >>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Mon Jan 20 10:00:14 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Mon, 20 Jan 2020 10:00:14 +0000 Subject: [barbican] TPM2.0 backend Message-ID: <1579514411.790283.0@est.tech> Hi, Looking at the Barbican documentation I see that the secrets can be stored on disk (SimpleCrypto backend) or in a HW vendor specific HSM module. 
Is there a way to use a TPM 2.0 device as the backend of Barbican via something like [1]? Cheers, gibi [1] https://github.com/tpm2-software/tpm2-pkcs11 From rdhasman at redhat.com Mon Jan 20 09:24:48 2020 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 20 Jan 2020 14:54:48 +0530 Subject: Cinder snapshot delete successful when expected to fail In-Reply-To: References: Message-ID: Hi Tony, The 'raise' you used raises 'NimbleAPIException', which isn't handled anywhere and is caught by the generic exception block here[1], which sets your snapshot to the error_deleting state. My suggestion is to raise the exception.SnapshotIsBusy exception, which will be caught here[2] and will set your snapshot to the available state. Let me know if it works. [1] https://github.com/openstack/cinder/blob/master/cinder/volume/manager.py#L1242 [2] https://github.com/openstack/cinder/blob/master/cinder/volume/manager.py#L1228 Regards Rajat Dhasmana On Mon, Jan 20, 2020 at 2:35 PM Tony Pearce wrote: > Hi all, > I made some progress on this, but I am unsure how to make it better. Long > story short: > > - queens issue = snapshot is deleted from openstack but shouldn't be > because the snapshot is unable to be deleted on the storage side > - compared pike / queens / stein "nimble.py" and found all are > different, with each newer version of openstack having additions in code > - done some trial and error tests and decided to use the stein > nimble.py and modify it > - found the 'delete snapshot' section and found it is set to not halt > on error > - added 'raise' into the function and re-tested the 'delete snapshot' > scenario > - RESULT = now the snapshot is NOT deleted but goes "error deleting" > instead :) > > So after making that change, the snapshot is now in an unavailable > status. I am looking at how I can do something else other than make this > snapshot go into an unavailable condition.
Such as displaying a message while > keeping the snapshot "available", because it can still be used > > Short story long: > > The "nimble.py" driver changes between the pike, queens and stein versions (though > within the file it has "driver version 4.0.1" on all). Pike has around 1700 > lines. Queens has 1900 and Stein has 1910 approx. > > I confirmed the issue with the driver by copying the nimble.py driver (and > the other 2 files named nimble.pyc and nimble.pyo) from Pike into the > Queens test env. to test if the snapshot still gets deleted under Queens or > shows an error instead. The snapshot was not deleted and it went to error > status as expected. > note: Initially, I only copied the text data from nimble.py and it appears > as though the update to the text file was ignored. It looks to me like > openstack uses one of those .pyc or .pyo files instead. I googled on this > and they are compiled bytecode files that are used in some situations. If I make any > changes to the nimble.py file then I need to re-generate those .pyc and > .pyo files from the .py. > > So what is happening here is: I want to try and delete a snapshot that has > a clone. The expected outcome is the snapshot is not deleted in Openstack. > The current experience is that Openstack deletes the snapshot from the volume > snapshots, leaving the snapshot behind on the array storage side. > > In the volume.log, I see the array sends back an error 409 with a "has a > clone" response. I managed to find which section is printing the error in > the volume.log from the nimble.py driver file and so I edited the text > section that gets printed and re-ran the test. The volume.log now gets > printed with the new text additions I added, the 'DELETE' and 'Response' words: > > : NimbleAPIException: DELETE Failed to execute api > snapshots/0464a5fd65d75fcfe1000000000000011d00001b1d: Response Error Code: > 409 Message: Snapshot snapshot-4ee076ad-2e14-4d0d-bc20-64c17c741f8c for > volume volume-7dd64cf1-1d56-4f21-a153-a8137b68c557 has a clone.
> > This is the python code where I made those changes: basically, because the > error code is not 200 or 201, it throws the error, from what I > understand. > >> def delete(self, api): >> url = self.uri + api >> r = requests.delete(url, headers=self.headers, verify=self.verify) >> if r.status_code != 201 and r.status_code != 200: >> base = "DELETE Failed to execute api %(api)s: Response Error >> Code: %(code)s" % { >> 'api': api, >> 'code': r.status_code} >> LOG.debug("Base error : %(base)s", {'base': base}) >> try: >> msg = _("%(base)s Message: %(msg)s") % { >> 'base': base, >> 'msg': r.json()['messages'][1]['text']} >> except IndexError: >> msg = _("%(base)s Message: %(msg)s") % { >> 'base': base, >> 'msg': six.text_type(r.json())} >> raise NimbleAPIException(msg) >> return r.json() > > > > However, slightly before the above within nimble.py I think I have found > the function that is causing the initial problem: > > def delete_snap(self, volume_name, snap_name): >> snap_info = self.get_snap_info(snap_name, volume_name) >> api = "snapshots/" + six.text_type(snap_info['id']) >> try: >> self.delete(api) >> except NimbleAPIException as ex: >> LOG.debug("delete snapshot exception: %s", ex) >> if SM_OBJ_HAS_CLONE in six.text_type(ex): >> # if snap has a clone log the error and continue ahead >> LOG.warning('Snapshot %(snap)s : %(state)s', >> {'snap': snap_name, >> 'state': SM_OBJ_HAS_CLONE}) >> else: >> raise > > > SM_OBJ_HAS_CLONE is looking for "has a clone" and it's defined at the > beginning of the file: "SM_OBJ_HAS_CLONE = "has a clone"" and I can see > this in the log file "has a clone" as a response 409. > > My problem is literally " # if snap has a clone log the error and continue > ahead" - it shouldn't be continuing, because by continuing it is deleting > the snapshot on the Openstack side but is unable to do the same on the > storage side because of the dependency issue.
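[Editor's note] The swallow-and-continue behaviour called out above can be shown in isolation. The sketch below uses stand-alone stub classes — it is NOT the real cinder or nimble code — to contrast "log and continue ahead" (the caller sees success, so the snapshot record gets deleted) with translating the backend 409 into a busy-style exception the caller can act on:

```python
# Stand-alone sketch with stub classes -- NOT the real cinder/nimble code.
SM_OBJ_HAS_CLONE = "has a clone"


class NimbleAPIException(Exception):
    """Stub for the driver's backend error."""


class SnapshotIsBusy(Exception):
    """Stub standing in for cinder's SnapshotIsBusy exception."""


def backend_delete(snap_name):
    # Pretend the array always answers 409 "has a clone" for this snapshot.
    raise NimbleAPIException(
        "Error Code: 409 Message: Snapshot %s has a clone." % snap_name)


def delete_snap_swallow(snap_name):
    """Current behavior: log the error and continue -> caller sees success."""
    try:
        backend_delete(snap_name)
    except NimbleAPIException as ex:
        if SM_OBJ_HAS_CLONE in str(ex):
            print("WARNING: Snapshot %s : %s" % (snap_name, SM_OBJ_HAS_CLONE))
        else:
            raise


def delete_snap_propagate(snap_name):
    """Alternative: re-raise a busy exception so the caller keeps the snapshot."""
    try:
        backend_delete(snap_name)
    except NimbleAPIException as ex:
        if SM_OBJ_HAS_CLONE in str(ex):
            raise SnapshotIsBusy(snap_name)
        raise


delete_snap_swallow("snap-1")  # returns normally, so the caller deletes the record
try:
    delete_snap_propagate("snap-1")
except SnapshotIsBusy as ex:
    print("busy, snapshot kept: %s" % ex)
```

In the real driver the busy exception would come from cinder's exception module, mirroring the VolumeIsBusy handling in delete_volume, so the volume manager can return the snapshot to the available state instead of deleting its record.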
> > So what I did next was to look into the different "delete volume" section > for some help - because a similar condition can occur there -> to explain; > if volume2 is a clone of volume1 then we can't delete volume1 until we > first delete volume2. > What I notice is that there is a "raise" in that section at the end - I > think I understand this to be throwing an exception to openstack. ie to > cause an error condition. > > Here's the delete volume section from the driver: > > def delete_volume(self, volume): > """Delete the specified volume.""" > backup_snap_name, backup_vol_name = self > .is_volume_backup_clone(volume) > eventlet.sleep(DEFAULT_SLEEP) > self.APIExecutor.online_vol(volume['name'], False) > LOG.debug("Deleting volume %(vol)s", {'vol': volume['name']}) > > @utils.retry(NimbleAPIException, retries=3) > def _retry_remove_vol(volume): > self.APIExecutor.delete_vol(volume['name']) > try: > _retry_remove_vol(volume) > except NimbleAPIException as ex: > LOG.debug("delete volume exception: %s", ex) > if SM_OBJ_HAS_CLONE in six.text_type(ex): > LOG.warning('Volume %(vol)s : %(state)s', > {'vol': volume['name'], > 'state': SM_OBJ_HAS_CLONE}) > > # set the volume back to be online and raise busy exception > self.APIExecutor.online_vol(volume['name'], True) > raise exception.VolumeIsBusy(volume_name=volume['name']) > raise > > > So with the above, I modified the delete snapshot section and put in a > simple "raise" like this (highlighted in yellow) > >> >> def delete_snap(self, volume_name, snap_name): >> snap_info = self.get_snap_info(snap_name, volume_name) >> api = "snapshots/" + six.text_type(snap_info['id']) >> try: >> self.delete(api) >> except NimbleAPIException as ex: >> LOG.debug("delete snapshot exception: %s", ex) >> if SM_OBJ_HAS_CLONE in six.text_type(ex): >> # if snap has a clone log the error and continue ahead >> LOG.warning('Snapshot %(snap)s : %(state)s', >> {'snap': snap_name, >> 'state': SM_OBJ_HAS_CLONE}) >> raise >> else: >> raise > > 
> > And now when I test, the snapshot is not deleted but it instead goes into > ERROR-DELETING. It's not perfect, but at least I can make the snapshot back > to "available" from the admin section within Openstack. > > Would anyone be able to give me some pointers, if possible, on how to accept > this error without causing the snapshot to go into "error"? I think that I > need to create a class? > > regards > > *Tony Pearce* | > *Senior Network Engineer / Infrastructure Lead**Cinglevue International > * > > Email: tony.pearce at cinglevue.com > Web: http://www.cinglevue.com > > *Australia* > 1 Walsh Loop, Joondalup, WA 6027 Australia. > > Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 > > Note: This email and all attachments are the sole property of Cinglevue > International Pty Ltd. (or any of its subsidiary entities), and the > information contained herein must be considered confidential, unless > specified otherwise. If you are not the intended recipient, you must not > use or forward the information contained in these documents. If you have > received this message in error, please delete the email and notify the > sender. > > > > > On Sat, 18 Jan 2020 at 09:44, Tony Pearce > wrote: > >> Thank you. That really helps. >> >> I am going to diff the nimble.py files between Pike and Queens and see >> what's changed. >> >> On Fri, 17 Jan 2020, 22:18 Alan Bishop, wrote: >>> >>> >>> On Fri, Jan 17, 2020 at 2:01 AM Tony Pearce >>> wrote: >>> >>>> Could anyone help by pointing me where to go to be able to dig into >>>> this issue further? >>>> >>>> I have installed a test Openstack environment using RDO Packstack. I >>>> wanted to install the same version that I have in Production (Pike) but >>>> it's not listed in the CentOS repo via yum search. So I installed Queens. I >>>> am using nimble.py Cinder driver. Nimble Storage is a storage array >>>> accessed via iscsi from the Openstack host, and is controlled from >>>> Openstack by the driver and API.
>>>> >>>> *What I expected to happen:* >>>> 1. create an instance with volume (the volume is created on the storage >>>> array successfully and instance boots from it) >>>> 2. take a snapshot (snapshot taken on the volume on the array >>>> successfully) >>>> 3. create a new instance from the snapshot (the api tells the array to >>>> clone the snapshot into a new volume on the array and use that volume for >>>> the instance) >>>> 4. try and delete the snapshot >>>> Expected Result - Openstack gives the user a message like "you're not >>>> allowed to do that". >>>> >>>> Note: Step 3 above creates a child volume from the parent snapshot. >>>> It's impossible to delete the parent snapshot because IO READ is sent to >>>> that part of the original volume (as I understand it). >>>> >>>> *My production problem is this: * >>>> 1. create an instance with volume (the volume is created on the storage >>>> array successfully) >>>> 2. take a snapshot (snapshot taken on the volume on the array >>>> successfully) >>>> 3. create a new instance from the snapshot (the api tells the array to >>>> clone the snapshot into a new volume on the array and use that volume for >>>> the instance) >>>> 4. try and delete the snapshot >>>> Result - snapshot goes into error state and later, all Cinder >>>> operations fail such as new instance/create volume etc. until the correct >>>> service is restarted. Then everything works once again. >>>> >>>> >>>> To troubleshoot the above, I installed the RDP Packstack Queens >>>> (because I couldnt get Pike). I tested the above and now, the result is the >>>> snapshot is successfully deleted from openstack but not deleted on the >>>> array. The log is below for reference. But I can see the in the log that >>>> the array sends back info to openstack saying the snapshot has a clone and >>>> the delete cannot be done because of that. Also response code 409. >>>> >>>> *Some info about why the problem with Pike started in the first place* >>>> 1. 
Vendor is Nimble Storage which HPE purchased >>>> 2. HPE/Nimble have dropped support for openstack. Latest supported >>>> version is Queens and Nimble array version v4.x. The current Array version >>>> is v5.x. Nimble say there are no guarantees with openstack, the driver and >>>> the array version v5.x >>>> 3. I was previously advised by Nimble that the array version v5.x will >>>> work fine and so left our DR array on v5.x with a pending upgrade that had >>>> a blocker due to an issue. This issue was resolved in December and the >>>> pending upgrade completed to match the DR array took place around 30 days >>>> ago. >>>> >>>> >>>> With regards to the production issue, I assumed that the array API has >>>> some changes between v4.x and v5.x and it's causing an issue with Cinder >>>> due to the API response. Although I have not been able to find out if or >>>> what changes there are that may have occurred after the array upgrade, as >>>> the documentation for this is Nimble internal-only. >>>> >>>> >>>> *So with that - some questions if I may:* >>>> When Openstack got the 409 error response from the API (as seen in the >>>> log below), why would Openstack then proceed to delete the snapshot on the >>>> Openstack side? How could I debug this further? I'm not sure what Openstack >>>> Cinder is acting on in terns of the response as yet. Maybe Openstack is not >>>> specifically looking for the error code in the response? >>>> >>>> The snapshot that got deleted on the openstack side is a problem. Would >>>> this be related to the driver? Could it be possible that the driver did not >>>> pass the error response to Cinder? >>>> >>> >>> Hi Tony, >>> >>> This is exactly what happened, and it appears to be a driver bug >>> introduced in queens by [1]. The code in question [2] logs the error, but >>> fails to propagate the exception. As far as the volume manager is >>> concerned, the snapshot deletion was successful. 
>>> >>> [1] https://review.opendev.org/601492 >>> [2] >>> https://opendev.org/openstack/cinder/src/branch/stable/queens/cinder/volume/drivers/nimble.py#L1815 >>> >>> Alan >>> >>> Thanks in advance. Just for reference, the log snippet is below. >>>> >>>> >>>> ==> volume.log <== >>>>> 2020-01-17 16:53:23.718 24723 WARNING py.warnings >>>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>>>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >>>>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >>>>> certificate verification is strongly advised. See: >>>>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >>>>> InsecureRequestWarning) >>>>> : NimbleAPIException: Failed to execute api >>>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. 
>>>>> ==> api.log <== >>>>> 2020-01-17 16:53:23.769 25242 INFO cinder.api.openstack.wsgi >>>>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>>>> http://192.168.53.45:8776/v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail >>>>> returned with HTTP 200 >>>>> 2020-01-17 16:53:23.770 25242 INFO eventlet.wsgi.server >>>>> [req-bfcbff34-134b-497e-82b1-082d48f8767f df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] 192.168.53.45 "GET >>>>> /v3/87e34c89e6fb41d2af25085b64011a55/volumes/detail HTTP/1.1" status: 200 >>>>> len: 4657 time: 0.1152730 >>>>> ==> volume.log <== >>>>> 2020-01-17 16:53:23.811 24723 WARNING py.warnings >>>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] >>>>> /usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:852: >>>>> InsecureRequestWarning: Unverified HTTPS request is being made. Adding >>>>> certificate verification is strongly advised. See: >>>>> https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings >>>>> InsecureRequestWarning) >>>>> : NimbleAPIException: Failed to execute api >>>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. 
>>>>> 2020-01-17 16:53:23.902 24723 ERROR cinder.volume.drivers.nimble >>>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Re-throwing Exception >>>>> Failed to execute api snapshots/0464a5fd65d75fcfe1000000000000011100001a41: >>>>> Error Code: 409 Message: Snapshot >>>>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone.: >>>>> NimbleAPIException: Failed to execute api >>>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >>>>> 2020-01-17 16:53:23.903 24723 WARNING cinder.volume.drivers.nimble >>>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Snapshot >>>>> snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 : has a clone: >>>>> NimbleAPIException: Failed to execute api >>>>> snapshots/0464a5fd65d75fcfe1000000000000011100001a41: Error Code: 409 >>>>> Message: Snapshot snapshot-3806efc5-65ca-495a-a87a-baaddc9607d9 for volume >>>>> volume-5b02db35-8d5c-4ef6-b0e7-2f9b62cac57e has a clone. >>>>> 2020-01-17 16:53:23.964 24723 WARNING cinder.quota >>>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Deprecated: Default >>>>> quota for resource: snapshots_Nimble-DR is set by the default quota flag: >>>>> quota_snapshots_Nimble-DR, it is now deprecated. Please use the default >>>>> quota class for default quota. >>>>> 2020-01-17 16:53:24.054 24723 INFO cinder.volume.manager >>>>> [req-60fe4335-af66-4c46-9bbd-2408bf4d6f07 df7548ecad684f26b8bc802ba63a9814 >>>>> 87e34c89e6fb41d2af25085b64011a55 - default default] Delete snapshot >>>>> completed successfully. 
>>>> >>>> >>>> >>>> Regards, >>>> >>>> *Tony Pearce* | >>>> *Senior Network Engineer / Infrastructure Lead**Cinglevue >>>> International * >>>> >>>> Email: tony.pearce at cinglevue.com >>>> Web: http://www.cinglevue.com >>>> >>>> *Australia* >>>> 1 Walsh Loop, Joondalup, WA 6027 Australia. >>>> >>>> Direct: +61 8 6202 0036 | Main: +61 8 6202 0024 >>>> >>>> Note: This email and all attachments are the sole property of Cinglevue >>>> International Pty Ltd. (or any of its subsidiary entities), and the >>>> information contained herein must be considered confidential, unless >>>> specified otherwise. If you are not the intended recipient, you must not >>>> use or forward the information contained in these documents. If you have >>>> received this message in error, please delete the email and notify the >>>> sender. >>>> >>>> >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Mon Jan 20 15:41:07 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Mon, 20 Jan 2020 15:41:07 +0000 Subject: [queens][nova] iscsi issue In-Reply-To: References: Message-ID: <20200120154107.czjws3n3p5rl64nu@lyarwood.usersys.redhat.com> On 17-01-20 19:30:13, Ignazio Cassano wrote: > Hello all we are testing openstack queens cinder driver for Unity iscsi > (driver cinder.volume.drivers.dell_emc.unity.Driver). > > The unity storage is a Unity600 Version 4.5.10.5.001 > > We are facing an issue when we try to detach volume from a virtual machine > with two or more volumes attached (this happens often but not always): Could you write this up as an os-brick bug and attach the nova-compute log in DEBUG showing the initial volume attachments prior to this detach error? 
https://bugs.launchpad.net/os-brick/+filebug > The following is reported nova-compute.log: > > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server > self.connector.disconnect_volume(connection_info['data'], None) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/os_brick/utils.py", line 137, in > trace_logging_wrapper > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return > f(*args, **kwargs) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, > in inner > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return > f(*args, **kwargs) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", > line 848, in disconnect_volume > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server > device_info=device_info) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", > line 892, in _cleanup_connection > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server path_used, > was_multipath) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py", line > 271, in remove_connection > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server > self.flush_multipath_device(multipath_name) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py", line > 329, in flush_multipath_device > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server > root_helper=self._root_helper) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in > _execute > 2020-01-17 16:05:11.132 6643 ERROR 
oslo_messaging.rpc.server result = > self.__execute(*args, **kwargs) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line > 169, in execute > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return > execute_root(*cmd, **kwargs) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, > in _wrap > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return > self.channel.remote_call(name, args, kwargs) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in > remote_call > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server raise > exc_type(*result[2]) > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server > ProcessExecutionError: Unexpected error while running command. > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Command: > multipath -f 36006016006e04400d0c4215e3ec55757 > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Exit code: 1 > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Stdout: u'Jan > 17 16:04:30 | 36006016006e04400d0c4215e3ec55757p1: map in use\nJan 17 > 16:04:31 | failed to remove multipath map > 36006016006e04400d0c4215e3ec55757\n' > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server Stderr: u'' > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server > > > Best Regards > > Ignazio -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From haleyb.dev at gmail.com Mon Jan 20 16:07:31 2020 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 20 Jan 2020 11:07:31 -0500 Subject: [neutron] Bug deputy report for week of January 13th Message-ID: Hi, I was Neutron bug deputy last week. Below is a short summary about reported bugs. -Brian Critical bugs ------------- * https://bugs.launchpad.net/neutron/+bug/1859988 - neutron-tempest-plugin tests fail for stable/queens - bcafarel picked up, related to below bug * https://bugs.launchpad.net/neutron/+bug/1860033 - neutron tempest jobs broken on rocky due to requirements neutron-lib upgrade - many proposed changes, discussion on ML High bugs --------- * https://bugs.launchpad.net/neutron/+bug/1859977 - [OVN] Update of floatingip creates new row in NBDB NAT table - maciej picked up * https://bugs.launchpad.net/neutron/+bug/1860140 - [OVN] Provider driver sends malformed update to Octavia - https://review.opendev.org/#/c/703097 * https://bugs.launchpad.net/neutron/+bug/1860141 - [OVN] Provider driver fails while LB VIP port already created - https://review.opendev.org/#/c/703110 Medium bugs ----------- * https://bugs.launchpad.net/neutron/+bug/1859832 - L3 HA connectivity to GW port can be broken after reboot of backup node - Issue with MLDv2 packet causing issues with connections - Slawek picked up ownership * https://bugs.launchpad.net/neutron/+bug/1860273 - https://review.opendev.org/#/c/703292/ Low bugs -------- * https://bugs.launchpad.net/neutron/+bug/1859765 - Sgs of device_owner_network port can be update without specifing "device_owner" attr - https://review.opendev.org/#/c/702632/ - discussion in review * https://bugs.launchpad.net/neutron/+bug/1859962 - Sanity checks comparing versions should not use decimal comparison - https://review.opendev.org/#/c/702847/ Wishlist bugs ------------- Invalid bugs ------------ * 
https://bugs.launchpad.net/neutron/+bug/1859638 - VIP between dvr east-west networks does not work at all - Duplicate of https://bugs.launchpad.net/neutron/+bug/1774459 * https://bugs.launchpad.net/neutron/+bug/1859976 - Removing smart_nic port in openstack will try to delete representor port - os-vif issue Further triage required ----------------------- * https://bugs.launchpad.net/neutron/+bug/1859362 - Neutron accepts arbitrary MTU values for networks - Rodolfo has some questions on the bug * https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1859649 - networking disruption on upgrade from 14.0.0 to 14.0.3 - Rodolfo had question about restart order of services * https://bugs.launchpad.net/neutron/+bug/1859887 - External connectivity broken because of stale FIP rule - Asked for release information, etc * https://bugs.launchpad.net/networking-ovn/+bug/1859965 - networking-ovn's octavia api,how get octavia-api DB use python code? - networking-ovn octavia driver bug? asked for more information From rico.lin.guanyu at gmail.com Mon Jan 20 16:47:19 2020 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Tue, 21 Jan 2020 00:47:19 +0800 Subject: [Multi-Arch-sig] Meeting this week Message-ID: Hi all. As a reminder, we will host our meeting this week: Tuesday at 0800 UTC in #openstack-meeting-alt and 1500 UTC in #openstack-meeting. Feel free to propose agenda items here: https://etherpad.openstack.org/p/Multi-Arch-agenda See you all in the meeting :) -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdulko at redhat.com Mon Jan 20 16:51:47 2020 From: mdulko at redhat.com (Michał Dulko) Date: Mon, 20 Jan 2020 17:51:47 +0100 Subject: [kuryr] Deprecation of Ingress support and namespace isolation Message-ID: <97e0d4450c2777bcc4f8f2aff39dfdb0150fbe09.camel@redhat.com> Hi, I've decided to put up a patch [1] deprecating the aforementioned features.
It's motivated by the fact that there are better ways to do both: * Ingress can be done by another controller or through the cloud provider. * Namespace isolation can be achieved through network policies. Both alternative ways are way better tested and there's nobody maintaining the deprecated features. I'm open to keeping them if someone using them steps up. Given that Kuryr seems to be tied more to Kubernetes releases than to OpenStack ones, and provided there are no objections, we might start removing the code in the Ussuri timeframe. [1] https://review.opendev.org/#/c/703420/ Thanks, Michał From sean.mcginnis at gmail.com Mon Jan 20 18:01:19 2020 From: sean.mcginnis at gmail.com (Sean McGinnis) Date: Mon, 20 Jan 2020 12:01:19 -0600 Subject: [release] Python universal wheel support Message-ID: <64cc9fee-de60-7584-3f9e-7cb3b3be28aa@gmail.com> Greetings, We have just merged a change to the release-openstack-python job that changes the wheels we produce to not be universal. For some background - we added an explicit flag of "--universal" to the creation of wheels a while back. For projects that have both Python 2 and 3 support, you want a universal wheel. Most projects did not add this flag to their setup.cfg, so overriding at the release job level was considered a good way to make sure our output was what we wanted at the time. We now have the majority of projects dropping py2 support, so we no longer want to create these universal wheels if py2 support has been dropped. That has actually been seen to cause some issues. The downside with this change is that the job is for *all* deliverables we release, including stable releases. So with this change, any new stable branches will no longer get universal wheels if the flag has not been set locally. This was deemed a good tradeoff with the current needs though.
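[Editor's note] For anyone wanting to check which kind of wheel a build produced, the compatibility tag embedded in the wheel filename is enough to tell. This is a minimal sketch that only parses the standard wheel naming convention; the filenames used are illustrative, not real release artifacts:

```python
def is_universal(wheel_filename):
    """Return True if a wheel's python tag covers both py2 and py3.

    Wheel filenames follow:
        {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    A wheel built with [bdist_wheel] universal = 1 carries the
    'py2.py3' python tag; a py3-only build carries just 'py3'.
    """
    stem = wheel_filename[: -len(".whl")]
    python_tag = stem.split("-")[-3]  # python/abi/platform are the last 3 fields
    tags = python_tag.split(".")
    return "py2" in tags and "py3" in tags


# Illustrative filenames:
print(is_universal("example-1.0.0-py2.py3-none-any.whl"))  # True
print(is_universal("example-1.0.0-py3-none-any.whl"))      # False
```

In other words, a universal build tags the wheel py2.py3-none-any while a py3-only build tags it py3-none-any, so a quick look at the dist/ output after the job runs shows which behavior was in effect.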
The lack of a universal wheel may just make installation of py2 stable deliverables slightly slower, but should not cause any real issues. Actions ------ Most likely this won't require any actions on the project team's part. If you have a project that still supports both py2 and py3 and do not have the flag set in setup.cfg, it can be added to still get the universal wheels built. That is done by adding the following: ``` [bdist_wheel] universal = 1 ``` Again, the performance impact is probably very minimal during installation, so this shouldn't be a major concern. If there are any oddities noticed after this change, please bring them up and we can help investigate what is happening. Thanks! Sean From gmann at ghanshyammann.com Mon Jan 20 18:46:34 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 20 Jan 2020 12:46:34 -0600 Subject: [neutron] Bug deputy report for week of January 13th In-Reply-To: References: Message-ID: <16fc4472431.edd8d6c6118183.415190410856446702@ghanshyammann.com> ---- On Mon, 20 Jan 2020 10:07:31 -0600 Brian Haley wrote ---- > Hi, > > I was Neutron bug deputy last week. Below is a short summary about > reported bugs. > > -Brian > > > Critical bugs > ------------- > > * https://bugs.launchpad.net/neutron/+bug/1859988 > - neutron-tempest-plugin tests fail for stable/queens > - bcafarel picked up, related to below bug This needs to be done at the devstack level, not only on the job side. We have to cap Tempest and use the corresponding upper-constraint for the tempest venv. I have started the work on End of Queens support in Tempest[1] and, based on what change exactly made Tempest master fail on queens, I will proceed to pin Tempest on devstack queens as was done for ocata and pike.
[1] https://review.opendev.org/#/c/703255/ -gmann > > * https://bugs.launchpad.net/neutron/+bug/1860033 - neutron tempest jobs > broken on rocky due to requirements neutron-lib upgrade > - many proposed changes, discussion on ML > > High bugs > --------- > > * https://bugs.launchpad.net/neutron/+bug/1859977 - [OVN] Update of > floatingip creates new row in NBDB NAT table > - maciej picked up > > * https://bugs.launchpad.net/neutron/+bug/1860140 - [OVN] Provider > driver sends malformed update to Octavia > - https://review.opendev.org/#/c/703097 > > * https://bugs.launchpad.net/neutron/+bug/1860141 - [OVN] Provider > driver fails while LB VIP port already created > - https://review.opendev.org/#/c/703110 > > Medium bugs > ----------- > > * https://bugs.launchpad.net/neutron/+bug/1859832 - L3 HA connectivity > to GW port can be broken after reboot of backup node > - Issue with MLDv2 packet causing issues with connections > - Slawek picked up ownership > > * https://bugs.launchpad.net/neutron/+bug/1860273 > - https://review.opendev.org/#/c/703292/ > > Low bugs > -------- > > * https://bugs.launchpad.net/neutron/+bug/1859765 - Sgs of > device_owner_network port can be update without specifing "device_owner" > attr > - https://review.opendev.org/#/c/702632/ - discussion in review > > * https://bugs.launchpad.net/neutron/+bug/1859962 - Sanity checks > comparing versions should not use decimal comparison > - https://review.opendev.org/#/c/702847/ > > Wishlist bugs > ------------- > > Invalid bugs > ------------ > > * https://bugs.launchpad.net/neutron/+bug/1859638 - VIP between dvr > east-west networks does not work at all > - Duplicate of https://bugs.launchpad.net/neutron/+bug/1774459 > > * https://bugs.launchpad.net/neutron/+bug/1859976 - Removing smart_nic > port in openstack will try to delete representor port > - os-vif issue > > Further triage required > ----------------------- > > * https://bugs.launchpad.net/neutron/+bug/1859362 - Neutron accepts > arbitrary MTU 
values for networks > - Rodolfo has some questions on the bug > > * https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1859649 - > networking disruption on upgrade from 14.0.0 to 14.0.3 > - Rodolfo had a question about the restart order of services > > * https://bugs.launchpad.net/neutron/+bug/1859887 - > External connectivity broken because of stale FIP rule > - Asked for release information, etc > > * https://bugs.launchpad.net/networking-ovn/+bug/1859965 - > networking-ovn's octavia api,how get octavia-api DB use python code? > - networking-ovn octavia driver bug? asked for more information > > From skaplons at redhat.com Mon Jan 20 20:26:27 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 20 Jan 2020 21:26:27 +0100 Subject: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed In-Reply-To: References: <20191119102615.oq46xojyhoybulna@skaplons-mac> Message-ID: Hi, We are getting closer and closer to the Ussuri-2 milestone, which is our deadline to deprecate the neutron-fwaas project if there are no volunteers to maintain it. So if You are interested in this project, please raise Your hand here or ping me on IRC about that. > On 6 Jan 2020, at 21:05, Slawek Kaplonski wrote: > > Hi, > > Just as a reminder, we are still looking for maintainers who want to keep neutron-fwaas project alive. As it was written in my previous email, we will mark this project as deprecated. > So please reply to this email or contact me directly if You are interested in maintaining this project. > >> On 19 Nov 2019, at 11:26, Slawek Kaplonski wrote: >> >> Hi, >> >> Over the past couple of cycles we have noticed that new contributions and >> maintenance efforts for neutron-fwaas project were almost non existent. >> This impacts patches for bug fixes, new features and reviews.
The Neutron >> core team is trying to at least keep the CI of this project healthy, but we >> don’t have enough knowledge about the details of the neutron-fwaas >> code base to review more complex patches. >> >> During the PTG in Shanghai we discussed that with operators and TC members >> during the forum session [1] and later within the Neutron team during the >> PTG session [2]. >> >> During these discussions, with the help of operators and TC members, we reached >> the conclusion that we need to have someone responsible for maintaining project. >> This doesn’t mean that the maintainer needs to spend full time working on this >> project. Rather, we need someone to be the contact person for the project, who >> takes care of the project’s CI and review patches. Of course that’s only a >> minimal requirement. If the new maintainer works on new features for the >> project, it’s even better :) >> >> If we don’t have any new maintainer(s) before milestone Ussuri-2, which is >> Feb 10 - Feb 14 according to [3], we will need to mark neutron-fwaas >> as deprecated and in “V” cycle we will propose to move the project >> from the Neutron stadium, hosted in the “openstack/“ namespace, to the >> unofficial projects hosted in the “x/“ namespace. >> >> So if You are using this project now, or if You have customers who are >> using it, please consider the possibility of maintaining it. Otherwise, >> please be aware that it is highly possible that the project will be >> deprecated and moved out from the official OpenStack projects. 
>> >> [1] >> https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward >> [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - >> Lines 379-421 >> [3] https://releases.openstack.org/ussuri/schedule.html >> >> -- >> Slawek Kaplonski >> Senior software engineer >> Red Hat > > — > Slawek Kaplonski > Senior software engineer > Red Hat > — Slawek Kaplonski Senior software engineer Red Hat From emiller at genesishosting.com Mon Jan 20 20:54:02 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Mon, 20 Jan 2020 14:54:02 -0600 Subject: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed In-Reply-To: References: <20191119102615.oq46xojyhoybulna@skaplons-mac> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04771777@gmsxchsvr01.thecreation.com> > -----Original Message----- > From: Slawek Kaplonski [mailto:skaplons at redhat.com] > Sent: Monday, January 20, 2020 2:26 PM > To: OpenStack Discuss ML > Subject: Re: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed > > Hi, > > We are getting closer and closer to Ussuri-2 milestone which is our deadline > to deprecate neutron-fwaas project if there will be no any volunteers to > maintain this project. > So if You are interested in this project, please raise Your hand here or ping > me on IRC about that. > > > On 6 Jan 2020, at 21:05, Slawek Kaplonski wrote: > > > > Hi, > > > > Just as a reminder, we are still looking for maintainers who want to keep > neutron-fwaas project alive. As it was written in my previous email, we will > mark this project as deprecated. > > So please reply to this email or contact me directly if You are interested in > maintaining this project. > > > >> On 19 Nov 2019, at 11:26, Slawek Kaplonski > wrote: > >> > >> Hi, > >> > >> Over the past couple of cycles we have noticed that new contributions > and > >> maintenance efforts for neutron-fwaas project were almost non existent. 
> >> This impacts patches for bug fixes, new features and reviews. The > Neutron > >> core team is trying to at least keep the CI of this project healthy, but we > >> don’t have enough knowledge about the details of the neutron-fwaas > >> code base to review more complex patches. > >> > >> During the PTG in Shanghai we discussed that with operators and TC > members > >> during the forum session [1] and later within the Neutron team during the > >> PTG session [2]. > >> > >> During these discussions, with the help of operators and TC members, we > reached > >> the conclusion that we need to have someone responsible for > maintaining project. > >> This doesn’t mean that the maintainer needs to spend full time working > on this > >> project. Rather, we need someone to be the contact person for the > project, who > >> takes care of the project’s CI and review patches. Of course that’s only a > >> minimal requirement. If the new maintainer works on new features for > the > >> project, it’s even better :) > >> > >> If we don’t have any new maintainer(s) before milestone Ussuri-2, which > is > >> Feb 10 - Feb 14 according to [3], we will need to mark neutron-fwaas > >> as deprecated and in “V” cycle we will propose to move the project > >> from the Neutron stadium, hosted in the “openstack/“ namespace, to the > >> unofficial projects hosted in the “x/“ namespace. > >> > >> So if You are using this project now, or if You have customers who are > >> using it, please consider the possibility of maintaining it. Otherwise, > >> please be aware that it is highly possible that the project will be > >> deprecated and moved out from the official OpenStack projects. 
> >> > >> [1] > >> https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the- > path-forward > >> [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning- > restored - > >> Lines 379-421 > >> [3] https://releases.openstack.org/ussuri/schedule.html > >> > >> -- > >> Slawek Kaplonski > >> Senior software engineer > >> Red Hat > > > > — > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > I'm not a developer, rather an operator. I just thought I'd ask if the OpenStack community had thought about creating a fund for developers that may want to contribute, but can only do it for a fee. Essentially a donation bucket. I don't know if the OpenStack Foundation does this or not already. There may be adequate need for fwaas, for example, but not enough volunteer resources to do it. However there may be money in multiple operators' budgets that can be used to donate to the support of a project. Eric From fungi at yuggoth.org Mon Jan 20 21:25:05 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 20 Jan 2020 21:25:05 +0000 Subject: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04771777@gmsxchsvr01.thecreation.com> References: <20191119102615.oq46xojyhoybulna@skaplons-mac> <046E9C0290DD9149B106B72FC9156BEA04771777@gmsxchsvr01.thecreation.com> Message-ID: <20200120212505.tdqv5br5vwfib63z@yuggoth.org> On 2020-01-20 14:54:02 -0600 (-0600), Eric K. Miller wrote: [...] > I'm not a developer, rather an operator. I just thought I'd ask > if the OpenStack community had thought about creating a fund for > developers that may want to contribute, but can only do it for a > fee. Essentially a donation bucket. I don't know if the > OpenStack Foundation does this or not already. There may be > adequate need for fwaas, for example, but not enough volunteer > resources to do it. 
However there may be money in multiple > operators' budgets that can be used to donate to the support of a > project. Our community has mostly relied on commercial distribution vendors and service providers to fill that role in the past. If enough of their customers say it's a feature they're relying on, then employing developers who can help keep it maintained is their raison d'être. This is how pretty much all free/libre open source community software projects work. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From emiller at genesishosting.com Mon Jan 20 21:43:48 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Mon, 20 Jan 2020 15:43:48 -0600 Subject: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed In-Reply-To: <20200120212505.tdqv5br5vwfib63z@yuggoth.org> References: <20191119102615.oq46xojyhoybulna@skaplons-mac> <046E9C0290DD9149B106B72FC9156BEA04771777@gmsxchsvr01.thecreation.com> <20200120212505.tdqv5br5vwfib63z@yuggoth.org> Message-ID: <046E9C0290DD9149B106B72FC9156BEA0477177D@gmsxchsvr01.thecreation.com> > Our community has mostly relied on commercial distribution vendors and > service providers to fill that role in the past. If enough of their customers say > it's a feature they're relying on, then employing developers who can help > keep it maintained is their raison d'être. This is how pretty much all free/libre > open source community software projects work. > -- > Jeremy Stanley Understood. I thought there may be a way to crowd-fund something in case smaller operators had small budgets, needed some feature supported, but couldn't afford an entire employee with the small budget. Maybe there are developer consultants interested in a gig to maintain something for a while. Not sure where the best place to go for matching operators with consultants for this type of thing. 
fwaas seems like such a necessity. We would love to offer it, but it is unusable with DVR. We just don't have the budget to hire someone specifically for development/support of this. Eric From raubvogel at gmail.com Mon Jan 20 21:49:51 2020 From: raubvogel at gmail.com (Mauricio Tavares) Date: Mon, 20 Jan 2020 16:49:51 -0500 Subject: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA0477177D@gmsxchsvr01.thecreation.com> References: <20191119102615.oq46xojyhoybulna@skaplons-mac> <046E9C0290DD9149B106B72FC9156BEA04771777@gmsxchsvr01.thecreation.com> <20200120212505.tdqv5br5vwfib63z@yuggoth.org> <046E9C0290DD9149B106B72FC9156BEA0477177D@gmsxchsvr01.thecreation.com> Message-ID: On Mon, Jan 20, 2020 at 4:46 PM Eric K. Miller wrote: > > > Our community has mostly relied on commercial distribution vendors and > > service providers to fill that role in the past. If enough of their customers say > > it's a feature they're relying on, then employing developers who can help > > keep it maintained is their raison d'être. This is how pretty much all free/libre > > open source community software projects work. > > -- > > Jeremy Stanley > > Understood. I thought there may be a way to crowd-fund something in case smaller operators had small budgets, needed some feature supported, but couldn't afford an entire employee with the small budget. > > Maybe there are developer consultants interested in a gig to maintain something for a while. Not sure where the best place to go for matching operators with consultants for this type of thing. > > fwaas seems like such a necessity. We would love to offer it, but it is unusable with DVR. We just don't have the budget to hire someone specifically for development/support of this. > Smells like a case for gofundme time. Or, is there still time for google summer of code submission? 
> Eric > From emiller at genesishosting.com Mon Jan 20 21:54:27 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Mon, 20 Jan 2020 15:54:27 -0600 Subject: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed In-Reply-To: References: <20191119102615.oq46xojyhoybulna@skaplons-mac> <046E9C0290DD9149B106B72FC9156BEA04771777@gmsxchsvr01.thecreation.com> <20200120212505.tdqv5br5vwfib63z@yuggoth.org> <046E9C0290DD9149B106B72FC9156BEA0477177D@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04771782@gmsxchsvr01.thecreation.com> > Smells like a case for gofundme time. Or, is there still time > for google summer of code submission? I'll check out gofundme. I honestly haven't looked at it much. I wasn't sure where OpenStack developer consultants would look for this type of gig. If it is gofundme, that works! Eric From petebirley+openstack-dev at gmail.com Tue Jan 21 01:09:08 2020 From: petebirley+openstack-dev at gmail.com (Pete Birley) Date: Mon, 20 Jan 2020 19:09:08 -0600 Subject: [openstack-helm] Core Reviewer Nominations Message-ID: OpenStack-Helm team, Based on their record of quality code review and substantial/meaningful code contributions to the openstack-helm project, at last weeks meeting we proposed the following individuals as core reviewers for openstack-helm: - Gage Hugo - Steven Fitzpatrick All OpenStack-Helm Core Reviewers are invited to reply with a +1/-1 by EOD next Monday (27/1/2020). A lone +1/-1 will apply to both candidates, otherwise please spell out votes individually for the candidates. Cheers, Pete -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilkers.steve at gmail.com Tue Jan 21 02:12:07 2020 From: wilkers.steve at gmail.com (Steve Wilkerson) Date: Mon, 20 Jan 2020 20:12:07 -0600 Subject: [openstack-helm] Core Reviewer Nominations In-Reply-To: References: Message-ID: A resounding +1 from me for both Steven and Gage. 
Both have done really great work and have provided meaningful reviews along the way. On Mon, Jan 20, 2020 at 7:14 PM Pete Birley < petebirley+openstack-dev at gmail.com> wrote: > OpenStack-Helm team, > > > > Based on their record of quality code review and substantial/meaningful > code contributions to the openstack-helm project, at last weeks meeting we > proposed the following individuals as core reviewers for openstack-helm: > > > > - Gage Hugo > - Steven Fitzpatrick > > > > All OpenStack-Helm Core Reviewers are invited to reply with a +1/-1 by > EOD next Monday (27/1/2020). A lone +1/-1 will apply to both candidates, > otherwise please spell out votes individually for the candidates. > > > > Cheers, > > > Pete > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Albert.Braden at synopsys.com Tue Jan 21 02:19:27 2020 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 21 Jan 2020 02:19:27 +0000 Subject: Galera config values In-Reply-To: <4E49E11B-83FA-4016-9FA1-30CDC377825C@blizzard.com> References: , <4E49E11B-83FA-4016-9FA1-30CDC377825C@blizzard.com> Message-ID: That would be fantastic; thanks! -----Original Message----- From: Erik Olof Gunnar Andersson Sent: Friday, January 17, 2020 7:37 PM To: Mohammed Naser Cc: Albert Braden ; openstack-discuss at lists.openstack.org Subject: Re: Galera config values I can share our haproxy settings on Monday, but you need to make sure that haproxy at least matches the Oslo config, which I believe is 3600s; in theory, though, something like keepalived is better for Galera. BTW, pretty sure both client and server need 3600s. Basically, OpenStack recycles the connection every hour by default, so you need to make sure that haproxy does not close it before then if it's idle.
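Erik's rule of thumb can be written down as a one-line check. This is purely illustrative (the function name and arguments are mine, not an oslo.db or haproxy API): a pooled connection that sits idle until oslo.db recycles it (`connection_recycle_time`, 3600s by default) survives only if every hop's idle timeout is at least that long.

```python
def connection_survives_idle(recycle_s, haproxy_client_s, haproxy_server_s, mysql_wait_s):
    """True if a pooled connection left idle for up to recycle_s seconds
    (oslo.db's connection_recycle_time, 3600 by default) outlives the idle
    timeout of every intermediary between the service and MySQL."""
    return min(haproxy_client_s, haproxy_server_s, mysql_wait_s) >= recycle_s

# Stock haproxy.cfg from this thread: timeout client/server = 1m,
# so the proxy drops the idle connection long before the hourly recycle.
print(connection_survives_idle(3600, 60, 60, 3600))      # False
# Aligned values: haproxy timeouts >= wait_timeout >= recycle time.
print(connection_survives_idle(3600, 3600, 3600, 3600))  # True
```

With the stock values the proxy closes the idle connection well before oslo.db's hourly recycle, which is consistent with the "Lost connection to MySQL server during query" errors discussed in this thread.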
Sent from my iPhone > On Jan 17, 2020, at 7:24 PM, Mohammed Naser wrote: > > On Fri, Jan 17, 2020 at 5:20 PM Albert Braden > wrote: >> >> I’m experimenting with Galera in my Rocky openstack-ansible dev cluster, and I’m finding that the default haproxy config values don’t seem to work. Finding the correct values is a lot of work. For example, I spent this morning experimenting with different values for “timeout client” in /etc/haproxy/haproxy.cfg. The default is 1m, and with the default set I see this error in /var/log/nova/nova-scheduler.log on the controllers: >> >> >> >> 2020-01-17 13:54:26.059 443358 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1'] (Background on this error at: https://urldefense.com/v3/__http://sqlalche.me/e/e3q8__;!!Ci6f514n9QsL8ck!39gvi32Ldv9W8zhZ_P1JLvkOFM-PelyP_RrU_rT5_EuELR24fLO5P3ShvZ56jfcQ7g$ ) >> >> >> >> There are several timeout values in /etc/haproxy/haproxy.cfg. These are the values we started with: >> >> >> >> stats timeout 30s >> >> timeout http-request 10s >> >> timeout queue 1m >> >> timeout connect 10s >> >> timeout client 1m >> >> timeout server 1m >> >> timeout check 10s >> >> >> >> At first I changed them all to 30m. This stopped the “Lost connection” error in nova-scheduler.log. Then, one at a time, I changed them back to the default. When I got to “timeout client” I found that setting it back to 1m caused the errors to start again. I changed it back and forth and found that 4 minutes causes errors, and 6m stops them, so I left it at 6m. 
>> >> >> >> These are my active variables: >> >> >> >> root at us01odc-dev2-ctrl1:/etc/mysql# mysql -e 'show variables;'|grep timeout >> >> connect_timeout 20 >> >> deadlock_timeout_long 50000000 >> >> deadlock_timeout_short 10000 >> >> delayed_insert_timeout 300 >> >> idle_readonly_transaction_timeout 0 >> >> idle_transaction_timeout 0 >> >> idle_write_transaction_timeout 0 >> >> innodb_flush_log_at_timeout 1 >> >> innodb_lock_wait_timeout 50 >> >> innodb_rollback_on_timeout OFF >> >> interactive_timeout 28800 >> >> lock_wait_timeout 86400 >> >> net_read_timeout 30 >> >> net_write_timeout 60 >> >> rpl_semi_sync_master_timeout 10000 >> >> rpl_semi_sync_slave_kill_conn_timeout 5 >> >> slave_net_timeout 60 >> >> thread_pool_idle_timeout 60 >> >> wait_timeout 3600 >> >> >> >> So it looks like the value of “timeout client” in haproxy.cfg needs to match or exceed the value of “wait_timeout” in mysql. Also in nova.conf I see “#connection_recycle_time = 3600” – I need to experiment to see how that value interacts with the timeouts in the other config files. >> >> >> >> Is this the best way to find the correct config values? It seems like there should be a document that talks about these timeouts and how to set them (or maybe more generally how the different timeout settings in the various config files interact). Does that document exist? If not, maybe I could write one, since I have to figure out the correct values anyway. > > Is your cluster pretty idle? I've never seen that happen in any > environments before... > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. 
https://urldefense.com/v3/__https://vexxhost.com__;!!Ci6f514n9QsL8ck!39gvi32Ldv9W8zhZ_P1JLvkOFM-PelyP_RrU_rT5_EuELR24fLO5P3ShvZ4PDThJbg$ > From smooney at redhat.com Tue Jan 21 02:26:07 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 21 Jan 2020 02:26:07 +0000 Subject: [all][neutron][neutron-fwaas] FINAL CALL Maintainers needed In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA0477177D@gmsxchsvr01.thecreation.com> References: <20191119102615.oq46xojyhoybulna@skaplons-mac> <046E9C0290DD9149B106B72FC9156BEA04771777@gmsxchsvr01.thecreation.com> <20200120212505.tdqv5br5vwfib63z@yuggoth.org> <046E9C0290DD9149B106B72FC9156BEA0477177D@gmsxchsvr01.thecreation.com> Message-ID: On Mon, 2020-01-20 at 15:43 -0600, Eric K. Miller wrote: > > Our community has mostly relied on commercial distribution vendors and > > service providers to fill that role in the past. If enough of their customers say > > it's a feature they're relying on, then employing developers who can help > > keep it maintained is their raison d'être. This is how pretty much all free/libre > > open source community software projects work. > > -- > > Jeremy Stanley > > Understood. I thought there may be a way to crowd-fund something in case smaller operators had small budgets, needed > some feature supported, but couldn't afford an entire employee with the small budget. > > Maybe there are developer consultants interested in a gig to maintain something for a while. Not sure where the best > place to go for matching operators with consultants for this type of thing. > > fwaas seems like such a necessity. We would love to offer it, but it is unusable with DVR. We just don't have the > budget to hire someone specifically for development/support of this. For a lot of use cases security groups are sufficient, and people do not need to enforce firewalls between networks at neutron routers, which is effectively how fwaas worked. Enforcement via security groups on the ports attached to the instance was sufficient.
Similarly, operators that have invested in an SDN solution can provide an out-of-band policy enforcement point. As a result, in a normal OpenStack deployment fwaas became redundant. There are still use cases for fwaas, but fewer than you would think. Much of what you might typically do in an east-west direction, or between layers in your application, can be done using remote security groups instead of CIDRs with security groups. The gap that security groups did not fill easily was ironic and SR-IOV; however, I believe some of the hierarchical switch binding drivers did support security implemented at the top-of-rack switch, which could close that gap. As a result FWaaS has become less deployed and less maintained over time. The other issue, as you noted, is compatibility: the fact that VPNaaS, FWaaS and ovs dvr with ml2/ovs did not just work means it's a hard sell. That is compounded by the fact that none of the main SDN solutions supported it either. Any implementation of FWaaS would have provided driver entry points for neutron.agent.l2.firewall_drivers https://github.com/openstack/neutron-fwaas/blob/master/setup.cfg#L45-L47 and neutron.agent.l3.firewall_drivers https://github.com/openstack/neutron-fwaas/blob/master/setup.cfg#L51-L53 but as you can see odl, ovn, midonet, onos, contrail, dragonflow and calico do not implement support https://github.com/openstack/networking-odl/blob/master/setup.cfg#L47-L66 https://github.com/openstack/networking-ovn/blob/master/setup.cfg#L51-L62 https://github.com/openstack/networking-midonet/blob/02e25cc65601add1d96b7150ed70403c3de4243b/setup.cfg#L58-L81 https://github.com/openstack/networking-onos/blob/master/setup.cfg#L40-L49 https://opendev.org/x/networking-opencontrail/src/branch/master/setup.cfg#L29-L32 https://github.com/openstack/dragonflow/blob/master/setup.cfg#L45-L121 https://github.com/openstack/networking-calico/blob/master/setup.cfg#L24-L30 networking-cisco provides an alternative fw api
https://opendev.org/x/networking-cisco/src/branch/master/setup.cfg#L97-L100 and Arista supports a security group API at the top-of-rack switch https://opendev.org/x/networking-arista/src/branch/master/setup.cfg#L41 Unless customers are asking vendors to provide FWaaS (in my experience it was never a telco priority, at least), vendors won't have a business case to justify the investment. That does not help those that do use FWaaS today, but it's a sad fact that individuals and companies need to choose where they spend their resources carefully, and FWaaS just never caught on enough to remain relevant. > > Eric > From feilong at catalyst.net.nz Tue Jan 21 02:34:26 2020 From: feilong at catalyst.net.nz (Feilong Wang) Date: Tue, 21 Jan 2020 15:34:26 +1300 Subject: [magnum][kolla] etcd wal sync duration issue In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04771749@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com> <3f3fe0d1-7b61-d2f9-da65-d126ea5ed336@catalyst.net.nz> <046E9C0290DD9149B106B72FC9156BEA04771716@gmsxchsvr01.thecreation.com> <279cedf1-8bf4-fcf1-cfc2-990c97685531@catalyst.net.nz> <046E9C0290DD9149B106B72FC9156BEA04771749@gmsxchsvr01.thecreation.com> Message-ID: Hi Eric, Thanks for sharing the article. As for the etcd volumes, you can disable them by not setting the etcd_volume_size label. Just FYI. On 17/01/20 6:00 AM, Eric K. Miller wrote: > > Hi Feilong, > > > > Before I was able to use the benchmark tool you mentioned, we saw some > other slowdowns with Ceph (all flash). It appears that something must > have crashed somewhere since we had to restart a couple things, after > which etcd has been performing fine and no more health issues being > reported by Magnum. > > > > So, it looks like it wasn't etcd related afterall.
> >   > > However, while researching, I found that etcd's fsync on every write > (so it guarantees a write cache flush for each write) apparently > creates some havoc with some SSDs, where the SSD performs a full cache > flush of multiple caches.  This article explains it a LOT better:  > https://yourcmc.ru/wiki/Ceph_performance (scroll to the "Drive cache > is slowing you down" section) > >   > > It seems that the optimal configuration for etcd would be to use local > drives in each node and be sure that the write cache is disabled in > the SSDs - as opposed to using Ceph volumes, which already adds > network latency, but can create even more latency for synchronizations > due to Ceph's replication. > >   > > Eric > >   > >   > > *From:*feilong [mailto:feilong at catalyst.net.nz] > *Sent:* Wednesday, January 15, 2020 2:36 PM > *To:* Eric K. Miller; openstack-discuss at lists.openstack.org > *Cc:* Spyros Trigazis > *Subject:* Re: [magnum][kolla] etcd wal sync duration issue > >   > > Hi Eric, > > If you're using SSD, then I think the IO performance should  be OK. > You can use this > https://github.com/etcd-io/etcd/tree/master/tools/benchmark to verify > and confirm that 's the root cause. Meanwhile, you can review the > config of etcd cluster deployed by Magnum. I'm not an export of Etcd, > so TBH I can't see anything wrong with the config. Most of them are > just default configurations. > > As for the etcd image, it's built from > https://github.com/projectatomic/atomic-system-containers/tree/master/etcd > or you can refer CERN's repo > https://gitlab.cern.ch/cloud/atomic-system-containers/blob/cern-qa/etcd/ > > *Spyros*, any comments? > >   > > On 14/01/20 10:52 AM, Eric K. Miller wrote: > > Hi Feilong, > >   > > Thanks for responding!  I am, indeed, using the default v3.2.7 version for etcd, which is the only available image. > >   > > I did not try to reproduce with any other driver (we have never used DevStack, honestly, only Kolla-Ansible deployments).  
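The per-write flush behaviour described above can be made concrete with a rough, stdlib-only sketch. This is my own illustration, not the etcd benchmark tool mentioned earlier in the thread: etcd fsyncs its WAL on every commit, so each write pays the full flush latency of the backing device, and commonly cited etcd hardware guidance is to keep the 99th percentile of WAL fsync duration under roughly 10 ms.

```python
import os
import tempfile
import time

def fsync_latencies(n=50, size=512):
    """Time n append+fsync cycles on a temp file; returns seconds per cycle."""
    samples = []
    payload = b"x" * size
    with tempfile.NamedTemporaryFile() as f:
        for _ in range(n):
            start = time.perf_counter()
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # the per-commit durability flush etcd's WAL performs
            samples.append(time.perf_counter() - start)
    return samples

worst = max(fsync_latencies())
print("worst fsync latency: %.3f ms" % (worst * 1000.0))
```

Running this on the intended etcd disk gives a quick sanity check of whether the device (local SSD vs. a replicated Ceph volume) can sustain the per-write flush cost at all.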
I did see a number of people indicating similar issues with etcd versions in the 3.3.x range, so I didn't think of it being an etcd issue, but then again most issues seem to be a result of people using HDDs and not SSDs, which makes sense. > >   > > Interesting that you saw the same issue, though.  We haven't tried Fedora CoreOS, but I think we would need Train for this. > >   > > Everything I read about etcd indicates that it is extremely latency sensitive, due to the fact that it replicates all changes to all nodes and sends an fsync to Linux each time, so data is always guaranteed to be stored.  I can see this becoming an issue quickly without super-low-latency network and storage.  We are using Ceph-based SSD volumes for the Kubernetes Master node disks, which is extremely fast (likely 10x or better than anything people recommend for etcd), but network latency is always going to be higher with VMs on OpenStack with DVR than bare metal with VLANs due to all of the abstractions. > >   > > Do you know who maintains the etcd images for Magnum here?  Is there an easy way to create a newer image? > > https://hub.docker.com/r/openstackmagnum/etcd/tags/ > >   > > Eric > >   > >   > >   > > From: Feilong Wang [mailto:feilong at catalyst.net.nz] > > Sent: Monday, January 13, 2020 3:39 PM > > To: openstack-discuss at lists.openstack.org > > Subject: Re: [magnum][kolla] etcd wal sync duration issue > >   > > Hi Eric, > > That issue looks familiar for me. There are some questions I'd like to check before answering if you should upgrade to train. > > 1. Are using the default v3.2.7 version for etcd? > > 2. Did you try to reproduce this with devstack, using Fedora CoreOS driver? 
The etcd version could be 3.2.26 > > I asked the above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7 and I can't reproduce it with Fedora CoreOS + etcd 3.2.26 > > > > -- > Cheers & Best regards, > Feilong Wang (王飞龙) > ------------------------------------------------------ > Senior Cloud Software Engineer > Tel: +64-48032246 > Email: flwang at catalyst.net.nz > Catalyst IT Limited > Level 6, Catalyst House, 150 Willis Street, Wellington > ------------------------------------------------------ -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangbailin at inspur.com Tue Jan 21 02:35:23 2020 From: zhangbailin at inspur.com (Brin Zhang(张百林)) Date: Tue, 21 Jan 2020 02:35:23 +0000 Subject: Re: [lists.openstack.org][nova] I would like to add another option for cross_az_attach In-Reply-To: References: Message-ID: Hi, Kim KS: "cross_az_attach"'s default value is True, which means attaching a volume to an instance in a different availability zone is allowed. If False, volumes attached to an instance must be in the same availability zone in Cinder as the instance availability zone in Nova. Another thing: you should take care when booting a BFV instance from an "image", as this interacts with "allow_availability_zone_fallback" in Cinder; if "allow_availability_zone_fallback=False" and the requested AZ does not exist in Cinder, the request will fail.
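The behaviour Brin describes, together with the enable_az_attach_list idea from Kim's original message (quoted below), can be sketched as a small predicate. This is hypothetical illustration only — `check_az_attach` and `allowed_azs` are names made up here, not Nova's actual code (the real check lives in nova/volume/cinder.py, linked below):

```python
# Hypothetical sketch of the attach check that [cinder]/cross_az_attach
# gates, extended with the proposed enable_az_attach_list semantics.
# None of these names exist in Nova; they only illustrate the idea.

def check_az_attach(instance_az, volume_az, cross_az_attach=True, allowed_azs=()):
    """Return True if attaching the volume to the instance would be allowed."""
    if cross_az_attach:
        # Default behaviour: availability zones are not compared at all.
        return True
    if volume_az in allowed_azs:
        # Proposed extension: AZs in an explicit allow-list
        # (e.g. enable_az_attach_list = AZ1,AZ2) may attach cross-zone.
        return True
    # cross_az_attach = False: the Cinder AZ must match the Nova AZ.
    return instance_az == volume_az

print(check_az_attach("AZ1", "AZ2", cross_az_attach=False))                        # False
print(check_az_attach("AZ1", "AZ2", cross_az_attach=False, allowed_azs=("AZ2",)))  # True
```

The allow-list simply widens the equality check; everything else about the existing option's semantics is left untouched in this sketch.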
About specifying an AZ to unshelve a shelved_offloaded server, and how it relates to cross_az_attach, you can read https://github.com/openstack/nova/blob/master/releasenotes/notes/bp-specifying-az-to-unshelve-server-aa355fef1eab2c02.yaml The Availability Zones docs contain some description of cinder.cross_az_attach: https://docs.openstack.org/nova/latest/admin/availability-zones.html#implications-for-moving-servers cross_az_attach configuration: https://docs.openstack.org/nova/train/configuration/config.html#cinder.cross_az_attach And the cross_az_attach handling for the server is in https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L523-L545 I am not sure why you need the "enable_az_attach_list = AZ1,AZ2" configuration. brinzhang > cross_az_attach > > Hello all, > > In nova with setting [cinder]/ cross_az_attach option to false, nova creates > instance and volume in same AZ. > > but some of usecase (in my case), we need to attach new volume in different > AZ to the instance. > > so I need two options. > > one is for nova block device mapping and attaching volume and another is for > attaching volume in specified AZ. > > [cinder] > cross_az_attach = False > enable_az_attach_list = AZ1,AZ2 > > how do you all think of it? > > Best, > Kiseok > From ignaziocassano at gmail.com Tue Jan 21 05:59:06 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 21 Jan 2020 06:59:06 +0100 Subject: [queens][nova] iscsi issue In-Reply-To: <20200120154107.czjws3n3p5rl64nu@lyarwood.usersys.redhat.com> References: <20200120154107.czjws3n3p5rl64nu@lyarwood.usersys.redhat.com> Message-ID: Hello, I increased the cinder and nova RPC response timeouts and now it works better. I am going to test it again and, if the error comes back, I'll send the log file as you suggested.
Thanks Ignazio On Mon 20 Jan 2020 at 16:41, Lee Yarwood wrote: > On 17-01-20 19:30:13, Ignazio Cassano wrote: > > Hello all we are testing openstack queens cinder driver for Unity iscsi > > (driver cinder.volume.drivers.dell_emc.unity.Driver). > > > > The unity storage is a Unity600 Version 4.5.10.5.001 > > > > We are facing an issue when we try to detach volume from a virtual machine > > with two or more volumes attached (this happens often but not always): > > Could you write this up as an os-brick bug and attach the nova-compute > log in DEBUG showing the initial volume attachments prior to this detach > error? > > https://bugs.launchpad.net/os-brick/+filebug > > > The following is reported nova-compute.log: > > > > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server > > self.connector.disconnect_volume(connection_info['data'], None) > > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > > "/usr/lib/python2.7/site-packages/os_brick/utils.py", line 137, in > > trace_logging_wrapper > > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return > > f(*args, **kwargs) > > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server File > > "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, > > in inner > > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server return > > f(*args, **kwargs) > > 2020-01-17 16:05:11.132 6643 ERROR oslo_messaging.rpc.server