From ignaziocassano at gmail.com Sun Jul 1 10:25:24 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Sun, 1 Jul 2018 12:25:24 +0200 Subject: [Openstack-operators] diskimage-builder error Message-ID: Hi All, I just installed diskimage-builder on my CentOS 7 system. To create a CentOS 7 image I am using the same command I used 3 or 4 months ago, but with the latest diskimage-builder installed with pip I got the following error:

+ /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:20 : '[' -r /proc/meminfo ']'
++ /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:21 : awk '/^MemTotal/ { print $2 }' /proc/meminfo
+ /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:21 : total_kB=8157200
+ /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:24 : RAM_NEEDED=4
+ /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:25 : '[' 8157200 -lt 4194304 ']'
+ /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:25 : return 0
+ /usr/share/diskimage-builder/lib/common-functions:cleanup_image_dir:211 : timeout 120 sh -c 'while ! sudo umount -f /tmp/dib_image.azW5Wi4F; do sleep 1; done'
+ /usr/share/diskimage-builder/lib/common-functions:cleanup_image_dir:216 : rm -rf --one-file-system /tmp/dib_image.azW5Wi4F
+ /usr/share/diskimage-builder/lib/img-functions:trap_cleanup:46 : exit 1

Does anyone know if this is a bug? Please help me. Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Mon Jul 2 01:30:57 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Sun, 1 Jul 2018 20:30:57 -0500 Subject: [Openstack-operators] Fwd: Reminder: User Committee Meeting - Monday July 2nd @1400UTC In-Reply-To: <5b366ec68bafdc64b1000004@polymail.io> References: <5b366ec68bafdc64b1000004@polymail.io> Message-ID: Hi everyone, Please be sure to join us - if not getting ready for firecrackers - on Monday July 2nd @1400UTC in #openstack-uc for the weekly User Committee meeting.
Also you can freely add to the meeting agenda here - Governance/Foundation/UserCommittee - OpenStack WIKI.OPENSTACK.ORG -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at bakeyournoodle.com Mon Jul 2 01:34:20 2018 From: tony at bakeyournoodle.com (Tony Breeds) Date: Mon, 2 Jul 2018 11:34:20 +1000 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: References: Message-ID: <20180702013419.GE21570@thor.bakeyournoodle.com> On Sun, Jul 01, 2018 at 12:25:24PM +0200, Ignazio Cassano wrote: > Hi All, > I just installed disk-image builder on my centos 7. > For creating centos7 image I am using the same command used 3 o 4 months > ago, but wiith the last diskimage-builder installed with pip I got the > following error: > > + > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:20 > : '[' -r /proc/meminfo ']' > ++ > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:21 > : awk '/^MemTotal/ { print $2 }' /proc/meminfo > + > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:21 > : total_kB=8157200 > + > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:24 > : RAM_NEEDED=4 > + > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:25 > : '[' 8157200 -lt 4194304 ']' > + > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:25 > : return 0 > + > /usr/share/diskimage-builder/lib/common-functions:cleanup_image_dir:211 > : timeout 120 sh -c 'while ! sudo umount -f /tmp/dib_image.azW5Wi4F; do > sleep 1; done' > + > /usr/share/diskimage-builder/lib/common-functions:cleanup_image_dir:216 > : rm -rf --one-file-system /tmp/dib_image.azW5Wi4F > + > /usr/share/diskimage-builder/lib/img-functions:trap_cleanup:46 > : exit 1 > > > > Anyone knows if is it a bug ? Not one I know of. 
It looks like the image-based install was close to completion: the tmpfs was unmounted, but for some reason the removal of the (empty) directory failed. The only reason I can think of is if an FS was still mounted. Are you able to send the full logs? It's possible something failed earlier and we're just seeing a secondary failure. If you can supply the logs, can you supply your command line and a link to the base image you're using? Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From ignaziocassano at gmail.com Mon Jul 2 05:55:13 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 2 Jul 2018 07:55:13 +0200 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: <20180702013419.GE21570@thor.bakeyournoodle.com> References: <20180702013419.GE21570@thor.bakeyournoodle.com> Message-ID: Hi Tony, applying the patch reported here (https://review.openstack.org/#/c/561740/) the issue is solved. That patch was for another issue (distutils), but it also solves the cleanup error. In any case, I could unpatch the diskimage-builder code and send you the log. What do you think? Regards Ignazio 2018-07-02 3:34 GMT+02:00 Tony Breeds : > On Sun, Jul 01, 2018 at 12:25:24PM +0200, Ignazio Cassano wrote: > > Hi All, > > I just installed disk-image builder on my centos 7.
> > For creating centos7 image I am using the same command used 3 o 4 months > > ago, but wiith the last diskimage-builder installed with pip I got the > > following error: > > > > + > > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:20 > > : '[' -r /proc/meminfo ']' > > ++ > > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:21 > > : awk '/^MemTotal/ { print $2 }' /proc/meminfo > > + > > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:21 > > : total_kB=8157200 > > + > > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:24 > > : RAM_NEEDED=4 > > + > > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:25 > > : '[' 8157200 -lt 4194304 ']' > > + > > /usr/share/diskimage-builder/lib/common-functions:tmpfs_check:25 > > : return 0 > > + > > /usr/share/diskimage-builder/lib/common-functions:cleanup_image_dir:211 > > : timeout 120 sh -c 'while ! sudo umount -f /tmp/dib_image.azW5Wi4F; do > > sleep 1; done' > > + > > /usr/share/diskimage-builder/lib/common-functions:cleanup_image_dir:216 > > : rm -rf --one-file-system /tmp/dib_image.azW5Wi4F > > + > > /usr/share/diskimage-builder/lib/img-functions:trap_cleanup:46 > > : exit 1 > > > > > > > > Anyone knows if is it a bug ? > > Not one I know of. It looks like the image based install was close to > completion, the tmpfs was unmounted but for some reason the removal of > the (empty) directory failed. The only reason I can think of is if a > FS was still mounted. > > Are you able to send the full logs? It's possible something failed > earlier and we're just seeing a secondary failure. > > If you can supply the logs, can you supply your command line and link to > the base image you're using? > > Yours Tony. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tony at bakeyournoodle.com Mon Jul 2 06:11:45 2018 From: tony at bakeyournoodle.com (Tony Breeds) Date: Mon, 2 Jul 2018 16:11:45 +1000 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: References: <20180702013419.GE21570@thor.bakeyournoodle.com> Message-ID: <20180702061144.GF21570@thor.bakeyournoodle.com> On Mon, Jul 02, 2018 at 07:55:13AM +0200, Ignazio Cassano wrote: > Hi Tony, > applying the patch reported here (https://review.openstack.org/#/c/561740/) > the issue is solved.. > The above path was related to another issue (distutils) but is solves also > the cleanup error. > Anycase I could unpatch the diskimage code and send you the log. > What do you think ? That would be helpful. Or if it's easier you can let us know how you're running DIB Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From ignaziocassano at gmail.com Mon Jul 2 06:13:39 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 2 Jul 2018 08:13:39 +0200 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: <20180702061144.GF21570@thor.bakeyournoodle.com> References: <20180702013419.GE21570@thor.bakeyournoodle.com> <20180702061144.GF21570@thor.bakeyournoodle.com> Message-ID: Tony, do you mean the script I am using to create the image ? 2018-07-02 8:11 GMT+02:00 Tony Breeds : > On Mon, Jul 02, 2018 at 07:55:13AM +0200, Ignazio Cassano wrote: > > Hi Tony, > > applying the patch reported here (https://review.openstack.org/ > #/c/561740/) > > the issue is solved.. > > The above path was related to another issue (distutils) but is solves > also > > the cleanup error. > > Anycase I could unpatch the diskimage code and send you the log. > > What do you think ? > > That would be helpful. Or if it's easier you can let us know how you're > running DIB > > > Yours Tony. 
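For readers following the trace quoted earlier in the thread, the tmpfs_check lines reduce to a simple RAM-threshold test. Below is a minimal sketch of that logic (illustrative, parameterized for clarity — not the verbatim diskimage-builder code from lib/common-functions):

```shell
# Sketch of the tmpfs_check logic from the trace: diskimage-builder
# builds the image in a tmpfs only when the host has at least
# RAM_NEEDED GiB of RAM (MemTotal is reported in kB).
RAM_NEEDED=4

tmpfs_check() {
    total_kB=$1  # on a real host: awk '/^MemTotal/ { print $2 }' /proc/meminfo
    # 4 GiB expressed in kB is 4 * 1024 * 1024 = 4194304, matching the trace
    [ "$total_kB" -ge $((RAM_NEEDED * 1024 * 1024)) ]
}

# The trace shows total_kB=8157200, so '[' 8157200 -lt 4194304 ']' is
# false and tmpfs_check returns 0 (tmpfs build enabled).
tmpfs_check 8157200 && echo "tmpfs build enabled"
```

Note that the check passed in the failing run; the later umount/rm lines come from cleanup_image_dir and trap_cleanup, i.e. the error-path cleanup, which is consistent with the reading that the real failure happened earlier in the build.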
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Mon Jul 2 10:59:00 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 02 Jul 2018 03:59:00 -0700 Subject: [Openstack-operators] Reminder: User Committee Meeting - Monday July 2nd @1400UTC In-Reply-To: <5b366ec68bafdc64b1000004@polymail.io> References: <5b366ec68bafdc64b1000004@polymail.io> Message-ID: <5b3698532eef73480d000001@polymail.io> In case you did not get the reminder on Friday afternoon ;) --  Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 On Fri, Jun 29th, 2018 at 12:59 PM, Melvin Hillsman wrote: > > Hi everyone, > > > Please be sure to join us - if not getting ready for firecrackers - on > Monday July 2nd @1400UTC in #openstack-uc for weekly User Committee > meeting. > > > Also you can freely add to the meeting agenda here -  > ( > https://wiki.openstack.org/wiki/Governance/Foundation/UserCommittee#Meeting_Agenda.2FPrevious_Meeting_Logs > ) > > > > Governance/Foundation/UserCommittee - OpenStack ( > https://wiki.openstack.org/wiki/Governance/Foundation/UserCommittee#Meeting_Agenda.2FPrevious_Meeting_Logs > ) ( > https://wiki.openstack.org/wiki/Governance/Foundation/UserCommittee#Meeting_Agenda.2FPrevious_Meeting_Logs > ) WIKI.OPENSTACK.ORG ( > https://wiki.openstack.org/wiki/Governance/Foundation/UserCommittee#Meeting_Agenda.2FPrevious_Meeting_Logs > ) > > > > > > > > --  > Kind regards, > > Melvin Hillsman > mrhillsman at gmail.com > mobile: (832) 264-2646 > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mihalis68 at gmail.com Mon Jul 2 13:58:57 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Mon, 2 Jul 2018 09:58:57 -0400 Subject: [Openstack-operators] PTG survey reminder for ops Message-ID: Hello Everyone, We have 23 responses so far on the PTG survey for openstack operators to let the ops meetups team and the openstack foundation folk know preferences for the upcoming PTG in Denver. Perhaps some of you that intended to respond were, like me, sweltering in a heatwave and didn't touch your computers over the weekend. If so here is a final reminder to please share your preferences to help make this event as good as it can be. Survey link : https://www.surveymonkey.com/r/ZSLF9GB We need to close this today to allow detailed planning of the ops part of the event to proceed. Thanks to all those that already responded. I will share the results once it closes later today. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at bakeyournoodle.com Tue Jul 3 00:49:51 2018 From: tony at bakeyournoodle.com (Tony Breeds) Date: Tue, 3 Jul 2018 10:49:51 +1000 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: References: <20180702013419.GE21570@thor.bakeyournoodle.com> <20180702061144.GF21570@thor.bakeyournoodle.com> Message-ID: <20180703004950.GA3734@thor.bakeyournoodle.com> On Mon, Jul 02, 2018 at 08:13:39AM +0200, Ignazio Cassano wrote: > Tony, do you mean the script I am using to create the image ? Yup, it'd be good to try and reproduce this outside your environment as that'll make fixing the underlying bug quicker. Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From ignaziocassano at gmail.com Tue Jul 3 04:12:21 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 3 Jul 2018 06:12:21 +0200 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: <20180703004950.GA3734@thor.bakeyournoodle.com> References: <20180702013419.GE21570@thor.bakeyournoodle.com> <20180702061144.GF21570@thor.bakeyournoodle.com> <20180703004950.GA3734@thor.bakeyournoodle.com> Message-ID: Hi Tony, I sent log file and script yesterday. I hope you received them. Ignazio Il Mar 3 Lug 2018 02:49 Tony Breeds ha scritto: > On Mon, Jul 02, 2018 at 08:13:39AM +0200, Ignazio Cassano wrote: > > Tony, do you mean the script I am using to create the image ? > > Yup, it'd be good to try and reproduce this outside your environment as > that'll make fixing the underlying bug quicker. > > Yours Tony. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at bakeyournoodle.com Tue Jul 3 04:36:21 2018 From: tony at bakeyournoodle.com (Tony Breeds) Date: Tue, 3 Jul 2018 14:36:21 +1000 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: References: <20180702013419.GE21570@thor.bakeyournoodle.com> <20180702061144.GF21570@thor.bakeyournoodle.com> <20180703004950.GA3734@thor.bakeyournoodle.com> Message-ID: <20180703043620.GB3734@thor.bakeyournoodle.com> On Tue, Jul 03, 2018 at 06:12:21AM +0200, Ignazio Cassano wrote: > Hi Tony, I sent log file and script yesterday. I hope you received them. Sorry I can't find them in any of my inboxes :( Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From ignaziocassano at gmail.com Tue Jul 3 06:08:09 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 3 Jul 2018 08:08:09 +0200 Subject: [Openstack-operators] diskimage-builder error In-Reply-To: <20180703043620.GB3734@thor.bakeyournoodle.com> References: <20180702013419.GE21570@thor.bakeyournoodle.com> <20180702061144.GF21570@thor.bakeyournoodle.com> <20180703004950.GA3734@thor.bakeyournoodle.com> <20180703043620.GB3734@thor.bakeyournoodle.com> Message-ID: Hi Tony, the message I sent is waiting for moderator approval because it is big. :-( 2018-07-03 6:36 GMT+02:00 Tony Breeds : > On Tue, Jul 03, 2018 at 06:12:21AM +0200, Ignazio Cassano wrote: > > Hi Tony, I sent log file and script yesterday. I hope you received them. > > Sorry I can't find them in any of my inboxes :( > > Yours Tony. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergio.traldi at pd.infn.it Tue Jul 3 10:03:19 2018 From: sergio.traldi at pd.infn.it (Sergio Traldi) Date: Tue, 3 Jul 2018 12:03:19 +0200 Subject: [Openstack-operators] Ocata heat AWS::CloudFormation::WaitCondition doesn't work Message-ID: <21ec2111-aaf1-5bc3-5d3a-c34af1020e02@pd.infn.it> Hi, I have an earlier IaaS running the OpenStack Mitaka version, where my heat template with the AWS wait conditions works perfectly. Now the same template launches the first instance and never launches the second one. The relevant part of the template is: ----------------------------------------- .......
  node1_server_instance:
    type: OS::Nova::Server
    properties:
      name: "node1"
      key_name: { get_param: key_name_user }
      image: { get_param: image_centos_7 }
      flavor: "m1.small"
      networks:
        - port: { get_resource: pnode1_server_port }
      user_data_format: RAW
      user_data:
        str_replace:
          template: |
            #!/bin/bash
            curl -k -X PUT -H 'Content-Type:application/json' \
                -d '{"Status" : "SUCCESS","Reason" : "Configuration OK","UniqueId" : "NODE1","Data" : "Node1 started Configured."}' \
                "$wait_handle$"
          params:
            $wait_handle$: { get_resource: node1_instance_wait_handle }

  node1_instance_wait:
    type: "AWS::CloudFormation::WaitCondition"
    depends_on: node1_server_instance
    properties:
      Handle:
        get_resource: node1_instance_wait_handle
      Timeout: 3600

  node1_instance_wait_handle:
    type: "AWS::CloudFormation::WaitConditionHandle"

  node2_server_instance:
    type: OS::Nova::Server
    depends_on: node1_instance_wait
    properties:
      name: "node2"
......
--------------------------------------------------------------------
I tried to SSH into node1 and run the curl command with the $wait_handle$ value by hand, but I get a "User is not authorized to perform action" error:

curl -k -X PUT -H 'Content-Type:application/json' -d '{"Status" : "SUCCESS","Reason" : "Configuration OK","UniqueId" : "NODO2","Data" : "Nodo2 started Configured."}' -i "https://cloud-test.pd.infn.it:8000/v1/waitcondition/arn%3Aopenstack%3Aheat%3A%3A3beba6dd3f2648378263bc04d9c205fa%3Astacks%2Fvevever%2F66030fe2-56be-4e03-ad07-ce078a5a6f02%2Fresources%2Fnodo1_instance_wait_handle?Timestamp=2018-06-22T13%3A01%3A33Z&SignatureMethod=HmacSHA256&AWSAccessKeyId=38edd7e8c98e4e36b85331d4bca5601b&SignatureVersion=2&Signature=%2BT7%2FQVsHcvEpv63qfIe6wsGgG0enH54vEb%2FoWx5odfM%3D"

HTTP/1.1 403 AccessDenied
Content-Type: application/xml; charset=UTF-8
Content-Length: 149
Date: Fri, 22 Jun 2018 13:04:26 GMT
Connection: close

User is not authorized to perform action AccessDenied Sender

It seems to be the same error described here for the Kilo version: https://bugs.launchpad.net/openstack-ansible/+bug/1515485

I have these OpenStack versions of keystone and heat on CentOS 7:

[~]# rpm -qa | grep -e keystone -e heat | sort
openstack-heat-api-8.0.6-1.el7.noarch
openstack-heat-api-cfn-8.0.6-1.el7.noarch
openstack-heat-common-8.0.6-1.el7.noarch
openstack-heat-engine-8.0.6-1.el7.noarch
openstack-keystone-11.0.3-1.el7.noarch
python2-heatclient-1.8.2-1.el7.noarch
python2-keystoneauth1-2.18.0-1.el7.noarch
python2-keystoneclient-3.10.0-1.el7.noarch
python2-keystonemiddleware-4.14.0-1.el7.noarch
python-keystone-11.0.3-1.el7.noarch

I tried adding some configuration options for the heat clients, but with no luck. Can anyone suggest something? Cheers Sergio From mihalis68 at gmail.com Tue Jul 3 11:20:42 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 3 Jul 2018 07:20:42 -0400 Subject: [Openstack-operators] Ops Survey Results! Message-ID: Question 1.
"Are you considering attending the OpenStack Project Technical Gathering (PTG) in Denver in September?"

83.33% yes
16.67% no

(24 respondents)

Question 2 "If you are considering attending, which is of more interest to you?"

17.39% A 2-day event focused on OpenStack Operators working sessions, just like prior Ops Meetups
82.61% A 5-day event including some specific operators sessions as well as community-wide sessions (SIGs, for example)

(23 respondents)

Question 3 "If you are interested in attending a longer event, which would you prefer?"

26.32% Cross-project sessions first, and then operator-focused sessions later in the week
73.68% Operator-specific content first, then time to join specific project teams

(19 respondents)

Thanks to all who responded. This will be a big help to those planning the PTG to know how to mix in the OpenStack Ops sessions into the overall agenda. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Tue Jul 3 12:58:13 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 03 Jul 2018 08:58:13 -0400 Subject: [Openstack-operators] Ops Survey Results! In-Reply-To: References: Message-ID: <1530622650-sup-4716@lrrr.local> Excerpts from Chris Morgan's message of 2018-07-03 07:20:42 -0400: > Question 1. "Are you considering attending the OpenStack Project Technical > Gathering (PTG) in Denver in September?" > > 83.33% yes > 16.67% no > > (24 respondents) How does the response rate to the survey compare to attendance at recent Ops Meetups? Doug From james.page at canonical.com Tue Jul 3 14:35:58 2018 From: james.page at canonical.com (James Page) Date: Tue, 3 Jul 2018 15:35:58 +0100 Subject: [Openstack-operators] [sig][upgrade] Todays IRC meeting Message-ID: Hi All Unfortunately I can't make today's IRC meeting at 1600 UTC. Should be back for next week, but I think we need to do some rescheduling to fit better with other ops and dev meetings.
Cheers James -------------- next part -------------- An HTML attachment was scrubbed... URL: From emccormick at cirrusseven.com Tue Jul 3 14:42:30 2018 From: emccormick at cirrusseven.com (Erik McCormick) Date: Tue, 3 Jul 2018 10:42:30 -0400 Subject: [Openstack-operators] Ops Survey Results! In-Reply-To: <1530622650-sup-4716@lrrr.local> References: <1530622650-sup-4716@lrrr.local> Message-ID: On Tue, Jul 3, 2018, 8:59 AM Doug Hellmann wrote: > Excerpts from Chris Morgan's message of 2018-07-03 07:20:42 -0400: > > Question 1. "Are you considering attending the OpenStack Project > Technical > > Gathering (PTG) in Denver in September?" > > > > 83.33% yes > > 16.67% no > > > > (24 respondents) > > How does the response rate to the survey compare to attendance at > recent Ops Meetups? > > Doug > We've been around 100ish so pretty light -Erik > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Tue Jul 3 14:45:46 2018 From: amy at demarco.com (Amy Marrich) Date: Tue, 3 Jul 2018 09:45:46 -0500 Subject: [Openstack-operators] Ops Survey Results! In-Reply-To: References: <1530622650-sup-4716@lrrr.local> Message-ID: But a quarter responses for a survey actually isn't bad statistically. Amy (spotz) On Tue, Jul 3, 2018 at 9:42 AM, Erik McCormick wrote: > > > On Tue, Jul 3, 2018, 8:59 AM Doug Hellmann wrote: > >> Excerpts from Chris Morgan's message of 2018-07-03 07:20:42 -0400: >> > Question 1. "Are you considering attending the OpenStack Project >> Technical >> > Gathering (PTG) in Denver in September?" >> > >> > 83.33% yes >> > 16.67% no >> > >> > (24 respondents) >> >> How does the response rate to the survey compare to attendance at >> recent Ops Meetups? 
>> >> Doug >> > > We've been around 100ish so pretty light > > -Erik > >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jul 3 14:47:33 2018 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 3 Jul 2018 14:47:33 +0000 Subject: [Openstack-operators] Ops Survey Results! In-Reply-To: References: <1530622650-sup-4716@lrrr.local> Message-ID: <20180703144732.r56ryvko2kkprwmp@yuggoth.org> On 2018-07-03 10:42:30 -0400 (-0400), Erik McCormick wrote: > On Tue, Jul 3, 2018, 8:59 AM Doug Hellmann wrote: > > > Excerpts from Chris Morgan's message of 2018-07-03 07:20:42 -0400: > > > Question 1. "Are you considering attending the OpenStack Project > > Technical > > > Gathering (PTG) in Denver in September?" > > > > > > 83.33% yes > > > 16.67% no > > > > > > (24 respondents) > > > > How does the response rate to the survey compare to attendance at > > recent Ops Meetups? > > > > Doug > > > > We've been around 100ish so pretty light How does it compare to respondent count from previous surveys about organizing operations-focused gatherings? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From openstack at medberry.net Tue Jul 3 15:37:15 2018 From: openstack at medberry.net (David Medberry) Date: Tue, 3 Jul 2018 09:37:15 -0600 Subject: [Openstack-operators] Fwd: Feedback on Ops Meetup Planning meeting today after the fact In-Reply-To: References: Message-ID: I missed the ops meetup but have the scrollback of what was discussed. I'd definitely like to see upgrades (FFUpgrades, etc.) and LTS on the Ops agenda, socialized so that the distros and other concerned parties come join us. And I'd put them at the start of day two. [Looks like the actual plan may be to have us join the SIGs that have formed around those ideas, so that would be fine too as long as we're not double-booked.] I will be socializing an OpenStack Meetup the first night (as I think the PTG generally does a bigger event on Tuesday night so those arriving for Wed can attend.) But I'm open to any social events. I'm not sure what venue I will be able to arrange for that (the normal one is up in Superior near Boulder.) Stay tuned. Also, to reiterate what others said: not a big venue. The only "commute" between sessions that's more than 90 seconds is 3rd to 1st or vice versa. And it's difficult to defeat the elevator system for that. So figure 3-4 minutes for that.... -dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openstack.org Tue Jul 3 15:44:38 2018 From: jimmy at openstack.org (Jimmy McArthur) Date: Tue, 03 Jul 2018 10:44:38 -0500 Subject: [Openstack-operators] [Openstack] Recovering from full outage In-Reply-To: References: Message-ID: <5B3B99E6.4070306@openstack.org> I'm adding this to the OpenStack Operators list as it's a bit better for these types of questions. Torin Woltjer wrote: > We just suffered a power outage in our data center and I'm having > trouble recovering the Openstack cluster.
All of the nodes are back > online, every instance shows active but `virsh list --all` on the > compute nodes show that all of the VMs are actually shut down. Running > `ip addr` on any of the nodes shows that none of the bridges are > present and `ip netns` shows that all of the network namespaces are > missing as well. So despite all of the neutron service running, none > of the networking appears to be active, which is concerning. How do I > solve this without recreating all of the networks? > > /*Torin Woltjer*/ > *Grand Dial Communications - A ZK Tech Inc. Company* > *616.776.1066 ext. 2006* > /*www.granddial.com */ > _______________________________________________ > Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack > Post to : openstack at lists.openstack.org > Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Tue Jul 3 16:49:26 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 03 Jul 2018 12:49:26 -0400 Subject: [Openstack-operators] Fwd: Feedback on Ops Meetup Planning meeting today after the fact In-Reply-To: References: Message-ID: <1530636431-sup-278@lrrr.local> Excerpts from David Medberry's message of 2018-07-03 09:37:15 -0600: > I missed the ops meetup but have the scrollback of what was discussed. > > I'd definitely like to see upgrades (FFUpgrades, etc) and LTS should be on > the Ops Agenda and socialized so that the distros and other concerned > parties come join us. And I'd put them at the start of day two. [Looks like > the actual plan may be to have us join the SIGs that have formed around > those ideas so that would be fine too as long as we're not double booked.] I hope folks will join the SIG meetings. One benefit of combining the PTG and Ops meetings is to be able to discuss these sorts of topics *together*. 
> I will be socializing an OpenStack Meetup the first night (as I think PTG > generally do a bigger event on Tuesday night so those arriving for Wed can > attend.) But I'm open to any social events. I'm not sure what venue I will > be able to arrange for that (the normal one is up in Superior near > Boulder.) Stay tuned. > > Also, reiterate what others said: Not a big venue. The only "commute" > between sessions that's more than 90 seconds is 3rd to 1st or visa versa. > And it's difficult to defeat the elevator system for that. So figure 3-4 > minutes for that.... > > -dave From mihalis68 at gmail.com Tue Jul 3 19:51:09 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 3 Jul 2018 15:51:09 -0400 Subject: [Openstack-operators] Ops Survey Results! In-Reply-To: <20180703144732.r56ryvko2kkprwmp@yuggoth.org> References: <1530622650-sup-4716@lrrr.local> <20180703144732.r56ryvko2kkprwmp@yuggoth.org> Message-ID: I've spent some time trying to find the old polls, but I don't think we collected the numbers, we just announced the decision. Certainly for the ops meetup in mexico last august I found a link for a doodle poll about which venue, but the link is dead and the meeting minutes don't list the numbers. I agree with Erik that we normally think of these events (ops meetups not at main Summits) as being for about 100 people. Japan earlier this year was almost exactly that, mexico last august about half that, Philadelphia way over. If we take that as a rule then we had about 25% participation, which is not bad. We don't know how many people our potential audience is so in the end this is always going to be hand-wavy Chris On Tue, Jul 3, 2018 at 10:51 AM Jeremy Stanley wrote: > On 2018-07-03 10:42:30 -0400 (-0400), Erik McCormick wrote: > > On Tue, Jul 3, 2018, 8:59 AM Doug Hellmann > wrote: > > > > > Excerpts from Chris Morgan's message of 2018-07-03 07:20:42 -0400: > > > > Question 1. 
"Are you considering attending the OpenStack Project > > > Technical > > > > Gathering (PTG) in Denver in September?" > > > > > > > > 83.33% yes > > > > 16.67% no > > > > > > > > (24 respondents) > > > > > > How does the response rate to the survey compare to attendance at > > > recent Ops Meetups? > > > > > > Doug > > > > > > > We've been around 100ish so pretty light > > How does it compare to respondent count from previous surveys about > organizing operations-focused gatherings? > -- > Jeremy Stanley > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Tue Jul 3 21:34:26 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Tue, 03 Jul 2018 21:34:26 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. 
http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Tue Jul 3 21:41:45 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 03 Jul 2018 17:41:45 -0400 Subject: [Openstack-operators] Ops Survey Results! 
In-Reply-To: References: <1530622650-sup-4716@lrrr.local> <20180703144732.r56ryvko2kkprwmp@yuggoth.org> Message-ID: <1530654026-sup-7379@lrrr.local> Excerpts from Chris Morgan's message of 2018-07-03 15:51:09 -0400: > I've spent some time trying to find the old polls, but I don't think we > collected the numbers, we just announced the decision. Certainly for the > ops meetup in mexico last august I found a link for a doodle poll about > which venue, but the link is dead and the meeting minutes don't list the > numbers. > > I agree with Erik that we normally think of these events (ops meetups not > at main Summits) as being for about 100 people. Japan earlier this year was > almost exactly that, mexico last august about half that, Philadelphia way > over. If we take that as a rule then we had about 25% participation, which > is not bad. We don't know how many people our potential audience is so in > the end this is always going to be hand-wavy That gives me an idea of how much weight to attach to the results. At first they seemed like they may not mean much because the numbers were low, but if there are only about 100 people attending the event anyway then maybe the results are more significant than they initially appeared. Doug From lmihaiescu at gmail.com Tue Jul 3 23:47:13 2018 From: lmihaiescu at gmail.com (George Mihaiescu) Date: Tue, 3 Jul 2018 19:47:13 -0400 Subject: [Openstack-operators] [Openstack] Recovering from full outage In-Reply-To: References: Message-ID: <06CB62EE-0078-4C35-AF6C-7E7099DBC474@gmail.com> Did you set a lock_path in the neutron’s config? > On Jul 3, 2018, at 17:34, Torin Woltjer wrote: > > The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ > > No such errors are on the compute nodes themselves. > > Torin Woltjer > > Grand Dial Communications - A ZK Tech Inc. Company > > 616.776.1066 ext. 
2006 > www.granddial.com > > From: "Torin Woltjer" > Sent: 7/3/18 5:14 PM > To: > Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" > Subject: Re: [Openstack] Recovering from full outage > Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: > http://paste.openstack.org/show/724917/ > > I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. > http://paste.openstack.org/show/724921/ > And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. > > Torin Woltjer > > Grand Dial Communications - A ZK Tech Inc. Company > > 616.776.1066 ext. 2006 > www.granddial.com > > From: George Mihaiescu > Sent: 7/3/18 11:50 AM > To: torin.woltjer at granddial.com > Subject: Re: [Openstack] Recovering from full outage > Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. > >> On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: >> We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? >> >> Torin Woltjer >> >> Grand Dial Communications - A ZK Tech Inc. 
Company >> >> 616.776.1066 ext. 2006 >> www.granddial.com >> >> _______________________________________________ >> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack >> Post to : openstack at lists.openstack.org >> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From laszlo.budai at gmail.com Wed Jul 4 09:09:14 2018 From: laszlo.budai at gmail.com (Budai Laszlo) Date: Wed, 4 Jul 2018 12:09:14 +0300 Subject: [Openstack-operators] [openstack-ansible] group_binds exceptions Message-ID: <7bb29350-657c-a19a-f476-a5c8940bfff2@gmail.com> Dear all, is there a way to define exceptions for the group_binds for a network definition? For instance, we have something like this:

- network:
    container_bridge: "vlan4"
    container_type: "veth"
    container_interface: "eth1"
    ip_from_q: "container"
    type: "raw"
    group_binds:
      - all_containers
      - hosts
    is_container_address: true
    is_ssh_address: true

So, instead of all_containers I would be interested in something like "all_containers except those running on the ceph nodes". Any ideas are welcome. Thank you, Laszlo From tobias.rydberg at citynetwork.eu Thu Jul 5 07:58:26 2018 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 5 Jul 2018 09:58:26 +0200 Subject: [Openstack-operators] [publiccloud-wg] Meeting this afternoon for Public Cloud WG Message-ID: Hi folks, Time for a new meeting for the Public Cloud WG. Agenda draft can be found at https://etherpad.openstack.org/p/publiccloud-wg, feel free to add items to that list.
See you all at IRC 1400 UTC in #openstack-publiccloud Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From jean-daniel.bonnetot at corp.ovh.com Thu Jul 5 08:02:34 2018 From: jean-daniel.bonnetot at corp.ovh.com (Jean-Daniel Bonnetot) Date: Thu, 5 Jul 2018 08:02:34 +0000 Subject: [Openstack-operators] [openstack-dev] [publiccloud-wg] Meeting this afternoon for Public Cloud WG In-Reply-To: References: Message-ID: <6AE18849-2F98-419B-833E-678C744A8CDE@corp.ovh.com> Sorry guys, I'm not available once again. See you next time. Jean-Daniel Bonnetot ovh.com | @pilgrimstack On 05/07/2018 09:59, "Tobias Rydberg" wrote: Hi folks, Time for a new meeting for the Public Cloud WG. Agenda draft can be found at https://etherpad.openstack.org/p/publiccloud-wg, feel free to add items to that list. See you all at IRC 1400 UTC in #openstack-publiccloud Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev From torin.woltjer at granddial.com Thu Jul 5 12:43:43 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Thu, 05 Jul 2018 12:43:43 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. 
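George's suggestion above can be sketched in shell. This is only a hedged illustration, not a documented recovery procedure: the `active_ids` helper is hypothetical, and the `openstack` commands are assumed to be run against the affected cloud with admin credentials.

```shell
# Hypothetical helper: given "ID STATUS" pairs on stdin (one per line, as
# produced by `openstack server list -f value -c ID -c Status`), print the
# IDs of servers the API still reports as ACTIVE -- the ones whose API
# state disagrees with `virsh list --all` after the outage.
active_ids() {
  awk '$2 == "ACTIVE" { print $1 }'
}

# Intended use against a live cloud (not runnable here):
#   openstack server list --all-projects -f value -c ID -c Status \
#     | active_ids \
#     | while read -r id; do openstack server reboot --hard "$id"; done
```

A hard reboot should make nova tear down and recreate the libvirt domain; per George's advice, nova-compute.log and the neutron agent logs on the compute nodes are still worth checking afterwards.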
On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in our data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Thu Jul 5 14:30:10 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Thu, 05 Jul 2018 14:30:10 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: <5d62f81a0e864009ab7a1b12097e0b2f@granddial.com> The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext.
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. 
And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Thu Jul 5 16:39:49 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Thu, 05 Jul 2018 16:39:49 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: <4cb6b48da9734ad1899ff99db02db307@granddial.com> Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. 
neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. 
On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From lmihaiescu at gmail.com Thu Jul 5 16:56:07 2018 From: lmihaiescu at gmail.com (George Mihaiescu) Date: Thu, 5 Jul 2018 12:56:07 -0400 Subject: [Openstack-operators] [Openstack] Recovering from full outage In-Reply-To: <4cb6b48da9734ad1899ff99db02db307@granddial.com> References: <4cb6b48da9734ad1899ff99db02db307@granddial.com> Message-ID: You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic. On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: > Yes, I've done this. The VMs hang for awhile waiting for DHCP and > eventually come up with no addresses. neutron-dhcp-agent has been restarted > on both controllers. 
The qdhcp netns's were all present; I stopped the > service, removed the qdhcp netns's, noted the dhcp agents show offline by > `neutron agent-list`, restarted all neutron services, noted the qdhcp > netns's were recreated, restarted a VM again and it still fails to pull an > IP address. > > *Torin Woltjer* > > *Grand Dial Communications - A ZK Tech Inc. Company* > > *616.776.1066 ext. 2006* > * www.granddial.com * > > ------------------------------ > *From*: George Mihaiescu > *Sent*: 7/5/18 10:38 AM > *To*: torin.woltjer at granddial.com > *Subject*: Re: [Openstack] Recovering from full outage > Did you restart the neutron-dhcp-agent and rebooted the VMs? > > On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer < > torin.woltjer at granddial.com> wrote: > >> The qrouter netns appears once the lock_path is specified, the neutron >> router is pingable as well. However, instances are not pingable. If I log >> in via console, the instances have not been given IP addresses, if I >> manually give them an address and route they are pingable and seem to work. >> So the router is working correctly but dhcp is not working. >> >> No errors in any of the neutron or nova logs on controllers or compute >> nodes. >> >> >> *Torin Woltjer* >> >> *Grand Dial Communications - A ZK Tech Inc. Company* >> >> *616.776.1066 ext. 2006* >> * >> www.granddial.com * >> >> ------------------------------ >> *From*: "Torin Woltjer" >> *Sent*: 7/5/18 8:53 AM >> *To*: >> *Cc*: openstack-operators at lists.openstack.org, >> openstack at lists.openstack.org >> *Subject*: Re: [Openstack] Recovering from full outage >> There is no lock path set in my neutron configuration. Does it ultimately >> matter what it is set to as long as it is consistent? Does it need to be >> set on compute nodes as well as controllers? >> >> *Torin Woltjer* >> >> *Grand Dial Communications - A ZK Tech Inc. Company* >> >> *616.776.1066 ext. 
2006* >> * >> >> www.granddial.com * >> >> ------------------------------ >> *From*: George Mihaiescu >> *Sent*: 7/3/18 7:47 PM >> *To*: torin.woltjer at granddial.com >> *Cc*: openstack-operators at lists.openstack.org, >> openstack at lists.openstack.org >> *Subject*: Re: [Openstack] Recovering from full outage >> >> Did you set a lock_path in the neutron’s config? >> >> On Jul 3, 2018, at 17:34, Torin Woltjer >> wrote: >> >> The following errors appear in the neutron-linuxbridge-agent.log on both >> controllers: >> >> >> >> >> http://paste.openstack.org/sho >> w/724930/ >> >> No such errors are on the compute nodes themselves. >> >> *Torin Woltjer* >> >> *Grand Dial Communications - A ZK Tech Inc. Company* >> >> *616.776.1066 ext. 2006* >> * >> >> >> www.granddial.com * >> >> ------------------------------ >> *From*: "Torin Woltjer" >> *Sent*: 7/3/18 5:14 PM >> *To*: >> *Cc*: "openstack-operators at lists.openstack.org" < >> openstack-operators at lists.openstack.org>, "openstack at lists.openstack.org" >> >> *Subject*: Re: [Openstack] Recovering from full outage >> Running `openstack server reboot` on an instance just causes the instance >> to be stuck in a rebooting status. Most notable of the logs is >> neutron-server.log which shows the following: >> >> >> >> >> >> >> >> http://paste.openstack.org/sho >> w/724917/ >> >> I realized that rabbitmq was in a failed state, so I bootstrapped it, >> rebooted controllers, and all of the agents show online. >> >> >> >> >> >> >> >> http://paste.openstack.org/sho >> w/724921/ >> And all of the instances can be properly started, however I cannot ping >> any of the instances floating IPs or the neutron router. And when logging >> into an instance with the console, there is no IP address on any interface. >> >> *Torin Woltjer* >> >> *Grand Dial Communications - A ZK Tech Inc. Company* >> >> *616.776.1066 ext. 
2006* >> * >> >> >> >> www.granddial.com * >> >> ------------------------------ >> *From*: George Mihaiescu >> *Sent*: 7/3/18 11:50 AM >> *To*: torin.woltjer at granddial.com >> *Subject*: Re: [Openstack] Recovering from full outage >> Try restarting them using "openstack server reboot" and also check the >> nova-compute.log and neutron agents logs on the compute nodes. >> >> On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer < >> torin.woltjer at granddial.com> wrote: >> >>> We just suffered a power outage in out data center and I'm having >>> trouble recovering the Openstack cluster. All of the nodes are back online, >>> every instance shows active but `virsh list --all` on the compute nodes >>> show that all of the VMs are actually shut down. Running `ip addr` on any >>> of the nodes shows that none of the bridges are present and `ip netns` >>> shows that all of the network namespaces are missing as well. So despite >>> all of the neutron service running, none of the networking appears to be >>> active, which is concerning. How do I solve this without recreating all of >>> the networks? >>> >>> *Torin Woltjer* >>> >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> >>> *616.776.1066 ext. 2006* >>> * >>> >>> >>> >>> >>> www.granddial.com * >>> >>> _______________________________________________ >>> Mailing list: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack >>> Post to : openstack at lists.openstack.org >>> Unsubscribe : >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From melwittt at gmail.com Thu Jul 5 18:55:47 2018 From: melwittt at gmail.com (melanie witt) Date: Thu, 5 Jul 2018 11:55:47 -0700 Subject: [Openstack-operators] [Openstack] [nova][api] Novaclient redirect endpoint https into http In-Reply-To: <0D8F95CB-0AAB-45FD-ADC8-3B917C1460D4@workday.com> References: <00be01d41381$d37b2940$7a717bc0$@gmail.com> <0D8F95CB-0AAB-45FD-ADC8-3B917C1460D4@workday.com> Message-ID: +openstack-dev@ On Wed, 4 Jul 2018 14:50:26 +0000, Bogdan Katynski wrote: >> But, I can not use nova command, endpoint nova have been redirected from https to http. Here:http://prntscr.com/k2e8s6 (command: nova –insecure service list) > First of all, it seems that the nova client is hitting /v2.1 instead of /v2.1/ URI and this seems to be triggering the redirect. > > Since openstack CLI works, I presume it must be using the correct URL and hence it’s not getting redirected. > >> >> And this is error log: Unable to establish connection tohttp://192.168.30.70:8774/v2.1/: ('Connection aborted.', BadStatusLine("''",)) >> > Looks to me that nova-api does a redirect to an absolute URL. I suspect SSL is terminated on the HAProxy and nova-api itself is configured without SSL so it redirects to an http URL. > > In my opinion, nova would be more load-balancer friendly if it used a relative URI in the redirect but that’s outside of the scope of this question and since I don’t know the context behind choosing the absolute URL, I could be wrong on that. Thanks for mentioning this. We do have a bug open in python-novaclient around a similar issue [1]. I've added comments based on this thread and will consult with the API subteam to see if there's something we can do about this in nova-api. 
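Bogdan's point about absolute versus relative redirects can be illustrated with a small sketch. This is not nova's or novaclient's actual code — just a hypothetical model of how a client resolves a `Location` header, using made-up hostnames. It shows why a backend behind an SSL-terminating HAProxy that emits an absolute `http://...` Location downgrades the client to plain http, while a root-relative Location would preserve the scheme and host the client originally used.

```shell
# Resolve a redirect Location against the URL the client originally used.
# Simplified model: handles only absolute URLs and root-relative paths.
resolve_location() {
  base=$1; location=$2
  proto=${base%%://*}                                # e.g. "https"
  hostport=${base#*://}; hostport=${hostport%%/*}    # e.g. "cloud.example:8774"
  case $location in
    /*) printf '%s://%s%s\n' "$proto" "$hostport" "$location" ;;  # relative: scheme kept
    *)  printf '%s\n' "$location" ;;                              # absolute: taken verbatim
  esac
}

# Relative Location keeps https (load-balancer friendly):
resolve_location "https://cloud.example:8774/v2.1" "/v2.1/"
# -> https://cloud.example:8774/v2.1/

# Absolute Location from a non-SSL backend downgrades the client to http:
resolve_location "https://cloud.example:8774/v2.1" "http://192.168.30.70:8774/v2.1/"
# -> http://192.168.30.70:8774/v2.1/
```

The second case models the failure reported in the thread: the client follows the redirect to a plain-http URL and the connection attempt fails.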
-melanie [1] https://bugs.launchpad.net/python-novaclient/+bug/1776928 From torin.woltjer at granddial.com Thu Jul 5 20:06:17 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Thu, 05 Jul 2018 20:06:17 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. I created a new instance from an Ubuntu 18.04 image to test with; the hostname was not set to the name of the instance and I could not log in as the users I had specified in the configuration. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic. On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs?
On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances' floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agent logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in our data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc.
Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From mordred at inaugust.com Thu Jul 5 20:10:08 2018 From: mordred at inaugust.com (Monty Taylor) Date: Thu, 5 Jul 2018 15:10:08 -0500 Subject: [Openstack-operators] [openstack-dev] [Openstack] [nova][api] Novaclient redirect endpoint https into http In-Reply-To: References: <00be01d41381$d37b2940$7a717bc0$@gmail.com> <0D8F95CB-0AAB-45FD-ADC8-3B917C1460D4@workday.com> Message-ID: On 07/05/2018 01:55 PM, melanie witt wrote: > +openstack-dev@ > > On Wed, 4 Jul 2018 14:50:26 +0000, Bogdan Katynski wrote: >>> But, I can not use nova command, endpoint nova have been redirected >>> from https to http. Here:http://prntscr.com/k2e8s6  (command: nova >>> –insecure service list) >> First of all, it seems that the nova client is hitting /v2.1 instead >> of /v2.1/ URI and this seems to be triggering the redirect. >> >> Since openstack CLI works, I presume it must be using the correct URL >> and hence it’s not getting redirected. >> >>> And this is error log: Unable to establish connection >>> tohttp://192.168.30.70:8774/v2.1/: ('Connection aborted.', >>> BadStatusLine("''",)) >> Looks to me that nova-api does a redirect to an absolute URL. I >> suspect SSL is terminated on the HAProxy and nova-api itself is >> configured without SSL so it redirects to an http URL. >> >> In my opinion, nova would be more load-balancer friendly if it used a >> relative URI in the redirect but that’s outside of the scope of this >> question and since I don’t know the context behind choosing the >> absolute URL, I could be wrong on that. > > Thanks for mentioning this. 
We do have a bug open in python-novaclient > around a similar issue [1]. I've added comments based on this thread and > will consult with the API subteam to see if there's something we can do > about this in nova-api. A similar thing came up the other day related to keystone and version discovery. Version discovery documents tend to return full urls - even though relative urls would make public/internal API endpoints work better. (also, sometimes people don't configure things properly and the version discovery url winds up being incorrect) In shade/sdk - we actually construct a wholly-new discovery url based on the url used for the catalog and the url in the discovery document since we've learned that the version discovery urls are frequently broken. This is problematic because SOMETIMES people have public urls deployed as a sub-url and internal urls deployed on a port - so you have: Catalog: public: https://example.com/compute internal: https://compute.example.com:1234 Version discovery: https://example.com/compute/v2.1 When we go to combine the catalog url and the versioned url, if the user is hitting internal, we produce https://compute.example.com:1234/compute/v2.1 - because we have no way of systematically knowing that /compute should also be stripped. VERY LONG WINDED WAY of saying 2 things: a) Relative URLs would be *way* friendlier (and incidentally are supported by keystoneauth, openstacksdk and shade - and are written up as being a thing people *should* support in the documents about API consumption) b) Can we get agreement that changing behavior to return or redirect to a relative URL would not be considered an api contract break?
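The combination problem described above can be made concrete with plain string handling. A minimal sketch of the naive join, using the example endpoints from the message (the logic is an illustration of the failure mode, not keystoneauth's actual code):

```shell
# Internal catalog endpoint and the (public) versioned discovery URL:
internal_catalog="https://compute.example.com:1234"
discovery_url="https://example.com/compute/v2.1"

# Naive combination: strip the public scheme+host, keep the path,
# and append it to the internal endpoint.
discovery_path="${discovery_url#https://example.com}"   # -> /compute/v2.1
combined="${internal_catalog}${discovery_path}"

# Nothing tells the client that the /compute prefix is public-only:
echo "$combined"   # https://compute.example.com:1234/compute/v2.1
```

A relative URL in the discovery document (e.g. just `v2.1/`) would sidestep the problem entirely, which is point a) above.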
(it's possible the answer to this is 'no' - so it's a real question) Monty From jp.methot at planethoster.info Thu Jul 5 21:30:11 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Thu, 5 Jul 2018 17:30:11 -0400 Subject: [Openstack-operators] Storage concerns when switching from a single controller to a HA setup Message-ID: Hi, We’ve been running on Openstack for several years now and our setup has always counted a single controller. We are currently testing switching to a dual controller HA solution, but an unexpected issue has appeared, regarding storage. See, we use Dell compellent SAN for our block devices. I notice that when I create a volume on one controller, I am unable to make any operation on the same volume on the second controller (this is with an active/passive cinder-volume). Worse, this affects VMs directly as they can’t be migrated if the active controller isn’t the one that created their block device. I know this issue doesn’t happen on Ceph, so I’ve been wondering, is this a limitation of Openstack or the SAN driver? Also, is there actually a way to reach even active-passive high availability with this current storage solution? Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgsousa at gmail.com Thu Jul 5 22:57:44 2018 From: pgsousa at gmail.com (Pedro Sousa) Date: Thu, 5 Jul 2018 23:57:44 +0100 Subject: [Openstack-operators] [Openstack] Recovering from full outage In-Reply-To: References: Message-ID: Hi, that could be a problem with neutron metadata service, check the logs. Have you considered that the outage might have corrupted your databases, neutron, nova, etc? BR On Thu, Jul 5, 2018 at 9:07 PM Torin Woltjer wrote: > Are IP addresses set by cloud-init on boot? I noticed that cloud-init > isn't working on my VMs. 
created a new instance from an ubuntu 18.04 image > to test with, the hostname was not set to the name of the instance and > could not login as users I had specified in the configuration. > > *Torin Woltjer* > > *Grand Dial Communications - A ZK Tech Inc. Company* > > *616.776.1066 ext. 2006* > *www.granddial.com * > > ------------------------------ > *From*: George Mihaiescu > *Sent*: 7/5/18 12:57 PM > *To*: torin.woltjer at granddial.com > *Cc*: "openstack at lists.openstack.org" , " > openstack-operators at lists.openstack.org" < > openstack-operators at lists.openstack.org> > *Subject*: Re: [Openstack] Recovering from full outage > You should tcpdump inside the qdhcp namespace to see if the requests make > it there, and also check iptables rules on the compute nodes for the return > traffic. > > > On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer < > torin.woltjer at granddial.com> wrote: > >> Yes, I've done this. The VMs hang for awhile waiting for DHCP and >> eventually come up with no addresses. neutron-dhcp-agent has been restarted >> on both controllers. The qdhcp netns's were all present; I stopped the >> service, removed the qdhcp netns's, noted the dhcp agents show offline by >> `neutron agent-list`, restarted all neutron services, noted the qdhcp >> netns's were recreated, restarted a VM again and it still fails to pull an >> IP address. >> >> *Torin Woltjer* >> >> *Grand Dial Communications - A ZK Tech Inc. Company* >> >> *616.776.1066 ext. 2006* >> * www.granddial.com >> * >> >> ------------------------------ >> *From*: George Mihaiescu >> *Sent*: 7/5/18 10:38 AM >> *To*: torin.woltjer at granddial.com >> *Subject*: Re: [Openstack] Recovering from full outage >> Did you restart the neutron-dhcp-agent and rebooted the VMs? >> >> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer < >> torin.woltjer at granddial.com> wrote: >> >>> The qrouter netns appears once the lock_path is specified, the neutron >>> router is pingable as well. 
However, instances are not pingable. If I log >>> in via console, the instances have not been given IP addresses, if I >>> manually give them an address and route they are pingable and seem to work. >>> So the router is working correctly but dhcp is not working. >>> >>> No errors in any of the neutron or nova logs on controllers or compute >>> nodes. >>> >>> >>> *Torin Woltjer* >>> >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> >>> *616.776.1066 ext. 2006* >>> * >>> www.granddial.com >>> * >>> >>> ------------------------------ >>> *From*: "Torin Woltjer" >>> *Sent*: 7/5/18 8:53 AM >>> *To*: >>> *Cc*: openstack-operators at lists.openstack.org, >>> openstack at lists.openstack.org >>> *Subject*: Re: [Openstack] Recovering from full outage >>> There is no lock path set in my neutron configuration. Does it >>> ultimately matter what it is set to as long as it is consistent? Does it >>> need to be set on compute nodes as well as controllers? >>> >>> *Torin Woltjer* >>> >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> >>> *616.776.1066 ext. 2006* >>> * >>> >>> www.granddial.com >>> * >>> >>> ------------------------------ >>> *From*: George Mihaiescu >>> *Sent*: 7/3/18 7:47 PM >>> *To*: torin.woltjer at granddial.com >>> *Cc*: openstack-operators at lists.openstack.org, >>> openstack at lists.openstack.org >>> *Subject*: Re: [Openstack] Recovering from full outage >>> >>> Did you set a lock_path in the neutron’s config? >>> >>> On Jul 3, 2018, at 17:34, Torin Woltjer >>> wrote: >>> >>> The following errors appear in the neutron-linuxbridge-agent.log on both >>> controllers: >>> >>> >>> >>> >>> >>> >>> http://paste.openstack.org/show/724930/ >>> >>> No such errors are on the compute nodes themselves. >>> >>> *Torin Woltjer* >>> >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> >>> *616.776.1066 ext. 
2006* >>> www.granddial.com >>> ------------------------------ >>> *From*: "Torin Woltjer" >>> *Sent*: 7/3/18 5:14 PM >>> *To*: >>> *Cc*: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" >>> *Subject*: Re: [Openstack] Recovering from full outage >>> Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: >>> http://paste.openstack.org/show/724917/ >>> I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. >>> http://paste.openstack.org/show/724921/ >>> And all of the instances can be properly started, however I cannot ping any of the instances' floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. >>> *Torin Woltjer* >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> *616.776.1066 ext. 2006* >>> www.granddial.com >>> ------------------------------ >>> *From*: George Mihaiescu >>> *Sent*: 7/3/18 11:50 AM >>> *To*: torin.woltjer at granddial.com >>> *Subject*: Re: [Openstack] Recovering from full outage >>> Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agent logs on the compute nodes. >>> On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: >>>> We just suffered a power outage in our data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down.
Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? >>>> *Torin Woltjer* >>>> *Grand Dial Communications - A ZK Tech Inc. Company* >>>> *616.776.1066 ext. 2006* >>>> www.granddial.com >>>> _______________________________________________ >>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack >>>> Post to : openstack at lists.openstack.org >>>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.zunker at codecentric.cloud Fri Jul 6 08:23:43 2018 From: christian.zunker at codecentric.cloud (Christian Zunker) Date: Fri, 6 Jul 2018 10:23:43 +0200 Subject: [Openstack-operators] Storage concerns when switching from a single controller to a HA setup In-Reply-To: References: Message-ID: Hi Jean-Philippe, we had the same issue with ceph as backend.
This fixed the problem in our setup: https://ask.openstack.org/en/question/87545/cinder-high-availability/ Although the above link talks about an active-active setup, the official docs mention the hostname in the configuration also for an active-passive setup: https://docs.openstack.org/ha-guide/storage-ha-block.html#configure-block-storage-api-service regards Christian Zunker Jean-Philippe Méthot schrieb am Fr., 6. Juli 2018 um 07:11 Uhr: > Hi, > > We’ve been running on Openstack for several years now and our setup has > always counted a single controller. We are currently testing switching to a > dual controller HA solution, but an unexpected issue has appeared, > regarding storage. See, we use Dell compellent SAN for our block devices. I > notice that when I create a volume on one controller, I am unable to make > any operation on the same volume on the second controller (this is with an > active/passive cinder-volume). Worse, this affects VMs directly as they > can’t be migrated if the active controller isn’t the one that created their > block device. > > I know this issue doesn’t happen on Ceph, so I’ve been wondering, is this > a limitation of Openstack or the SAN driver? Also, is there actually a way > to reach even active-passive high availability with this current storage > solution? > > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From molenkam at uwo.ca Fri Jul 6 11:55:17 2018 From: molenkam at uwo.ca (Gary Molenkamp) Date: Fri, 6 Jul 2018 07:55:17 -0400 Subject: [Openstack-operators] After power outage, nearly all vm volumes corrupted and unmountable Message-ID: <5794a4af-03d9-1159-c385-aed7c277675e@uwo.ca> Good morning all, After losing all power to our DC last night due to a storm, nearly all of the volumes in our Pike cluster are unmountable.  Of the 30 VMs in use at the time, only one has been able to successfully mount and boot from its rootfs.   We are using Ceph as the backend storage to cinder and glance.  Any help or pointers to bring this back online would be appreciated.  What most of the volumes are seeing is [    2.622252] SGI XFS with ACLs, security attributes, no debug enabled [    2.629285] XFS (sda1): Mounting V5 Filesystem [    2.832223] sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [    2.838412] sd 2:0:0:0: [sda] Sense Key : Aborted Command [current] [    2.842383] sd 2:0:0:0: [sda] Add. Sense: I/O process terminated [    2.846152] sd 2:0:0:0: [sda] CDB: Write(10) 2a 00 00 80 2c 19 00 04 00 00 [    2.850146] blk_update_request: I/O error, dev sda, sector 8399897 or [    2.590178] EXT4-fs (vda1): INFO: recovery required on readonly filesystem [    2.594319] EXT4-fs (vda1): write access will be enabled during recovery [    2.957742] print_req_error: I/O error, dev vda, sector 227328 [    2.962468] Buffer I/O error on dev vda1, logical block 0, lost async page write [    2.967933] Buffer I/O error on dev vda1, logical block 1, lost async page write [    2.973076] print_req_error: I/O error, dev vda, sector 229384 As a test for one of the less critical vms, I deleted the vm and mounted the volume on the one VM I managed to start.  
The results were not promising: # dmesg |tail [    5.136862] type=1305 audit(1530847244.811:4): audit_pid=496 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1 [    7.726331] nf_conntrack version 0.5.0 (65536 buckets, 262144 max) [29374.967315] scsi 2:0:0:1: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5 [29374.988104] sd 2:0:0:1: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB) [29374.991126] sd 2:0:0:1: Attached scsi generic sg1 type 0 [29374.995302] sd 2:0:0:1: [sdb] Write Protect is off [29374.997109] sd 2:0:0:1: [sdb] Mode Sense: 63 00 00 08 [29374.997186] sd 2:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [29375.005968]  sdb: sdb1 [29375.007746] sd 2:0:0:1: [sdb] Attached SCSI disk # parted /dev/sdb GNU Parted 3.1 Using /dev/sdb Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: QEMU QEMU HARDDISK (scsi) Disk /dev/sdb: 42.9GB Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number  Start   End     Size    Type     File system  Flags  1      1049kB  42.9GB  42.9GB  primary  xfs          boot # mount -t xfs /dev/sdb temp mount: wrong fs type, bad option, bad superblock on /dev/sdb,        missing codepage or helper program, or other error        In some cases useful info is found in syslog - try        dmesg | tail or so. # xfs_repair /dev/sdb Phase 1 - find and verify superblock... bad primary superblock - bad magic number !!! attempting to find secondary superblock... Which eventually fails.   The ceph cluster looks healthy, I can export the volumes from rbd.  I can find no other errors in ceph of openstack indicating a fault in either system.     - Is this recoverable?     - What happened to all of these volumes and can this be prevented from occurring again?  Note that any shutdown vm at the time of the outage appears to be fine. 
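One conservative way to approach the "is this recoverable?" question is to attempt repair on a copy rather than on the live volume: export the RBD image, attach the export as a loop device, and dry-run xfs_repair against it first. This is a generic sketch with illustrative pool and image names, not a procedure taken from this thread:

```shell
# Export the Cinder volume's RBD image to a file (pool/image names illustrative):
rbd export volumes/volume-0a1b2c3d /var/tmp/volume-0a1b2c3d.raw

# Attach the copy as a loop device, scanning it for partitions (-P):
loopdev=$(sudo losetup --find --show -P /var/tmp/volume-0a1b2c3d.raw)

# Dry run first: -n reports problems without writing anything.
sudo xfs_repair -n ${loopdev}p1

# Only repair the copy once the dry-run output looks sane:
sudo xfs_repair ${loopdev}p1
```

Working on the export means the original image stays untouched while the root cause is still being diagnosed.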
Relevant versions:     Base OS:  all Centos 7.5     Ceph:  Luminous 12.2.5-0     Openstack:  Latest Pike releases in centos-release-openstack-pike-1-1         nova 16.1.4-1         cinder  11.1.1-1 -- Gary Molenkamp Computer Science/Science Technology Services Systems Administrator University of Western Ontario molenkam at uwo.ca http://www.csd.uwo.ca (519) 661-2111 x86882 (519) 661-3566 From molenkam at uwo.ca Fri Jul 6 13:17:20 2018 From: molenkam at uwo.ca (Gary Molenkamp) Date: Fri, 6 Jul 2018 09:17:20 -0400 Subject: [Openstack-operators] [ceph-users] After power outage, nearly all vm volumes corrupted and unmountable In-Reply-To: References: <5794a4af-03d9-1159-c385-aed7c277675e@uwo.ca> Message-ID: <0d6e8673-0bc4-43c8-7776-f4debded6d42@uwo.ca> Thank you Jason,  Not sure how I missed that step. On 2018-07-06 08:34 AM, Jason Dillaman wrote: > There have been several similar reports on the mailing list about this > [1][2][3][4] that are always a result of skipping step 6 from the > Luminous upgrade guide [5]. The new (starting Luminous) 'profile > rbd'-style caps are designed to try to simplify caps going forward [6]. > > TL;DR: your Openstack CephX users need to have permission to blacklist > dead clients that failed to properly release the exclusive lock. > > [1] > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022278.html > [2] > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022694.html > [3] > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026496.html > [4] https://www.spinics.net/lists/ceph-users/msg45665.html > [5] > http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken > [6] > http://docs.ceph.com/docs/luminous/rbd/rbd-openstack/#setup-ceph-client-authentication > > > On Fri, Jul 6, 2018 at 7:55 AM Gary Molenkamp > wrote: > > Good morning all, > > After losing all power to our DC last night due to a storm, nearly > all > of the volumes in our Pike cluster are unmountable.  
Of the 30 VMs in > use at the time, only one has been able to successfully mount and > boot > from its rootfs.   We are using Ceph as the backend storage to cinder > and glance.  Any help or pointers to bring this back online would be > appreciated. > >   What most of the volumes are seeing is > > [    2.622252] SGI XFS with ACLs, security attributes, no debug > enabled > [    2.629285] XFS (sda1): Mounting V5 Filesystem > [    2.832223] sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > [    2.838412] sd 2:0:0:0: [sda] Sense Key : Aborted Command [current] > [    2.842383] sd 2:0:0:0: [sda] Add. Sense: I/O process terminated > [    2.846152] sd 2:0:0:0: [sda] CDB: Write(10) 2a 00 00 80 2c 19 > 00 04 > 00 00 > [    2.850146] blk_update_request: I/O error, dev sda, sector 8399897 > > or > > [    2.590178] EXT4-fs (vda1): INFO: recovery required on readonly > filesystem > [    2.594319] EXT4-fs (vda1): write access will be enabled during > recovery > [    2.957742] print_req_error: I/O error, dev vda, sector 227328 > [    2.962468] Buffer I/O error on dev vda1, logical block 0, lost > async > page write > [    2.967933] Buffer I/O error on dev vda1, logical block 1, lost > async > page write > [    2.973076] print_req_error: I/O error, dev vda, sector 229384 > > As a test for one of the less critical vms, I deleted the vm and > mounted > the volume on the one VM I managed to start.  
The results were not > promising: > > > # dmesg |tail > [    5.136862] type=1305 audit(1530847244.811:4): audit_pid=496 old=0 > auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 > res=1 > [    7.726331] nf_conntrack version 0.5.0 (65536 buckets, 262144 max) > [29374.967315] scsi 2:0:0:1: Direct-Access     QEMU     QEMU HARDDISK > 2.5+ PQ: 0 ANSI: 5 > [29374.988104] sd 2:0:0:1: [sdb] 83886080 512-byte logical blocks: > (42.9 > GB/40.0 GiB) > [29374.991126] sd 2:0:0:1: Attached scsi generic sg1 type 0 > [29374.995302] sd 2:0:0:1: [sdb] Write Protect is off > [29374.997109] sd 2:0:0:1: [sdb] Mode Sense: 63 00 00 08 > [29374.997186] sd 2:0:0:1: [sdb] Write cache: enabled, read cache: > enabled, doesn't support DPO or FUA > [29375.005968]  sdb: sdb1 > [29375.007746] sd 2:0:0:1: [sdb] Attached SCSI disk > > # parted /dev/sdb > GNU Parted 3.1 > Using /dev/sdb > Welcome to GNU Parted! Type 'help' to view a list of commands. > (parted) p > Model: QEMU QEMU HARDDISK (scsi) > Disk /dev/sdb: 42.9GB > Sector size (logical/physical): 512B/512B > Partition Table: msdos > Disk Flags: > > Number  Start   End     Size    Type     File system  Flags >   1      1049kB  42.9GB  42.9GB  primary  xfs          boot > > # mount -t xfs /dev/sdb temp > mount: wrong fs type, bad option, bad superblock on /dev/sdb, >         missing codepage or helper program, or other error > >         In some cases useful info is found in syslog - try >         dmesg | tail or so. > > # xfs_repair /dev/sdb > Phase 1 - find and verify superblock... > bad primary superblock - bad magic number !!! > > attempting to find secondary superblock... > > > > Which eventually fails.   The ceph cluster looks healthy, I can > export > the volumes from rbd.  I can find no other errors in ceph of > openstack > indicating a fault in either system. > >      - Is this recoverable? > >      - What happened to all of these volumes and can this be > prevented > from occurring again?  
Note that any shutdown vm at the time of the > outage appears to be fine. > > > Relevant versions: > >      Base OS:  all Centos 7.5 > >      Ceph:  Luminous 12.2.5-0 > >      Openstack:  Latest Pike releases in > centos-release-openstack-pike-1-1 > >          nova 16.1.4-1 > >          cinder  11.1.1-1 > > > > -- > Gary Molenkamp                  Computer Science/Science > Technology Services > Systems Administrator           University of Western Ontario > molenkam at uwo.ca http://www.csd.uwo.ca > (519) 661-2111 x86882           (519) 661-3566 > > _______________________________________________ > ceph-users mailing list > ceph-users at lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > Jason -- Gary Molenkamp Computer Science/Science Technology Services Systems Administrator University of Western Ontario molenkam at uwo.ca http://www.csd.uwo.ca (519) 661-2111 x86882 (519) 661-3566 -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Fri Jul 6 13:38:55 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Fri, 06 Jul 2018 13:38:55 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: I have done tcpdumps on both the controllers and on a compute node. Controller: `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` `tcpdump -vnes0 -i any port 67` Compute: `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` For the first command on the controller, there are no packets captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests are getting sent by openstack instances. In summary; DHCP requests are being sent, but are never received. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage The cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh-key, etc You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have you ssh-key and hostname... Did you check the things I told you? On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. created a new instance from an ubuntu 18.04 image to test with, the hostname was not set to the name of the instance and could not login as users I had specified in the configuration. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic. On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. 
Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to: openstack at lists.openstack.org Unsubscribe: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From lmihaiescu at gmail.com Fri Jul 6 15:14:23 2018 From: lmihaiescu at gmail.com (George Mihaiescu) Date: Fri, 6 Jul 2018 11:14:23 -0400 Subject: [Openstack-operators] [Openstack] Recovering from full outage In-Reply-To: References: Message-ID: Can you manually assign an IP address to a VM and once inside, ping the address of the dhcp server? That would confirm if there is connectivity at least. Also, on the controller node where the dhcp server for that network is, check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any misconfiguration), then an agent is out of sync and a restart usually fixes things. On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: > I have done tcpdumps on both the controllers and on a compute node. > Controller: > `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` > `tcpdump -vnes0 -i any port 67` > Compute: > `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` > For the first command on the controller, there are no packets captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests are getting sent by openstack instances. > In summary: DHCP requests are being sent, but are never received. > *Torin Woltjer* > *Grand Dial Communications - A ZK Tech Inc. Company* > *616.776.1066 ext.
2006* > * www.granddial.com * > > ------------------------------ > *From*: George Mihaiescu > *Sent*: 7/5/18 4:50 PM > *To*: torin.woltjer at granddial.com > *Subject*: Re: [Openstack] Recovering from full outage > > The cloud-init requires network connectivity by default in order to reach > the metadata server for the hostname, ssh-key, etc > > You can configure cloud-init to use the config-drive, but the lack of > network connectivity will make the instance useless anyway, even though it > will have you ssh-key and hostname... > > Did you check the things I told you? > > On Jul 5, 2018, at 16:06, Torin Woltjer > wrote: > > Are IP addresses set by cloud-init on boot? I noticed that cloud-init > isn't working on my VMs. created a new instance from an ubuntu 18.04 image > to test with, the hostname was not set to the name of the instance and > could not login as users I had specified in the configuration. > > *Torin Woltjer* > > *Grand Dial Communications - A ZK Tech Inc. Company* > > *616.776.1066 ext. 2006* > * > www.granddial.com * > > ------------------------------ > *From*: George Mihaiescu > *Sent*: 7/5/18 12:57 PM > *To*: torin.woltjer at granddial.com > *Cc*: "openstack at lists.openstack.org" , " > openstack-operators at lists.openstack.org" openstack.org> > *Subject*: Re: [Openstack] Recovering from full outage > You should tcpdump inside the qdhcp namespace to see if the requests make > it there, and also check iptables rules on the compute nodes for the return > traffic. > > > On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer < > torin.woltjer at granddial.com> wrote: > >> Yes, I've done this. The VMs hang for awhile waiting for DHCP and >> eventually come up with no addresses. neutron-dhcp-agent has been restarted >> on both controllers. 
The qdhcp netns's were all present; I stopped the >> service, removed the qdhcp netns's, noted the dhcp agents show offline by >> `neutron agent-list`, restarted all neutron services, noted the qdhcp >> netns's were recreated, restarted a VM again and it still fails to pull an >> IP address. >> >> *Torin Woltjer* >> >> *Grand Dial Communications - A ZK Tech Inc. Company* >> >> *616.776.1066 ext. 2006* >> * >> >> www.granddial.com * >> >> ------------------------------ >> *From*: George Mihaiescu >> *Sent*: 7/5/18 10:38 AM >> *To*: torin.woltjer at granddial.com >> *Subject*: Re: [Openstack] Recovering from full outage >> Did you restart the neutron-dhcp-agent and rebooted the VMs? >> >> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer < >> torin.woltjer at granddial.com> wrote: >> >>> The qrouter netns appears once the lock_path is specified, the neutron >>> router is pingable as well. However, instances are not pingable. If I log >>> in via console, the instances have not been given IP addresses, if I >>> manually give them an address and route they are pingable and seem to work. >>> So the router is working correctly but dhcp is not working. >>> >>> No errors in any of the neutron or nova logs on controllers or compute >>> nodes. >>> >>> >>> *Torin Woltjer* >>> >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> >>> *616.776.1066 ext. 2006* >>> * >>> >>> >>> www.granddial.com * >>> >>> ------------------------------ >>> *From*: "Torin Woltjer" >>> *Sent*: 7/5/18 8:53 AM >>> *To*: >>> *Cc*: openstack-operators at lists.openstack.org, >>> openstack at lists.openstack.org >>> *Subject*: Re: [Openstack] Recovering from full outage >>> There is no lock path set in my neutron configuration. Does it >>> ultimately matter what it is set to as long as it is consistent? Does it >>> need to be set on compute nodes as well as controllers? >>> >>> *Torin Woltjer* >>> >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> >>> *616.776.1066 ext. 
2006* >>> *www.granddial.com* >>> ------------------------------ >>> *From*: George Mihaiescu >>> *Sent*: 7/3/18 7:47 PM >>> *To*: torin.woltjer at granddial.com >>> *Cc*: openstack-operators at lists.openstack.org, openstack at lists.openstack.org >>> *Subject*: Re: [Openstack] Recovering from full outage >>> Did you set a lock_path in neutron's config? >>> On Jul 3, 2018, at 17:34, Torin Woltjer wrote: >>> The following errors appear in the neutron-linuxbridge-agent.log on both controllers: >>> http://paste.openstack.org/show/724930/ >>> No such errors are on the compute nodes themselves. >>> *Torin Woltjer* >>> *Grand Dial Communications - A ZK Tech Inc. Company* >>> *616.776.1066 ext. 2006* >>> *www.granddial.com* >>> ------------------------------ >>> *From*: "Torin Woltjer" >>> *Sent*: 7/3/18 5:14 PM >>> *To*: >>> *Cc*: "openstack-operators at lists.openstack.org", "openstack at lists.openstack.org" >>> *Subject*: Re: [Openstack] Recovering from full outage >>> Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: >>> http://paste.openstack.org/show/724917/ >>> I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted the controllers, and all of the agents show online. >>> http://paste.openstack.org/show/724921/ >>> And all of the instances can be properly started; however, I cannot ping any of the instances' floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. >>> *Torin Woltjer* >>> *Grand Dial Communications - A ZK Tech Inc.
Company* >>> *616.776.1066 ext. 2006* >>> *www.granddial.com* >>> ------------------------------ >>> *From*: George Mihaiescu >>> *Sent*: 7/3/18 11:50 AM >>> *To*: torin.woltjer at granddial.com >>> *Subject*: Re: [Openstack] Recovering from full outage >>> Try restarting them using "openstack server reboot" and also check nova-compute.log and the neutron agent logs on the compute nodes. >>> On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: >>>> We just suffered a power outage in our data center and I'm having trouble recovering the OpenStack cluster. All of the nodes are back online; every instance shows active, but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present, and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? >>>> *Torin Woltjer* >>>> *Grand Dial Communications - A ZK Tech Inc. Company* >>>> *616.776.1066 ext. 2006* >>>> *www.granddial.com* >>>> _______________________________________________ >>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack >>>> Post to: openstack at lists.openstack.org >>>> Unsubscribe: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed...
URL: From torin.woltjer at granddial.com Fri Jul 6 15:49:58 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Fri, 06 Jul 2018 15:49:58 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I was able to confirm this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18
I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext.
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" , pgsousa at gmail.com Subject: Re: [Openstack] Recovering from full outage Can you manually assign an IP address to a VM and once inside, ping the address of the dhcp server? That would confirm if there is connectivity at least. Also, on the controller node where the dhcp server for that network is, check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any misconfiguration), then an agent is out of sync and a restart usually fixes things. On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. Controller: `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` `tcpdump -vnes0 -i any port 67` Compute: `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` For the first command on the controller, there are no packets captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests are getting sent by openstack instances. In summary: DHCP requests are being sent, but are never received. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext.
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage The cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh-key, etc You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have you ssh-key and hostname... Did you check the things I told you? On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. created a new instance from an ubuntu 18.04 image to test with, the hostname was not set to the name of the instance and could not login as users I had specified in the configuration. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic. On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. 
Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luiz.Gavioli at netapp.com Fri Jul 6 17:08:08 2018 From: Luiz.Gavioli at netapp.com (Gavioli, Luiz) Date: Fri, 6 Jul 2018 17:08:08 +0000 Subject: [Openstack-operators] Deprecation notice: Cinder Driver for NetApp E-Series Message-ID: <1530896888.7565.11.camel@netapp.com> Developers and Operators, NetApp’s various Cinder drivers currently provide platform integration for ONTAP powered systems, SolidFire, and E/EF-Series systems. Per systems-provided telemetry and discussion amongst our user community, we’ve learned that when E/EF-series systems are deployed with OpenStack they do not commonly make use of the platform specific Cinder driver (instead opting for use of the LVM driver or Ceph layered atop). Given that, we’re proposing to cease further development and maintenance of the E-Series drivers within OpenStack and will focus development on our widely used SolidFire and ONTAP options. In accordance with community policy [1], we are initiating the deprecation process for the NetApp E-Series drivers [2] set to conclude with their removal in the OpenStack Stein release. This will apply to both protocols currently supported in this driver: iSCSI and FC. What is being deprecated: Cinder drivers for NetApp E-Series Period of deprecation: E-Series drivers will be around in stable/rocky and will be removed in the Stein release (All milestones of this release) What should users/operators do: Any Cinder E-series deployers are encouraged to get in touch with NetApp via the community #openstack-netapp IRC channel on freenode or via the #OpenStack Slack channel on http://netapp.io. 
We encourage migration to the LVM driver for continued use of E-series systems in most cases via Cinder’s migrate facility [3]. [1] https://governance.openstack.org/reference/tags/assert_follows-standard-deprecation.html [2] https://review.openstack.org/#/c/580679/ [3] https://docs.openstack.org/admin-guide/blockstorage-volume-migration.html Thanks, Luiz Gavioli -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Fri Jul 6 18:13:20 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Fri, 06 Jul 2018 18:13:20 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: I explored creating a second "selfservice" vxlan network to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old vxlan network. Am I having problems with VXLAN in particular? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I was able to confirm this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty.
"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18
I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" , pgsousa at gmail.com Subject: Re: [Openstack] Recovering from full outage Can you manually assign an IP address to a VM and once inside, ping the address of the dhcp server? That would confirm if there is connectivity at least. Also, on the controller node where the dhcp server for that network is, check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any misconfiguration), then an agent is out of sync and a restart usually fixes things.
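[Editor's note: the comma-separated `MAC,hostname,IP` entries quoted earlier in this thread can be cross-checked against the MAC addresses neutron reports for the network's ports. The sketch below is a hypothetical helper written for this thread, not part of neutron or dnsmasq tooling, and it assumes the file uses exactly the comma-separated format shown above.]

```python
# Hypothetical helper: parse dnsmasq per-network host entries of the form
# "fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8" (the format
# quoted in this thread) and report MACs with no entry. Assumption: the
# real file contains one such comma-separated record per line.

def parse_dnsmasq_hosts(text):
    """Return a {mac: ip} mapping from 'MAC,hostname,IP' lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mac, _hostname, ip = line.split(",")
        entries[mac.lower()] = ip
    return entries

def missing_macs(entries, port_macs):
    """MACs that have no dnsmasq entry, i.e. instances that would never get a lease."""
    return sorted(m.lower() for m in port_macs if m.lower() not in entries)

sample = """\
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7"""

entries = parse_dnsmasq_hosts(sample)
print(entries["fa:16:3e:3f:94:17"])                  # -> 172.16.1.8
print(missing_macs(entries, ["fa:16:3e:aa:bb:cc"]))  # -> ['fa:16:3e:aa:bb:cc']
```

Feeding it the MAC column of `openstack port list --network <net-id>` would show at a glance which ports dnsmasq never learned about.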
On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. Controller: `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` `tcpdump -vnes0 -i any port 67` Compute: `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` For the first command on the controller, there are no packets captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests are getting sent by openstack instances. In summary; DHCP requests are being sent, but are never received. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage The cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh-key, etc You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have you ssh-key and hostname... Did you check the things I told you? On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. created a new instance from an ubuntu 18.04 image to test with, the hostname was not set to the name of the instance and could not login as users I had specified in the configuration. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic. On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. 
And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Mon Jul 9 17:27:55 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 9 Jul 2018 12:27:55 -0500 Subject: [Openstack-operators] Reminder: UC Meeting Today 1800UTC / 1300CST Message-ID: Hey everyone, Please see https://wiki.openstack.org/wiki/Governance/Foundation/Us erCommittee for UC meeting info and add additional agenda items if needed. 
-- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.zunker at codecentric.cloud Tue Jul 10 04:54:55 2018 From: christian.zunker at codecentric.cloud (Christian Zunker) Date: Tue, 10 Jul 2018 06:54:55 +0200 Subject: [Openstack-operators] How are you handling billing/chargeback? In-Reply-To: References: <20180312192113.znz4eavfze5zg7yn@redhat.com> Message-ID: Hi, just some short feedback on my previous post. We switched from our self-written Python script to cloudkitty. We had to make some commits to openstack-ansible to integrate cloudkitty into our installation, but we got it working. cloudkitty replaced our self-written script completely. We needed some time to get used to it, but it reduced our maintenance. We will have to write a converter for the json/csv reports generated by cloudkitty, so we can import the data into our billing system. But we get the complete data we need from cloudkitty, which is worth a lot. Our next steps: - Perhaps migrate to gnocchi as the cloudkitty storage backend. We will have to test this first. - The Horizon pages aren't that nice, especially the user-facing one. We will have a look at that later. For now, we disabled it. regards Christian Christian Zunker wrote on Tue., May 8, 2018 at 08:36: > Hi, > > we are running a cloud based on openstack-ansible and now are trying to > integrate cloudkitty for billing. > > Until now we used a self-written Python script to query ceilometer for > needed data, but that got more tedious than we are willing to handle. We > hope it gets much easier once cloudkitty is set up. > > regards > Christian > > >> From: Lars Kellogg-Stedman >> Date: Mon., March 12, 2018 at 20:27 >> Subject: [Openstack-operators] How are you handling billing/chargeback?
>> To: openstack-operators at lists.openstack.org < >> openstack-operators at lists.openstack.org> >> >> >> Hey folks, >> >> I'm curious what folks out there are using for chargeback/billing in >> your OpenStack environment. >> >> Are you doing any sort of chargeback (or showback)? Are you using (or >> have you tried) CloudKitty? Or some other existing project? Have you >> rolled your own instead? >> >> I ask because I am helping out some folks get a handle on the >> operational side of their existing OpenStack environment, and they are >> interested in but have not yet deployed some sort of reporting >> mechanism. >> >> Thanks, >> >> >> -- >> Lars Kellogg-Stedman | larsks @ {irc,twitter,github} >> http://blog.oddbit.com/ | >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > -- > cc cloud GmbH | Hochstr. 11 > | 42697 > Solingen | Deutschland > mobil: +49 175 1068513 <+49%20175%201068513> > www.codecentric.cloud | blog.codecentric.de | www.meettheexperts.de > Sitz der Gesellschaft: Solingen | HRB 28640| Amtsgericht Wuppertal > > Geschäftsführung: Werner Krandick . Rainer Vehns > > Diese E-Mail einschließlich evtl. beigefügter Dateien enthält vertrauliche > und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige > Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie > bitte sofort den Absender und löschen Sie diese E-Mail und evtl. > beigefügter Dateien umgehend. Das unerlaubte Kopieren, Nutzen oder Öffnen > evtl. beigefügter Dateien sowie die unbefugte Weitergabe dieser E-Mail ist > nicht gestattet. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From berendt at betacloud-solutions.de Tue Jul 10 12:03:54 2018 From: berendt at betacloud-solutions.de (Christian Berendt) Date: Tue, 10 Jul 2018 14:03:54 +0200 Subject: [Openstack-operators] [glance] share image with domain Message-ID: <094CFFC9-1F54-4522-8178-1642F94724A0@betacloud-solutions.de> It is possible to add a domain as a member, however this is not taken into account. It should be mentioned that you can also add non-existent project IDs as a member. For me it looks like it is not possible to share an image with visibility “shared” with a domain. Are there known workarounds or scripts for that use case? Christian. -- Christian Berendt Chief Executive Officer (CEO) Mail: berendt at betacloud-solutions.de Web: https://www.betacloud-solutions.de Betacloud Solutions GmbH Teckstrasse 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139 From amy at demarco.com Tue Jul 10 15:32:42 2018 From: amy at demarco.com (Amy Marrich) Date: Tue, 10 Jul 2018 10:32:42 -0500 Subject: [Openstack-operators] [openstack-community] Openstack package repo In-Reply-To: References: Message-ID: Alfredo, Forwarding this to the OPS list in the hopes of it reaching the appropriate folks, but you might also want to check out the RDO repos https://trunk.rdoproject.org/centos7/current/ Thanks, Amy (spotz) On Tue, Jul 10, 2018 at 10:07 AM, Alfredo De Luca wrote: > Hi all. > I have centos/7 on a VM Virtualbox... I want to install all the openstack > python clients (nova, swift etc).
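On Christian's Glance question above: since Glance members are per-project, "share with a domain" has to be expanded by hand into one member per project in that domain. A sketch of that idea follows; the image and project IDs are hypothetical placeholders, and in practice the project list would come from `openstack project list --domain <domain> -f value -c ID`:

```python
# Sketch of a workaround for sharing a "shared"-visibility image with
# every project in a domain: expand the domain into its projects and
# add each one as an image member.  IDs below are hypothetical.

def member_commands(image_id, project_ids):
    """Build the `openstack image add project` calls for each project."""
    return [
        f"openstack image add project {image_id} {project_id}"
        for project_id in project_ids
    ]

# Project IDs would normally come from:
#   openstack project list --domain <domain> -f value -c ID
for cmd in member_commands("my-image-id", ["project-a", "project-b"]):
    print(cmd)
```

Note that each consuming project still has to accept the membership (e.g. `openstack image set --accept <image>` run with that project's credentials) before the image appears in its default listings, and new projects joining the domain later won't be picked up unless the script is re-run.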
> I installed > *yum install centos-release-openstack-queens * > > and all good but when I try to install one client I have the following > error: > > yum install python-swiftclient > > ** > Loaded plugins: fastestmirror > Loading mirror speeds from cached hostfile > * base: mirror.infonline.de > * extras: mirror.infonline.de > * updates: centos.mirrors.psw.services > centos-ceph-luminous > | 2.9 kB 00:00:00 > centos-openstack-queens > | 2.9 kB 00:00:00 > *http://mirror.centos.org/altarch/7/virt/x86_64/kvm-common/repodata/repomd.xml > : > [Errno 14] HTTP Error 404 - Not Found* > Trying other mirror. > To address this issue please refer to the below wiki article > > https://wiki.centos.org/yum-errors > ** > > Now the only way to install the package (or any other) is to disable that > repo > *yum-config-manager --disable centos-qemu-ev* > > then I can install the client... > > Any idea? > It looks like *http://mirror.centos.org/altarch/7/virt/x86_64 > doesn't exist.....* > > > > > > > -- > *Alfredo* > > > _______________________________________________ > Community mailing list > Community at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/community > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Tue Jul 10 18:58:32 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Tue, 10 Jul 2018 18:58:32 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: <167ce30bac124c85a16061c83353553a@granddial.com> DHCP is working again so instances are getting their addresses. For some reason cloud-init isn't working correctly. Hostnames aren't getting set, and SSH key pair isn't getting set. The neutron-metadata service is in control of this? 
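A note on reading the neutron-metadata-agent.log excerpt that follows: requests for "GET /" returning 404 from public source addresses look more like external scanners hitting the endpoint than like cloud-init, which requests paths such as /openstack/... or /latest/meta-data/. A small sketch for separating the two; the regex is inferred from the quoted log lines, not from the agent's actual format string:

```python
import re

# Minimal parser for eventlet.wsgi lines like the ones quoted in this
# thread, to separate real metadata requests (paths under /openstack or
# /latest/meta-data) from stray "GET /" probes.
LINE = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+),\s*"(?P<verb>\S+)\s+(?P<path>\S+)[^"]*"\s*'
    r'status:\s*(?P<status>\d+)'
)

def classify(line):
    """Return (ip, path, status, kind) or None if the line doesn't match."""
    m = LINE.search(line)
    if not m:
        return None
    kind = "metadata" if m.group("path") != "/" else "probe"
    return m.group("ip"), m.group("path"), int(m.group("status")), kind

sample = ('2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] '
          '109.73.185.195, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0622332')
print(classify(sample))  # → ('109.73.185.195', '/', 404, 'probe')
```

If no requests with real metadata paths show up at all, the problem is more likely in the path from the instance to the agent (169.254.169.254, the namespace metadata proxy, nova-api) than in the agent itself.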
neutron-metadata-agent.log: 2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] 109.73.185.195, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0622332 2018-07-10 09:49:42.604 5518 INFO eventlet.wsgi.server [-] 197.149.85.150, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0645461 2018-07-10 10:52:50.845 5517 INFO eventlet.wsgi.server [-] 88.249.225.204, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0659041 2018-07-10 11:43:20.471 5518 INFO eventlet.wsgi.server [-] 143.208.186.168, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0618532 2018-07-10 11:53:15.574 5511 INFO eventlet.wsgi.server [-] 194.40.240.254, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0636070 2018-07-10 13:26:46.795 5518 INFO eventlet.wsgi.server [-] 109.73.177.149, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0611560 2018-07-10 13:27:38.795 5513 INFO eventlet.wsgi.server [-] 125.167.69.238, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0631371 2018-07-10 13:30:49.551 5514 INFO eventlet.wsgi.server [-] 155.93.152.111, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0609179 2018-07-10 14:12:42.008 5521 INFO eventlet.wsgi.server [-] 190.85.38.173, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0597739 No other log files show abnormal behavior. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/6/18 2:33 PM To: "lmihaiescu at gmail.com" Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old vxlan network. Am I having problems with VXLAN in particular? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I was able to confirm this by creating a new virtual machine directly on the provider network, I was able to ping to it and SSH into it right off of the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains: fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8 fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7 fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12 fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10 fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3 fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14 fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1 fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2 fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4 fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100 fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13 fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18 I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've done things, I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" , pgsousa at gmail.com Subject: Re: [Openstack] Recovering from full outage Can you manually assign an IP address to a VM and once inside, ping the address of the dhcp server? That would confirm if there is connectivity at least. Also, on the controller node where the dhcp server for that network is, check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any misconfiguration), then an agent is out-of-sync and a restart usually fixes things. On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. Controller: `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` `tcpdump -vnes0 -i any port 67` Compute: `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` For the first command on the controller, there are no packets captured at all. The second command on the controller captures packets, but they don't appear to be relevant to OpenStack. The dump from the compute node shows constant requests are getting sent by OpenStack instances. In summary: DHCP requests are being sent, but are never received. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext.
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh-key, etc. You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have your ssh-key and hostname... Did you check the things I told you? On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. I created a new instance from an Ubuntu 18.04 image to test with; the hostname was not set to the name of the instance and I could not log in as the users I had specified in the configuration. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic. On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext.
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances' floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agent logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in our data center and I'm having trouble recovering the OpenStack cluster. All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc.
Company 616.776.1066 ext. 2006 www.granddial.com _______________________________________________ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack at lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Tue Jul 10 23:16:18 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Tue, 10 Jul 2018 18:16:18 -0500 Subject: [Openstack-operators] [openstack-community] Openstack package repo In-Reply-To: References: Message-ID: May I suggest install python-pip and then pip install python-swiftclient (python-openstackclient, python-whateverclient, etc at that point) On Tue, Jul 10, 2018 at 10:32 AM, Amy Marrich wrote: > Alfredo, > > Forwarding this to the OPS list in the hopes of it reaching the > appropriate folks, but you might also want to checkout the RDO repos > > https://trunk.rdoproject.org/centos7/current/ > > > Thanks, > > > Amy (spotz) > > On Tue, Jul 10, 2018 at 10:07 AM, Alfredo De Luca < > alfredo.deluca at gmail.com> wrote: > >> Hi all. >> I have centos/7 on a VM Virtualbox... I want to install all the openstack >> python clients (nova, swift etc). >> I installed >> *yum install centos-release-openstack-queens * >> >> and all good but when I try to install one client I have the following >> error: >> >> yum install python-swiftclient >> >> ** >> Loaded plugins: fastestmirror >> Loading mirror speeds from cached hostfile >> * base: mirror.infonline.de >> * extras: mirror.infonline.de >> * updates: centos.mirrors.psw.services >> centos-ceph-luminous >> | 2.9 kB 00:00:00 >> centos-openstack-queens >> | 2.9 kB 00:00:00 >> *http://mirror.centos.org/altarch/7/virt/x86_64/kvm-common/repodata/repomd.xml >> : >> [Errno 14] HTTP Error 404 - Not Found* >> Trying other mirror. 
>> To address this issue please refer to the below wiki article >> >> https://wiki.centos.org/yum-errors >> ** >> >> Now the only way to install the package (or any other) is to disable that >> repo >> *yum-config-manager --disable centos-qemu-ev* >> >> then I can install the client... >> >> Any idea? >> It looks like *http://mirror.centos.org/altarch/7/virt/x86_64 >> doesn't exist.....* >> >> >> >> >> >> >> -- >> *Alfredo* >> >> >> _______________________________________________ >> Community mailing list >> Community at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/community >> >> > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From namnh at vn.fujitsu.com Wed Jul 11 16:44:39 2018 From: namnh at vn.fujitsu.com (Nguyen Hoai, Nam) Date: Wed, 11 Jul 2018 16:44:39 +0000 Subject: [Openstack-operators] Vietnam OpenInfra Days - Call for presentations Message-ID: <1531413407355.46226@vn.fujitsu.com> Hello everyone, We, VietOpenStack from Vietnam, would like to announce that OpenInfra Days will be held for a full day on Aug-25-2018. We are writing this email to invite everyone to join us as speakers or attendees. In this event, we will focus on topics like OpenStack, SDS (Ceph), SDN/NFV, Containers (K8S/Docker..), CI/CD (Jenkins/Gitlab/Zuul), Automation (Ansible..) and case studies in Cloud Native. We would be very glad to welcome you as a speaker on any of the above topics. Some important information is as follows: 1.
Common information • Powered by: OpenStack Foundation • Website: https://2018.vietopenstack.org • Time: 8:00 to 17:00, Sunday, Aug-25-2018 • Location: Hanoi, Vietnam • Email contact: contact at vietopenstack.org 2. How to become a speaker Registration link: https://2018.vietopenstack.org/2018/06/28/call-for-presentations Schedule for the CFP: • Deadline for CFP: Aug-01-2018 • Deadline for results: Aug-03-2018 • Deadline to send slides: Aug-10-2018 • Deadline for reviewing: Aug-20-2018 3. Additional information • Each presentation may have a maximum of 2 speakers • Speakers will be provided free ticket access; traveling and living costs are not included After your registration is approved, we will do our best to help you plan your trip to Vietnam, a beautiful and peaceful country. Thanks and best regards, Nam Nguyen Hoai -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Jul 11 18:49:23 2018 From: melwittt at gmail.com (melanie witt) Date: Wed, 11 Jul 2018 11:49:23 -0700 Subject: [Openstack-operators] [nova] Denver Stein ptg planning Message-ID: <60144508-c601-95f8-1b39-3b5287b2ff76@gmail.com> Hello Devs and Ops, I've created an etherpad where we can start collecting ideas for topics to cover at the Stein PTG. Please feel free to add your comments and topics with your IRC nick next to it to make it easier to discuss with you. https://etherpad.openstack.org/p/nova-ptg-stein Cheers, -melanie From ashlee at openstack.org Wed Jul 11 19:34:33 2018 From: ashlee at openstack.org (Ashlee Ferguson) Date: Wed, 11 Jul 2018 14:34:33 -0500 Subject: [Openstack-operators] OpenStack Summit Berlin CFP Closes July 17 Message-ID: Hi everyone, The CFP deadline for the OpenStack Summit Berlin is less than one week away, so make sure to submit your talks before July 18 at 6:59am UTC (July 17 at 11:59pm PST).
Tracks: • CI/CD • Container Infrastructure • Edge Computing • Hands on Workshops • HPC / GPU / AI • Open Source Community • Private & Hybrid Cloud • Public Cloud • Telecom & NFV SUBMIT HERE Community voting, the first step in building the Summit schedule, will open in mid July. Once community voting concludes, a Programming Committee for each Track will build the schedule. Programming Committees are made up of individuals from many different open source communities working in open infrastructure, in addition to people who have participated in the past. Read the full selection process here . Register for the Summit - Early Bird pricing ends August 21 Become a Sponsor Cheers, Ashlee Ashlee Ferguson OpenStack Foundation ashlee at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Wed Jul 11 21:23:30 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Wed, 11 Jul 2018 21:23:30 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp netstat -lnp` on the controller, should I see anything listening on the metadata port (8775)? When I run these commands I don't see that listening, but I have no example of a working system to check against. Can anybody verify this? Thanks, Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/10/18 2:58 PM To: Cc: , Subject: Re: [Openstack] Recovering from full outage DHCP is working again so instances are getting their addresses. For some reason cloud-init isn't working correctly. Hostnames aren't getting set, and SSH key pair isn't getting set. The neutron-metadata service is in control of this? 
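Regarding the netstat question at the top of this message: to my knowledge, 8775 is the port nova-api's metadata service listens on in the controller's root namespace, not inside the qrouter/qdhcp namespaces; inside the qrouter namespace the neutron metadata proxy usually listens on 9697 and receives instance traffic via an iptables REDIRECT of 169.254.169.254:80. So not seeing 8775 inside the namespace is expected. A small helper for filtering netstat -lnp output for a given port (the sample output is illustrative, not from a real system):

```python
# Grep `ip netns exec qrouter-<id> netstat -lnp`-style output for TCP
# listeners on a given port.  Port numbers in the comments above
# (9697 inside qrouter, 8775 for nova-api metadata outside) reflect the
# usual layout; verify against your own deployment.

def listeners_on(netstat_output, port):
    """Return netstat lines that are TCP LISTEN sockets on `port`."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # tcp lines: proto recv-q send-q local foreign state pid/program
        if len(fields) >= 6 and fields[5] == "LISTEN":
            local = fields[3]                      # e.g. 0.0.0.0:9697
            if local.rsplit(":", 1)[-1] == str(port):
                hits.append(line)
    return hits

sample = (
    "tcp  0  0 0.0.0.0:9697  0.0.0.0:*  LISTEN  2301/haproxy\n"
    "tcp  0  0 127.0.0.1:25  0.0.0.0:*  LISTEN  1024/master"
)
print(listeners_on(sample, 9697))
```

If nothing listens on 9697 (or whatever your release uses) inside the qrouter namespace, that would point at the neutron-metadata-agent/proxy side rather than at nova.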
neutron-metadata-agent.log: 2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] 109.73.185.195, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0622332 2018-07-10 09:49:42.604 5518 INFO eventlet.wsgi.server [-] 197.149.85.150, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0645461 2018-07-10 10:52:50.845 5517 INFO eventlet.wsgi.server [-] 88.249.225.204, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0659041 2018-07-10 11:43:20.471 5518 INFO eventlet.wsgi.server [-] 143.208.186.168, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0618532 2018-07-10 11:53:15.574 5511 INFO eventlet.wsgi.server [-] 194.40.240.254, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0636070 2018-07-10 13:26:46.795 5518 INFO eventlet.wsgi.server [-] 109.73.177.149, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0611560 2018-07-10 13:27:38.795 5513 INFO eventlet.wsgi.server [-] 125.167.69.238, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0631371 2018-07-10 13:30:49.551 5514 INFO eventlet.wsgi.server [-] 155.93.152.111, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0609179 2018-07-10 14:12:42.008 5521 INFO eventlet.wsgi.server [-] 190.85.38.173, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0597739 No other log files show abnormal behavior. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/6/18 2:33 PM To: "lmihaiescu at gmail.com" Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old vxlan network. Am I having problems with VXLAN in particular? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I was able to confirm this by creating a new virtual machine directly on the provider network, I was able to ping to it and SSH into it right off of the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains: fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8 fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7 fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12 fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10 fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3 fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14 fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1 fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2 fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4 fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100 fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13 fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18 I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've done things, I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" , pgsousa at gmail.com Subject: Re: [Openstack] Recovering from full outage Can you manually assign an IP address to a VM and once inside, ping the address of the dhcp server? That would confirm if there is connectivity at least. Also, on the controller node where the dhcp server for that network is, check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any misconfiguration), then an agent is out-of-sync and a restart usually fixes things. On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. Controller: `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` `tcpdump -vnes0 -i any port 67` Compute: `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` For the first command on the controller, there are no packets captured at all. The second command on the controller captures packets, but they don't appear to be relevant to OpenStack. The dump from the compute node shows constant requests are getting sent by OpenStack instances. In summary: DHCP requests are being sent, but are never received. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext.
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh-key, etc. You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have your ssh-key and hostname... Did you check the things I told you? On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. I created a new instance from an Ubuntu 18.04 image to test with; the hostname was not set to the name of the instance and I could not log in as the users I had specified in the configuration. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.woltjer at granddial.com Cc: "openstack at lists.openstack.org" , "openstack-operators at lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic. On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext.
2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.woltjer at granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.woltjer at granddial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com

----------------------------------------
From: "Torin Woltjer"
Sent: 7/3/18 5:14 PM
To:
Cc: "openstack-operators at lists.openstack.org" , "openstack at lists.openstack.org"
Subject: Re: [Openstack] Recovering from full outage

Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted the controllers, and all of the agents show online: http://paste.openstack.org/show/724921/

All of the instances can now be properly started; however, I cannot ping any of the instances' floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

----------------------------------------
From: George Mihaiescu
Sent: 7/3/18 11:50 AM
To: torin.woltjer at granddial.com
Subject: Re: [Openstack] Recovering from full outage

Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agent logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote:

We just suffered a power outage in our data center and I'm having trouble recovering the OpenStack cluster. All of the nodes are back online, and every instance shows active, but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present, and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks?

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc.
Company 616.776.1066 ext. 2006
www.granddial.com

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack at lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From thangam.arunx at gmail.com Thu Jul 12 03:59:56 2018
From: thangam.arunx at gmail.com (அருண் குமார் (Arun Kumar))
Date: Thu, 12 Jul 2018 11:59:56 +0800
Subject: [Openstack-operators] [Openstack] Recovering from full outage
In-Reply-To:
References:
Message-ID:

Hi Torin,

> If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp
> netstat -lnp` on the controller, should I see anything listening on the
> metadata port (8775)? When I run these commands I don't see that listening,
> but I have no example of a working system to check against. Can anybody
> verify this?

In either the qrouter or qdhcp namespaces, you won't see port 8775; instead, check whether the metadata service is running on the neutron controller node(s) and listening on port 8775.

Also, you can verify the metadata and neutron services using the following commands:

service neutron-metadata-agent status
neutron agent-list
netstat -ntplua | grep :8775

Thanks & Regards
Arun

ஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃ
அன்புடன்
அருண்
நுட்பம் நம்மொழியில் தழைக்கச் செய்வோம்
http://thangamaniarun.wordpress.com
ஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃ

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From adriant at catalyst.net.nz Thu Jul 12 04:01:21 2018
From: adriant at catalyst.net.nz (Adrian Turjak)
Date: Thu, 12 Jul 2018 16:01:21 +1200
Subject: [Openstack-operators] [publiccloud-wg] [adjutant] Input on Adjutant's official project status
Message-ID:

Hello fellow public cloud providers (and others)!
Adjutant is in the process of being voted in (or not) as an official project as part of OpenStack, but to help over the last few hurdles, some input from the people who would likely benefit the most directly from such a service existing would really be useful.

In the past you've probably talked to me about the need for some form of business-logic-related APIs and services in OpenStack (signup, account termination, project/user management, billing details management, etc.). In that space I've been trying to push Adjutant as a solution, not because it's the perfect solution, but because we are trying to keep the service a cloud-agnostic solution that can be tweaked for the unique requirements of various clouds. It's also a place where we can collaborate on these often rather miscellaneous business logic requirements, rather than each of us writing our own entirely distinct thing and wasting time and effort reinventing the wheel again and again.

The review in question, where this discussion has been happening for a while: https://review.openstack.org/#/c/553643/

And if you don't know much about Adjutant, here is a little background. The current mission statement is:

"To provide an extensible API framework for exposing to users an organization's automated business processes relating to account management across OpenStack and external systems, that can be adapted to the unique requirements of an organization's processes."

The docs: https://adjutant.readthedocs.io/en/latest/
The code: https://github.com/openstack/adjutant

And here is a rough feature list that was put together as part of the review process for official project status: https://etherpad.openstack.org/p/Adjutant_Features

If you have any questions about the service, don't hesitate to get in touch, but some input on the current discussion would be very welcome!

Cheers,
Adrian Turjak

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pEpkey.asc
Type: application/pgp-keys
Size: 1769 bytes
Desc: not available
URL:

From torin.woltjer at granddial.com Thu Jul 12 12:20:32 2018
From: torin.woltjer at granddial.com (Torin Woltjer)
Date: Thu, 12 Jul 2018 12:20:32 GMT
Subject: [Openstack-operators] [Openstack] Recovering from full outage
Message-ID: <0742b8e467364769a2c2cdac10067e2f@granddial.com>

The neutron-metadata-agent service is running, the agent is alive, and it is listening on port 8775. However, new instances still do not get any information like hostname or keypair. If I run `curl 192.168.116.22:8775` from the compute nodes, I do get a response. The metadata agent is running, listening, and accessible from the compute nodes; and it worked previously. I'm stumped.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

----------------------------------------
From: அருண் குமார் (Arun Kumar)
Sent: 7/12/18 12:01 AM
To: torin.woltjer at granddial.com
Cc: "openstack at lists.openstack.org" , openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage

Hi Torin,

If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp netstat -lnp` on the controller, should I see anything listening on the metadata port (8775)? When I run these commands I don't see that listening, but I have no example of a working system to check against. Can anybody verify this?

In either the qrouter or qdhcp namespaces, you won't see port 8775; instead, check whether the metadata service is running on the neutron controller node(s) and listening on port 8775.
Also, you can verify the metadata and neutron services using the following commands:

service neutron-metadata-agent status
neutron agent-list
netstat -ntplua | grep :8775

Thanks & Regards
Arun

ஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃ
அன்புடன்
அருண்
நுட்பம் நம்மொழியில் தழைக்கச் செய்வோம்
http://thangamaniarun.wordpress.com
ஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃஃ

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jpetrini at coredial.com Thu Jul 12 13:16:15 2018
From: jpetrini at coredial.com (John Petrini)
Date: Thu, 12 Jul 2018 09:16:15 -0400
Subject: [Openstack-operators] [Openstack] Recovering from full outage
In-Reply-To: <0742b8e467364769a2c2cdac10067e2f@granddial.com>
References: <0742b8e467364769a2c2cdac10067e2f@granddial.com>
Message-ID:

Are your instances receiving a route to the metadata service (169.254.169.254) from DHCP? Can you curl the endpoint?

curl http://169.254.169.254/latest/meta-data

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From torin.woltjer at granddial.com Thu Jul 12 14:21:17 2018
From: torin.woltjer at granddial.com (Torin Woltjer)
Date: Thu, 12 Jul 2018 14:21:17 GMT
Subject: [Openstack-operators] [Openstack] Recovering from full outage
Message-ID: <863528991ac14ddf87d2449c763071e1@granddial.com>

I tested this on two instances. The first instance has existed since before I began having this issue. The second was created from a CirrOS test image.

On the first instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev ens3 proto dhcp metric 100.
curl returns information, for example: `curl http://169.254.169.254/latest/meta-data/public-keys` returns 0=nextcloud

On the second instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev eth0
curl fails: `curl http://169.254.169.254/latest/meta-data` returns curl: (7) Failed to connect to 169.254.169.254 port 80: Connection timed out

I am curious why it is the case that one is able to connect but not the other.
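[Editor's aside, not part of the thread: the contrast above separates two distinct failure modes — an HTTP reply (even a 404) means some server is answering on the metadata address, while a connect timeout or refusal means nothing is listening or packets are being dropped on the way. A minimal sketch of that probe in plain Python; the default host/port mirror the thread, everything else is illustrative:]

```python
# Sketch: classify a metadata-endpoint probe as "answered" vs "unreachable".
# An HTTP status (even 404) proves a listener exists; a timeout/refusal
# points at missing iptables REDIRECT rules, a dead proxy, or dropped packets.
import http.client
import socket


def probe_metadata(host="169.254.169.254", port=80, timeout=3):
    """Return 'http <status>' if any server answers, 'unreachable' otherwise."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", "/latest/meta-data")
        status = conn.getresponse().status
        conn.close()
        return "http %d" % status
    except (socket.timeout, OSError):
        # covers connection refused, timeouts, and unreachable networks
        return "unreachable"
```

Run inside the relevant namespace (e.g. via `ip netns exec ... python3 ...`), "http 404" and "unreachable" would correspond to the two curl results quoted in this thread.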
Both the first and second instances were running on the same compute node.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

----------------------------------------
From: John Petrini
Sent: 7/12/18 9:16 AM
To: torin.woltjer at granddial.com
Cc: thangam.arunx at gmail.com, OpenStack Operators , OpenStack Mailing List
Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage

Are your instances receiving a route to the metadata service (169.254.169.254) from DHCP? Can you curl the endpoint?

curl http://169.254.169.254/latest/meta-data

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jpetrini at coredial.com Thu Jul 12 14:33:10 2018
From: jpetrini at coredial.com (John Petrini)
Date: Thu, 12 Jul 2018 10:33:10 -0400
Subject: [Openstack-operators] [Openstack] Recovering from full outage
In-Reply-To: <863528991ac14ddf87d2449c763071e1@granddial.com>
References: <863528991ac14ddf87d2449c763071e1@granddial.com>
Message-ID:

You might want to try giving the neutron-dhcp and metadata agents a restart.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amy at demarco.com Thu Jul 12 14:42:17 2018
From: amy at demarco.com (Amy Marrich)
Date: Thu, 12 Jul 2018 09:42:17 -0500
Subject: [Openstack-operators] [openstack-community] Running instance snapshot
In-Reply-To:
References:
Message-ID:

Alfredo,

I've added the operators list, but in Newton you should be able to use the OpenStack CLI for this as well. Check out 'openstack server backup create --help'; if you get an error again, add --debug to the end to get some more information to troubleshoot.

Thanks,

Amy (spotz)

On Thu, Jul 12, 2018 at 8:25 AM, Alfredo De Luca wrote:

> Hi all.
> We have OS Newton and I wonder if it's possible to perform instance
> snapshot either on WUI or CLI.
>
> ​I tried with glance image-create or nova backup....
but I got the > following > > ERROR (BadRequest): The request is invalid. (HTTP 400) (Request-ID: > req-89154d7e-f0c5-4a2a-9bc9-b98c0c5e3182)​ > > > ​Any clue/info? > cheers > > > -- > *Alfredo* > > > _______________________________________________ > Community mailing list > Community at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/community > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From torin.woltjer at granddial.com Thu Jul 12 15:03:26 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Thu, 12 Jul 2018 15:03:26 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: <373f719b15654b4a8ae5832d8e12229f@granddial.com> Checking iptables for the metadata-proxy inside of qrouter provides the following: $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169 [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697 [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff Packets:Bytes are both 0, so no traffic is touching this rule? Interestingly the command: $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697 returns nothing, so there isn't actually anything running on 9697 in the network namespace... 
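[Editor's aside, not part of the thread: with `iptables-save -c`, every rule is prefixed by its `[packets:bytes]` counters, so "is this rule ever hit?" can be checked mechanically across many rules. A rough sketch; the first sample rule is copied from the output above, the second is invented for contrast:]

```python
# Sketch: extract [packets:bytes] counters from `iptables-save -c` output
# for rules mentioning a given string, to spot rules that never match.
import re

# `iptables-save -c` prints e.g. "[0:0] -A CHAIN ..." for each rule.
RULE = re.compile(r"^\[(\d+):(\d+)\]\s+(-A\s.*)$")


def counters(save_output, needle="169.254.169.254"):
    """Return (packets, bytes, rule) tuples for rules containing `needle`."""
    hits = []
    for line in save_output.splitlines():
        m = RULE.match(line.strip())
        if m and needle in m.group(3):
            hits.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return hits


sample = (
    "[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ "
    "-p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697\n"
    # invented rule, only to show a non-zero counter:
    "[12:720] -A neutron-l3-agent-PREROUTING -d 10.0.0.0/8 -j ACCEPT\n"
)

for pkts, nbytes, rule in counters(sample):
    print("hit" if pkts else "never matched", "->", rule)
```

Feeding it the real `ip netns exec qrouter-... iptables-save -c` output would confirm whether the metadata REDIRECT rule is ever being matched.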
This is the output without grep: Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name raw 0 0 0.0.0.0:112 0.0.0.0:* 7 0 76154 8404/keepalived raw 0 0 0.0.0.0:112 0.0.0.0:* 7 0 76153 8404/keepalived Active UNIX domain sockets (servers and established) Proto RefCnt Flags Type State I-Node PID/Program name Path unix 2 [ ] DGRAM 64501 7567/python2 unix 2 [ ] DGRAM 79953 8403/keepalived Could the reason no traffic touching the rule be that nothing is listening on that port, or is there a second issue down the chain? Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata agent. Thank you for this, and any future help. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alfredo.deluca at gmail.com Thu Jul 12 15:07:41 2018 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Thu, 12 Jul 2018 17:07:41 +0200 Subject: [Openstack-operators] [openstack-community] Openstack package repo In-Reply-To: References: Message-ID: Hi Melvin. Thanks for that. Anyway I was able to install the packages with the repo. Cheers On Wed, Jul 11, 2018 at 1:16 AM Melvin Hillsman wrote: > May I suggest install python-pip and then pip install python-swiftclient > (python-openstackclient, python-whateverclient, etc at that point) > > On Tue, Jul 10, 2018 at 10:32 AM, Amy Marrich wrote: > >> Alfredo, >> >> Forwarding this to the OPS list in the hopes of it reaching the >> appropriate folks, but you might also want to checkout the RDO repos >> >> https://trunk.rdoproject.org/centos7/current/ >> >> >> Thanks, >> >> >> Amy (spotz) >> >> On Tue, Jul 10, 2018 at 10:07 AM, Alfredo De Luca < >> alfredo.deluca at gmail.com> wrote: >> >>> Hi all. >>> I have centos/7 on a VM Virtualbox... I want to install all the >>> openstack python clients (nova, swift etc). 
>>> I installed >>> *yum install centos-release-openstack-queens * >>> >>> and all good but when I try to install one client I have the following >>> error: >>> >>> yum install python-swiftclient >>> >>> ** >>> Loaded plugins: fastestmirror >>> Loading mirror speeds from cached hostfile >>> * base: mirror.infonline.de >>> * extras: mirror.infonline.de >>> * updates: centos.mirrors.psw.services >>> centos-ceph-luminous >>> | 2.9 kB 00:00:00 >>> centos-openstack-queens >>> | 2.9 kB 00:00:00 >>> *http://mirror.centos.org/altarch/7/virt/x86_64/kvm-common/repodata/repomd.xml >>> : >>> [Errno 14] HTTP Error 404 - Not Found* >>> Trying other mirror. >>> To address this issue please refer to the below wiki article >>> >>> https://wiki.centos.org/yum-errors >>> ** >>> >>> Now the only way to install the package (or any other) is to disable >>> that repo >>> *yum-config-manager --disable centos-qemu-ev* >>> >>> then I can install the client... >>> >>> Any idea? >>> It looks like *http://mirror.centos.org/altarch/7/virt/x86_64 >>> doesn't exist.....* >>> >>> >>> >>> >>> >>> >>> -- >>> *Alfredo* >>> >>> >>> _______________________________________________ >>> Community mailing list >>> Community at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/community >>> >>> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> >> > > > -- > Kind regards, > > Melvin Hillsman > mrhillsman at gmail.com > mobile: (832) 264-2646 > -- *Alfredo* -------------- next part -------------- An HTML attachment was scrubbed... URL: From alfredo.deluca at gmail.com Thu Jul 12 15:09:16 2018 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Thu, 12 Jul 2018 17:09:16 +0200 Subject: [Openstack-operators] [openstack-community] Running instance snapshot In-Reply-To: References: Message-ID: Thanks Amy... 
I'll do that and post the result.

Cheers

On Thu, Jul 12, 2018 at 4:42 PM Amy Marrich wrote:

> Alfredo,
>
> I've added the operators list but in Newton you should be able to use the
> OpenStack CLI for this as well. Checkout 'openstack server backup create
> --help' if you get an error again add --debug to the end to get some more
> information to troubleshoot.
>
> Thanks,
>
> Amy (spotz)
>
> On Thu, Jul 12, 2018 at 8:25 AM, Alfredo De Luca
> wrote:
>
>> Hi all.
>> We have OS Newton and I wonder if it's possible to perform instance
>> snapshot either on WUI or CLI.
>>
>> ​I tried with glance image-create or nova backup.... but I got the
>> following
>>
>> ERROR (BadRequest): The request is invalid. (HTTP 400) (Request-ID:
>> req-89154d7e-f0c5-4a2a-9bc9-b98c0c5e3182)​
>>
>> ​Any clue/info?
>> cheers
>>
>> --
>> *Alfredo*
>>
>> _______________________________________________
>> Community mailing list
>> Community at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/community
>>
>

--
*Alfredo*

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jp.methot at planethoster.info Thu Jul 12 19:08:54 2018
From: jp.methot at planethoster.info (Jean-Philippe Méthot)
Date: Thu, 12 Jul 2018 15:08:54 -0400
Subject: [Openstack-operators] Cinder-volume and high availability
Message-ID:

Hi,

I've noticed that in the high-availability guide, it is not recommended to run cinder-volume in an active-active configuration. However, I have built an active-passive setup that uses keepalived and a virtual IP to redirect API traffic to only one controller at a time. In such a configuration, would I still need to have only one cinder-volume service running at a time? Also, the backend is a Dell Compellent SAN, if that makes a difference.

Jean-Philippe Méthot
Openstack system administrator
Administrateur système Openstack
PlanetHoster inc.
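[Editor's aside, not part of the thread: for readers unfamiliar with the active/passive pattern Jean-Philippe describes, the keepalived side is typically a vrrp_instance stanza along these lines. Every name, interface, and address below is illustrative, not taken from his deployment:]

```conf
vrrp_instance openstack_api {
    state BACKUP             # BACKUP on both nodes; priority elects the master
    interface eth0           # illustrative interface name
    virtual_router_id 51
    priority 100             # e.g. 101 on the preferred node
    nopreempt                # a recovered node does not steal the VIP back
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme   # placeholder
    }
    virtual_ipaddress {
        203.0.113.10/24      # example API VIP (TEST-NET-3 documentation range)
    }
}
```

With state BACKUP plus nopreempt on both nodes, the VIP stays where it is when a failed node returns, avoiding needless flapping of API traffic; this only moves the API endpoint, though, and says nothing about how many cinder-volume services may safely run.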
-------------- next part -------------- An HTML attachment was scrubbed... URL: From lars at redhat.com Sun Jul 15 03:07:10 2018 From: lars at redhat.com (Lars Kellogg-Stedman) Date: Sat, 14 Jul 2018 23:07:10 -0400 Subject: [Openstack-operators] How are you handling billing/chargeback? In-Reply-To: References: <20180312192113.znz4eavfze5zg7yn@redhat.com> Message-ID: <20180715030710.uaqzkzhblnvjcvos@redhat.com> On Tue, Jul 10, 2018 at 06:54:55AM +0200, Christian Zunker wrote: > just a short feedback to my previous post. > [...] > cloudkitty replaced our self-written script completely. We needed some time > to get used to it, but it reduced our maintenance... Hi Christian, Thanks for following up! That all seems like a pretty strong recommendation. I haven't had the chance to look at cloudkitty myself, but once I get out from under my pile of tripleo patches I will try to take a look. -- Lars Kellogg-Stedman | larsks @ {irc,twitter,github} http://blog.oddbit.com/ | From mriedemos at gmail.com Sun Jul 15 14:18:51 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 15 Jul 2018 09:18:51 -0500 Subject: [Openstack-operators] [nova] Cinder cross_az_attach=False changes/fixes In-Reply-To: <9d5a9be4-7869-d67d-7931-d65ea3b8d10d@gmail.com> References: <9d5a9be4-7869-d67d-7931-d65ea3b8d10d@gmail.com> Message-ID: <23a931e6-9d30-344a-f938-601aa08c6a5d@gmail.com> Just an update on an old thread, but I've been working on the cross_az_attach=False issues again this past week and I think I have a couple of decent fixes. On 5/31/2017 6:08 PM, Matt Riedemann wrote: > This is a request for any operators out there that configure nova to set: > > [cinder] > cross_az_attach=False > > To check out these two bug fixes: > > 1. https://review.openstack.org/#/c/366724/ > > This is a case where nova is creating the volume during boot from volume > and providing an AZ to cinder during the volume create request. 
Today we > just pass the instance.availability_zone which is None if the instance > was created without an AZ set. It's unclear to me if that causes the > volume creation to fail (someone in IRC was showing the volume going > into ERROR state while Nova was waiting for it to be available), but I > think it will cause the later attach to fail here [1] because the > instance AZ (defaults to None) and volume AZ (defaults to nova) may not > match. I'm still looking for more details on the actual failure in that > one though. > > The proposed fix in this case is pass the AZ associated with any host > aggregate that the instance is in. This was indirectly fixed by change https://review.openstack.org/#/c/446053/ in Pike where we now set the instance.availability_zone in conductor after we get a selected host from the scheduler (we get the AZ for the host and set that on the instance before sending the instance to compute to build it). While investigating this on master, I found a new bug where we do an up-call to the API DB which fails in a split MQ setup, and I have a fix here: https://review.openstack.org/#/c/582342/ > > 2. https://review.openstack.org/#/c/469675/ > > This is similar, but rather than checking the AZ when we're on the > compute and the instance has a host, we're in the API and doing a boot > from volume where an existing volume is provided during server create. > By default, the volume's AZ is going to be 'nova'. The code doing the > check here is getting the AZ for the instance, and since the instance > isn't on a host yet, it's not in any aggregate, so the only AZ we can > get is from the server create request itself. If an AZ isn't provided > during the server create request, then we're comparing > instance.availability_zone (None) to volume['availability_zone'] > ("nova") and that results in a 400. > > My proposed fix is in the case of BFV checks from the API, we default > the AZ if one wasn't requested when comparing against the volume. 
By > default this is going to compare "nova" for nova and "nova" for cinder, > since CONF.default_availability_zone is "nova" by default in both projects. I've refined this fix a bit to be more flexible: https://review.openstack.org/#/c/469675/ So now if doing boot from volume and we're checking cross_az_attach=False in the API and the user didn't explicitly request an AZ for the instance, we do a few checks: 1. If [DEFAULT]/default_schedule_zone is not None (the default), we use that to compare against the volume AZ. 2. If the volume AZ is equal to the [DEFAULT]/default_availability_zone (nova by default in both nova and cinder), we're OK - no issues. 3. If the volume AZ is not equal to [DEFAULT]/default_availability_zone, it means either the volume was created with a specific AZ or cinder's default AZ is configured differently from nova's. In that case, I take the volume AZ and put it into the instance RequestSpec so that during scheduling, the nova scheduler picks a host in the same AZ as the volume - if that AZ isn't in nova, we fail to schedule (NoValidHost) (but that shouldn't really happen, why would one have cross_az_attach=False w/o mirrored AZ in both cinder and nova?). > > -- > > I'm requesting help from any operators that are setting > cross_az_attach=False because I have to imagine your users have run into > this and you're patching around it somehow, so I'd like input on how you > or your users are dealing with this. > > I'm also trying to recreate these in upstream CI [2] which I was already > able to do with the 2nd bug. This devstack patch has recreated both issues above and I'm adding the fixes to it as dependencies to show the problems are resolved. > > Having said all of this, I really hate cross_az_attach as it's > config-driven API behavior which is not interoperable across clouds. 
> Long-term I'd really love to deprecate this option but we need a
> replacement first, and I'm hoping placement with compute/volume resource
> providers in a shared aggregate can maybe make that happen.
>
> [1]
> https://github.com/openstack/nova/blob/f278784ccb06e16ee12a42a585c5615abe65edfe/nova/virt/block_device.py#L368
>
> [2] https://review.openstack.org/#/c/467674/

--

Thanks,

Matt

From lijie at unitedstack.com Mon Jul 16 08:55:46 2018
From: lijie at unitedstack.com (Rambo)
Date: Mon, 16 Jul 2018 16:55:46 +0800
Subject: [Openstack-operators] [cinder] about BlockDeviceDriver
Message-ID:

Hi all,

In the Cinder repository, I noticed that the BlockDeviceDriver is deprecated and will eventually be removed with the Queens release.
https://github.com/openstack/cinder/blob/stable/ocata/cinder/volume/drivers/block_device.py

In my use case, the instances using Cinder perform intense I/O, so iSCSI or LVM is not a viable option - I have benchmarked them several times since Juno, with unsatisfactory results. For data processing scenarios it is always better to use local storage than any SAN/NAS solution.

So I felt a great need to know why it was deprecated. Is there any better option to replace it? What do you suggest using once BlockDeviceDriver is removed? Can you tell me about this? Thank you very much!

Best Regards
Rambo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From torin.woltjer at granddial.com Mon Jul 16 12:41:16 2018
From: torin.woltjer at granddial.com (Torin Woltjer)
Date: Mon, 16 Jul 2018 12:41:16 GMT
Subject: [Openstack-operators] [Openstack] Recovering from full outage
Message-ID: <1361da1cb6954d29955d92d0b0f3ddae@granddial.com>

$ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl http://169.254.169.254
404 Not Found

404 Not Found
The resource could not be found.
$ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl http://169.254.169.254 curl: (7) Couldn't connect to server Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: "Torin Woltjer" Sent: 7/12/18 11:16 AM To: , , "jpetrini at coredial.com" Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage Checking iptables for the metadata-proxy inside of qrouter provides the following: $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169 [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697 [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff Packets:Bytes are both 0, so no traffic is touching this rule? Interestingly the command: $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697 returns nothing, so there isn't actually anything running on 9697 in the network namespace... This is the output without grep: Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name raw 0 0 0.0.0.0:112 0.0.0.0:* 7 0 76154 8404/keepalived raw 0 0 0.0.0.0:112 0.0.0.0:* 7 0 76153 8404/keepalived Active UNIX domain sockets (servers and established) Proto RefCnt Flags Type State I-Node PID/Program name Path unix 2 [ ] DGRAM 64501 7567/python2 unix 2 [ ] DGRAM 79953 8403/keepalived Could the reason no traffic touching the rule be that nothing is listening on that port, or is there a second issue down the chain? Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata agent. Thank you for this, and any future help. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From torin.woltjer at granddial.com Mon Jul 16 20:54:28 2018 From: torin.woltjer at granddial.com (Torin Woltjer) Date: Mon, 16 Jul 2018 20:54:28 GMT Subject: [Openstack-operators] [Openstack] Recovering from full outage Message-ID: I feel pretty dumb about this, but it was fixed by adding a rule to my security groups. I'm still very confused about some of the other behavior that I saw, but at least the problem is fixed now. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---------------------------------------- From: Brian Haley Sent: 7/16/18 4:39 PM To: torin.woltjer at granddial.com, thangam.arunx at gmail.com, jpetrini at coredial.com Cc: openstack-operators at lists.openstack.org, openstack at lists.openstack.org Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage On 07/16/2018 08:41 AM, Torin Woltjer wrote: > $ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl > http://169.254.169.254 > > > 404 Not Found > > > 404 Not Found > The resource could not be found. > > Strange, don't know where the reply came from for that. > $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl > http://169.254.169.254 > curl: (7) Couldn't connect to server Based on your iptables output below, I would think the metadata proxy is running in the qrouter namespace. However, a curl from there will not work since it is restricted to only work for incoming packets from the qr- device(s). You would have to try curl from a running instance. Is there an haproxy process running? And is it listening on port 9697 in the qrouter namespace? 
-Brian > ------------------------------------------------------------------------ > *From*: "Torin Woltjer" > *Sent*: 7/12/18 11:16 AM > *To*: , , > "jpetrini at coredial.com" > *Cc*: openstack-operators at lists.openstack.org, openstack at lists.openstack.org > *Subject*: Re: [Openstack] [Openstack-operators] Recovering from full outage > Checking iptables for the metadata-proxy inside of qrouter provides the > following: > $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e > iptables-save -c | grep 169 > [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p > tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697 > [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p > tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff > Packets:Bytes are both 0, so no traffic is touching this rule? > > Interestingly the command: > $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat > -anep | grep 9697 > returns nothing, so there isn't actually anything running on 9697 in the > network namespace... > > This is the output without grep: > Active Internet connections (servers and established) > Proto Recv-Q Send-Q Local Address Foreign Address > State User Inode PID/Program name > raw 0 0 0.0.0.0:112 0.0.0.0:* 7 > 0 76154 8404/keepalived > raw 0 0 0.0.0.0:112 0.0.0.0:* 7 > 0 76153 8404/keepalived > Active UNIX domain sockets (servers and established) > Proto RefCnt Flags Type State I-Node PID/Program > name Path > unix 2 [ ] DGRAM 64501 7567/python2 > unix 2 [ ] DGRAM 79953 8403/keepalived > > Could the reason no traffic touching the rule be that nothing is > listening on that port, or is there a second issue down the chain? > > Curl fails even after restarting the neutron-dhcp-agent & > neutron-metadata agent. > > Thank you for this, and any future help. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mriedemos at gmail.com Mon Jul 16 21:40:11 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 16 Jul 2018 16:40:11 -0500 Subject: [Openstack-operators] [openstack-community] Running instance snapshot In-Reply-To: References: Message-ID: <9a9a1238-a152-058c-c25e-0fb4b8b0646d@gmail.com> On 7/12/2018 10:09 AM, Alfredo De Luca wrote: > ​I tried with glance image-create or nova backup.... but I got the > following Neither of those are server snapshot operations (well backup is, but it's probably not what you're looking for). glance image-create is creating an image in glance, not creating a snapshot from a server. That would be 'nova image-create': https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-image-create What is the error message in the 400 response? It should be in the CLI output but if not, what's in the nova-api logs? -- Thanks, Matt From ashlee at openstack.org Tue Jul 17 13:34:03 2018 From: ashlee at openstack.org (Ashlee Ferguson) Date: Tue, 17 Jul 2018 08:34:03 -0500 Subject: [Openstack-operators] OpenStack Summit Berlin CFP Deadline Today Message-ID: Hi everyone, The CFP for the OpenStack Summit Berlin closes July 17 at 11:59pm PST (July 18 at 6:59am UTC), so make sure to press submit on your talks for: • CI/CD • Container Infrastructure • Edge Computing • Hands-on Workshops • HPC / GPU / AI • Open Source Community • Private & Hybrid Cloud • Public Cloud • Telecom & NFV SUBMIT HERE Register for the Summit - Early Bird pricing ends August 21 Become a Sponsor If you have any questions, please email summit at openstack.org . Cheers, Ashlee Ashlee Ferguson OpenStack Foundation ashlee at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.therond at gmail.com Tue Jul 17 15:13:41 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Tue, 17 Jul 2018 17:13:41 +0200 Subject: [Openstack-operators] [kolla-ansible][octavia-role] Message-ID: Hi guys, I'm trying to install Octavia as a new service on our cloud and facing a few issues that I've been able to manage so far, until this nova-api keypair related issue. When creating a loadbalancer with the following command: openstack --os-cloud loadbalancer create --name lb1 --vip-network-id My loadbalancer is in ERROR state with the following error from the NOVA API logs: 2018-07-17 14:03:58.721 25812 INFO nova.api.openstack.wsgi [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] HTTP exception thrown: Invalid key_name provided. 2018-07-17 14:03:58.723 25812 INFO nova.osapi_compute.wsgi.server [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] 10.1.0.10,172.21.0.21 "POST /v2.1/8dfa9231b14545bbab9d222c4425dd2f/servers HTTP/1.1" status: 400 len: 489 time: 0.8432851 From my understanding of the nova-api source code it seems to be related to nova-api not being able to find the expected ssh keypair, however if I'm doing: openstack --os-cloud keypair list I'm correctly seeing the octavia_ssh_key entry for my user. Has anyone already made it work using kolla? On a side note, I'm using stable/queens branch for both kolla docker images and kolla-ansible. Kind regards, G. -------------- next part -------------- An HTML attachment was scrubbed... URL: From iain.macdonnell at oracle.com Tue Jul 17 16:06:47 2018 From: iain.macdonnell at oracle.com (iain MacDonnell) Date: Tue, 17 Jul 2018 09:06:47 -0700 Subject: [Openstack-operators] [kolla-ansible][octavia-role] In-Reply-To: References: Message-ID: On 07/17/2018 08:13 AM, Flint WALRUS wrote: > Hi guys, I'm a trying to install Octavia as a new service on our cloud > and facing few issues that I've been able to manage so far, until this > nova-api keypair related issue.
> > When creating a loadbalancer with the following command: > > openstack --os-cloud loadbalancer create --name lb1 > --vip-network-id > > My loadbalancer is in ERROR state with the following error from the NOVA > API logs: > > 2018-07-17 14:03:58.721 25812 INFO nova.api.openstack.wsgi > [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] HTTP exception > thrown: Invalid key_name provided. > > 2018-07-17 14:03:58.723 25812 INFO nova.osapi_compute.wsgi.server > [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] > 10.1.0.10,172.21.0.21 "POST > /v2.1/8dfa9231b14545bbab9d222c4425dd2f/servers HTTP/1.1" status: 400 > len: 489 time: 0.8432851 > > From my understanding of the nova-api source code it seems to be > related to nova-api not being able to found out the expected ssh > keypair, however if I'm doing: > > openstack --os-cloud keypair list > > I'm correctly seing the octavia_ssh_key entry for my user. > > Has anyone already made it work using kolla? > On a side note, I'm using stable/queens branch for both kolla docker > images and kolla-ansible. Don't know how kolla handles it, but I'm fairly sure that the ssh key has to be created/owned by the user that creates the amphora instances, which is not the same as the user that creates the load-balancer. I believe it's the user specified in the service_auth section of octavia.conf. ~iain From johnsomor at gmail.com Tue Jul 17 16:55:08 2018 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 17 Jul 2018 09:55:08 -0700 Subject: [Openstack-operators] [kolla-ansible][octavia-role] In-Reply-To: References: Message-ID: Right. I am not familiar with the kolla role either, but you are correct. The keypair created in nova needs to be "owned" by the octavia service account. 
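Concretely (a sketch — the credential values below are deployment-specific assumptions, not something from this thread), that means creating the keypair while authenticated as the octavia service user, i.e. the account configured in the service_auth section:

```shell
# Source credentials for the octavia service account (values are
# placeholders -- use whatever [service_auth] in octavia.conf points at)
export OS_USERNAME=octavia
export OS_PROJECT_NAME=service
export OS_PASSWORD=<octavia-service-password>

# Create the amphora keypair under that account so the nova boot
# request issued by Octavia can resolve key_name=octavia_ssh_key
openstack keypair create --public-key ~/.ssh/octavia_amp.pub octavia_ssh_key
```

Running `openstack keypair list` as a regular user will then no longer show the key (keypairs are per-user in nova), but the amphora boot should succeed.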
Michael On Tue, Jul 17, 2018 at 9:07 AM iain MacDonnell wrote: > > > > On 07/17/2018 08:13 AM, Flint WALRUS wrote: > > Hi guys, I'm a trying to install Octavia as a new service on our cloud > > and facing few issues that I've been able to manage so far, until this > > nova-api keypair related issue. > > > > When creating a loadbalancer with the following command: > > > > openstack --os-cloud loadbalancer create --name lb1 > > --vip-network-id > > > > My loadbalancer is in ERROR state with the following error from the NOVA > > API logs: > > > > 2018-07-17 14:03:58.721 25812 INFO nova.api.openstack.wsgi > > [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] HTTP exception > > thrown: Invalid key_name provided. > > > > 2018-07-17 14:03:58.723 25812 INFO nova.osapi_compute.wsgi.server > > [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] > > 10.1.0.10,172.21.0.21 "POST > > /v2.1/8dfa9231b14545bbab9d222c4425dd2f/servers HTTP/1.1" status: 400 > > len: 489 time: 0.8432851 > > > > From my understanding of the nova-api source code it seems to be > > related to nova-api not being able to found out the expected ssh > > keypair, however if I'm doing: > > > > openstack --os-cloud keypair list > > > > I'm correctly seing the octavia_ssh_key entry for my user. > > > > Has anyone already made it work using kolla? > > On a side note, I'm using stable/queens branch for both kolla docker > > images and kolla-ansible. > > Don't know how kolla handles it, but I'm fairly sure that the ssh key > has to be created/owned by the user that creates the amphora instances, > which is not the same as the user that creates the load-balancer. I > believe it's the user specified in the service_auth section of octavia.conf. 
> > ~iain > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From gael.therond at gmail.com Tue Jul 17 19:29:30 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Tue, 17 Jul 2018 21:29:30 +0200 Subject: [Openstack-operators] [kolla-ansible][octavia-role] In-Reply-To: References: Message-ID: Oooooh! Ok! Thanks a lot for such information !! Indeed that was not clear and didn’t make sens to me why each user should have to get it’s own ssh key to log into the amphora! You rocks guys, I’ll test that on tomorrow morning! Thanks a lot! G. Le mar. 17 juil. 2018 à 18:55, Michael Johnson a écrit : > Right. I am not familiar with the kolla role either, but you are > correct. The keypair created in nova needs to be "owned" by the > octavia service account. > > Michael > On Tue, Jul 17, 2018 at 9:07 AM iain MacDonnell > wrote: > > > > > > > > On 07/17/2018 08:13 AM, Flint WALRUS wrote: > > > Hi guys, I'm a trying to install Octavia as a new service on our cloud > > > and facing few issues that I've been able to manage so far, until this > > > nova-api keypair related issue. > > > > > > When creating a loadbalancer with the following command: > > > > > > openstack --os-cloud loadbalancer create --name lb1 > > > --vip-network-id > > > > > > My loadbalancer is in ERROR state with the following error from the > NOVA > > > API logs: > > > > > > 2018-07-17 14:03:58.721 25812 INFO nova.api.openstack.wsgi > > > [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] HTTP exception > > > thrown: Invalid key_name provided. 
> > > > > > 2018-07-17 14:03:58.723 25812 INFO nova.osapi_compute.wsgi.server > > > [req-69713077-c1e9-409a-9f9b-e3d5fb8006fc - - - - -] > > > 10.1.0.10,172.21.0.21 "POST > > > /v2.1/8dfa9231b14545bbab9d222c4425dd2f/servers HTTP/1.1" status: 400 > > > len: 489 time: 0.8432851 > > > > > > From my understanding of the nova-api source code it seems to be > > > related to nova-api not being able to found out the expected ssh > > > keypair, however if I'm doing: > > > > > > openstack --os-cloud keypair list > > > > > > I'm correctly seing the octavia_ssh_key entry for my user. > > > > > > Has anyone already made it work using kolla? > > > On a side note, I'm using stable/queens branch for both kolla docker > > > images and kolla-ansible. > > > > Don't know how kolla handles it, but I'm fairly sure that the ssh key > > has to be created/owned by the user that creates the amphora instances, > > which is not the same as the user that creates the load-balancer. I > > believe it's the user specified in the service_auth section of > octavia.conf. > > > > ~iain > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tobias.rydberg at citynetwork.eu Wed Jul 18 12:06:46 2018 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Wed, 18 Jul 2018 14:06:46 +0200 Subject: [Openstack-operators] [publiccloud-wg][passport] Public Cloud Passport program v2 Message-ID: <35f0af3d-d106-c073-9440-4056fd14bd15@citynetwork.eu> Hi everyone, After the discussions we had in Vancouver at the related Forum session, we have been working on a new spec for the Passport program version 2. This spec includes the use case around coupon codes, requirements for being a member of the program, and changes to the Foundation homepage, to mention some of them. We would now like feedback from all members, member prospects, and others that have an interest in the project. Please vote +1 or -1 together with a comment; online comments are also appreciated. The spec can be found at: https://review.openstack.org/#/c/583529/ Feel free to join our weekly meeting tomorrow at 1400 UTC in #openstack-publiccloud Cheers, Tobias Rydberg Co-Chair Public Cloud WG From tobias.rydberg at citynetwork.eu Wed Jul 18 13:40:29 2018 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Wed, 18 Jul 2018 15:40:29 +0200 Subject: [Openstack-operators] [publiccloud-wg] Meeting tomorrow for Public Cloud WG Message-ID: Hi folks, Time for a new meeting for the Public Cloud WG. The agenda draft can be found at https://etherpad.openstack.org/p/publiccloud-wg; feel free to add items to that list.
See you all tomorrow at IRC 1400 UTC in #openstack-publiccloud Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From thierry at openstack.org Fri Jul 20 14:44:35 2018 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 20 Jul 2018 16:44:35 +0200 Subject: [Openstack-operators] [all] [ptg] PTG track schedule published Message-ID: Hi everyone, Last month we published the tentative schedule layout for the 5 days of PTG. There was no major complaint, so that was confirmed as the PTG event schedule and published on the PTG website: https://www.openstack.org/ptg#tab_schedule You'll notice that: - The Ops meetup days were added. - Keystone track is split in two: one day on Monday for cross-project discussions around identity management, and two days on Thursday/Friday for team discussions. - The "Ask me anything" project helproom on Monday/Tuesday is for horizontal support teams (infrastructure, release management, stable maint, requirements...) to provide support for other teams, SIGs and workgroups and answer their questions. Goal champions should also be available there to help with Stein goal completion questions. - Like in Dublin, a number of tracks do not get pre-allocated time, and will be scheduled on the spot in available rooms at the time that makes the most sense for the participants. - Every track will be able to book extra time and space in available extra rooms at the event. To find more information about the event, register or book a room at the event hotel, visit: https://www.openstack.org/ptg Note that the second (and last) round of applications for travel support to the event is closing at the end of next week (July 29th) ! Apply if you need financial help attending the event: https://openstackfoundation.formstack.com/forms/travelsupportptg_denver_2018 See you there ! 
-- Thierry Carrez (ttx) From thierry at openstack.org Fri Jul 20 14:57:20 2018 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 20 Jul 2018 16:57:20 +0200 Subject: [Openstack-operators] [all] [ptg] PTG track schedule published In-Reply-To: References: Message-ID: Thierry Carrez wrote: > Hi everyone, > > Last month we published the tentative schedule layout for the 5 days of > PTG. There was no major complaint, so that was confirmed as the PTG > event schedule and published on the PTG website: > > https://www.openstack.org/ptg#tab_schedule The tab temporarily disappeared, while it is being restored you can access the schedule at: https://docs.google.com/spreadsheets/d/e/2PACX-1vRM2UIbpnL3PumLjRaso_9qpOfnyV9VrPqGbTXiMVNbVgjiR3SIdl8VSBefk339MhrbJO5RficKt2Rr/pubhtml?gid=1156322660&single=true -- Thierry Carrez (ttx) From jonmills at gmail.com Mon Jul 23 15:43:52 2018 From: jonmills at gmail.com (Jonathan Mills) Date: Mon, 23 Jul 2018 11:43:52 -0400 Subject: [Openstack-operators] Couple of CellsV2 questions Message-ID: Good morning all, I am looking at implementing CellsV2 with multiple cells, and there's a few things I'm seeking clarification on: 1) How does a superconductor know that it is a superconductor? Is its operation different in any fundamental way? Is there any explicit configuration or a setting in the database required? Or does it simply not care one way or another? 2) When I ran the command "nova-manage cell_v2 create_cell --name=cell1 --verbose", the entry created for cell1 in the api database includes only one rabbitmq server, but I have three of them as an HA cluster. Does it only support talking to one rabbitmq server in this configuration? Or can I just update the cell1 transport_url in the database to point to all three? Is that a supported configuration? 3) Is there anything wrong with having one cell share the amqp bus with your control plane, while having additional cells use their own amqp buses? 
Certainly I realize that the point of CellsV2 is to shard the amqp bus for greater horizontal scalability. But in my case, my first cell is on the smaller side, and happens to be colocated with the control plane hardware (whereas other cells will be in other parts of the datacenter, or in other datacenters with high-speed links). I was thinking of just pointing that first cell back at the same rabbitmq servers used by the control plane, but perhaps directing them at their own rabbitmq vhost. Is that a terrible idea? Your feedback is highly appreciated! Thank you, Jonathan Mills NASA Center for Climate Simulation -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.page at canonical.com Mon Jul 23 16:01:39 2018 From: james.page at canonical.com (James Page) Date: Mon, 23 Jul 2018 17:01:39 +0100 Subject: [Openstack-operators] [sig][upgrades][ansible][charms][tripleo][kolla][airship] reboot or poweroff? Message-ID: Hi All tl;dr we (the original founders) have not managed to invest the time to get the Upgrades SIG booted - time to hit reboot or time to poweroff? Since Vancouver, two of the original SIG chairs have stepped down, leaving me in the hot seat with minimal participation from either deployment projects or operators in the IRC meetings. In addition, I've only been able to make every 3rd IRC meeting, so they have generally not been happening. I think the current timing is not good for a lot of folk, so finding a better slot is probably a must-have if the SIG is going to continue - and maybe moving to a monthly or bi-weekly schedule rather than the weekly slot we have now. In addition, I need some willing folk to help with leadership in the SIG. If you have an interest and would like to help, please let me know!
I'd also like to better engage with all deployment projects - upgrades is something that deployment tools should be looking to encapsulate as features, so it would be good to get deployment projects engaged in the SIG with nominated representatives. Based on the attendance in upgrades sessions in Vancouver and developer/operator appetite to discuss all things upgrade at said sessions I'm assuming that there is still interest in having a SIG for Upgrades but I may be wrong! Thoughts? James -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Jul 23 21:57:16 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 23 Jul 2018 16:57:16 -0500 Subject: [Openstack-operators] [nova] Couple of CellsV2 questions In-Reply-To: References: Message-ID: <5eb59ccc-f860-b15c-7ed8-e1a04807adb7@gmail.com> I'll try to help a bit inline. Also cross-posting to openstack-dev and tagging with [nova] to highlight it. On 7/23/2018 10:43 AM, Jonathan Mills wrote: > I am looking at implementing CellsV2 with multiple cells, and there's a > few things I'm seeking clarification on: > > 1) How does a superconductor know that it is a superconductor?  Is its > operation different in any fundamental way?  Is there any explicit > configuration or a setting in the database required? Or does it simply > not care one way or another? It's a topology term, not really anything in config or the database that distinguishes the "super" conductor. I assume you've gone over the service layout in the docs: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#service-layout There are also some summit talks from Dan about the topology linked here: https://docs.openstack.org/nova/latest/user/cells.html#cells-v2 The superconductor is the conductor service at the "top" of the tree which interacts with the API and scheduler (controller) services and routes operations to the cell. 
Then once in a cell, the operation should ideally be confined there. So, for example, reschedules during a build would be confined to the cell. The cell conductor doesn't go back "up" to the scheduler to get a new set of hosts for scheduling. This of course depends on which release you're using and your configuration, see the caveats section in the cellsv2-layout doc. > > 2) When I ran the command "nova-manage cell_v2 create_cell --name=cell1 > --verbose", the entry created for cell1 in the api database includes > only one rabbitmq server, but I have three of them as an HA cluster. > Does it only support talking to one rabbitmq server in this > configuration? Or can I just update the cell1 transport_url in the > database to point to all three? Is that a supported configuration? First, don't update stuff directly in the database if you don't have to. :) What you set on the transport_url should be whatever oslo.messaging can handle: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.transport_url There is at least one reported bug for this but I'm not sure I fully grok it or what its status is at this point: https://bugs.launchpad.net/nova/+bug/1717915 > > 3) Is there anything wrong with having one cell share the amqp bus with > your control plane, while having additional cells use their own amqp > buses? Certainly I realize that the point of CellsV2 is to shard the > amqp bus for greater horizontal scalability.  But in my case, my first > cell is on the smaller side, and happens to be colocated with the > control plane hardware (whereas other cells will be in other parts of > the datacenter, or in other datacenters with high-speed links).  I was > thinking of just pointing that first cell back at the same rabbitmq > servers used by the control plane, but perhaps directing them at their > own rabbitmq vhost. Is that a terrible idea? 
Would need to get input from operators and/or Dan Smith's opinion on this one, but I'd say it's no worse than having a flat single cell deployment. However, if you're going to do multi-cell long-term anyway, then it would be best to get in the mindset and discipline of not relying on shared MQ between the controller services and the cells. In other words, just do the right thing from the start rather than have to worry about maybe changing the deployment / configuration for that one cell down the road when it's harder. -- Thanks, Matt From zhipengh512 at gmail.com Tue Jul 24 06:58:59 2018 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Tue, 24 Jul 2018 14:58:59 +0800 Subject: [Openstack-operators] [publiccloud-wg]New Meeting Time Starting This Week Message-ID: Hi Folks, As indicated in https://review.openstack.org/#/c/584389/, PCWG is moving towards a tick-tock meeting arrangement to better accommodate participants around the globe. For even weeks starting this Wed, we will have a new meeting time at UTC0700. For odd weeks we will keep the UTC1400 time slot. Looking forward to meeting you all at #openstack-publiccloud on Wed! -- Zhipeng (Howard) Huang Standard Engineer IT Standard & Patent/IT Product Line Huawei Technologies Co., Ltd Email: huangzhipeng at huawei.com Office: Huawei Industrial Base, Longgang, Shenzhen (Previous) Research Assistant Mobile Ad-Hoc Network Lab, Calit2 University of California, Irvine Email: zhipengh at uci.edu Office: Calit2 Building Room 2402 OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ashlee at openstack.org Tue Jul 24 14:23:40 2018 From: ashlee at openstack.org (Ashlee Ferguson) Date: Tue, 24 Jul 2018 09:23:40 -0500 Subject: [Openstack-operators] OpenStack Summit Berlin - Community Voting Open Message-ID: <5D259863-CF2D-4D2C-B85C-4C029D686D75@openstack.org> Hi everyone, Session voting is now open for the November 2018 OpenStack Summit in Berlin! VOTE HERE Hurry, voting closes Thursday, July 26 at 11:59pm Pacific Time (Friday, July 27 at 6:59 UTC). The Programming Committees will ultimately determine the final schedule. Community votes are meant to help inform the decision, but are not considered to be the deciding factor. The Programming Committee members exercise judgment in their area of expertise and help ensure diversity. View full details of the session selection process here. Continue to visit https://www.openstack.org/summit/berlin-2018 for all Summit-related information. REGISTER Register for the Summit before prices increase in late August! VISA APPLICATION PROCESS Make sure to secure your Visa soon. More information about the Visa application process. TRAVEL SUPPORT PROGRAM August 30 is the last day to submit applications. Please submit your applications by 11:59pm Pacific Time (August 31 at 6:59am UTC). If you have any questions, please email summit at openstack.org . Cheers, Ashlee Ashlee Ferguson OpenStack Foundation ashlee at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonmills at gmail.com Tue Jul 24 14:38:05 2018 From: jonmills at gmail.com (Jonathan Mills) Date: Tue, 24 Jul 2018 10:38:05 -0400 Subject: [Openstack-operators] [nova] Couple of CellsV2 questions In-Reply-To: <5eb59ccc-f860-b15c-7ed8-e1a04807adb7@gmail.com> References: <5eb59ccc-f860-b15c-7ed8-e1a04807adb7@gmail.com> Message-ID: Thanks, Matt. Those are all good suggestions, and we will incorporate your feedback into our plans. 
On 07/23/2018 05:57 PM, Matt Riedemann wrote: > I'll try to help a bit inline. Also cross-posting to openstack-dev and > tagging with [nova] to highlight it. > > On 7/23/2018 10:43 AM, Jonathan Mills wrote: >> I am looking at implementing CellsV2 with multiple cells, and there's >> a few things I'm seeking clarification on: >> >> 1) How does a superconductor know that it is a superconductor?  Is its >> operation different in any fundamental way?  Is there any explicit >> configuration or a setting in the database required? Or does it simply >> not care one way or another? > > It's a topology term, not really anything in config or the database that > distinguishes the "super" conductor. I assume you've gone over the > service layout in the docs: > > https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#service-layout > > > There are also some summit talks from Dan about the topology linked here: > > https://docs.openstack.org/nova/latest/user/cells.html#cells-v2 > > The superconductor is the conductor service at the "top" of the tree > which interacts with the API and scheduler (controller) services and > routes operations to the cell. Then once in a cell, the operation should > ideally be confined there. So, for example, reschedules during a build > would be confined to the cell. The cell conductor doesn't go back "up" > to the scheduler to get a new set of hosts for scheduling. This of > course depends on which release you're using and your configuration, see > the caveats section in the cellsv2-layout doc. > >> >> 2) When I ran the command "nova-manage cell_v2 create_cell >> --name=cell1 --verbose", the entry created for cell1 in the api >> database includes only one rabbitmq server, but I have three of them >> as an HA cluster.  Does it only support talking to one rabbitmq server >> in this configuration? Or can I just update the cell1 transport_url in >> the database to point to all three? Is that a supported configuration? 
> > First, don't update stuff directly in the database if you don't have to. > :) What you set on the transport_url should be whatever oslo.messaging > can handle: > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.transport_url > > > There is at least one reported bug for this but I'm not sure I fully > grok it or what its status is at this point: > > https://bugs.launchpad.net/nova/+bug/1717915 > >> >> 3) Is there anything wrong with having one cell share the amqp bus >> with your control plane, while having additional cells use their own >> amqp buses? Certainly I realize that the point of CellsV2 is to shard >> the amqp bus for greater horizontal scalability.  But in my case, my >> first cell is on the smaller side, and happens to be colocated with >> the control plane hardware (whereas other cells will be in other parts >> of the datacenter, or in other datacenters with high-speed links).  I >> was thinking of just pointing that first cell back at the same >> rabbitmq servers used by the control plane, but perhaps directing them >> at their own rabbitmq vhost. Is that a terrible idea? > > Would need to get input from operators and/or Dan Smith's opinion on > this one, but I'd say it's no worse than having a flat single cell > deployment. However, if you're going to do multi-cell long-term anyway, > then it would be best to get in the mindset and discipline of not > relying on shared MQ between the controller services and the cells. In > other words, just do the right thing from the start rather than have to > worry about maybe changing the deployment / configuration for that one > cell down the road when it's harder. 
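On question 2 above: the multi-host case is handled in the URL itself — oslo.messaging accepts a comma-separated list of host:port pairs (each with credentials) in a single transport_url, so no direct database edit is needed. A sketch of building such a URL (hostnames, credentials, and vhost are placeholders for illustration):

```shell
# Build a transport_url listing all three RabbitMQ cluster members
user="openstack"; pass="secret"; vhost="nova_cell1"
hosts="rabbit1 rabbit2 rabbit3"

url="rabbit://"; sep=""
for h in $hosts; do
  url="${url}${sep}${user}:${pass}@${h}:5672"
  sep=","
done
url="${url}/${vhost}"
echo "$url"
# -> rabbit://openstack:secret@rabbit1:5672,openstack:secret@rabbit2:5672,openstack:secret@rabbit3:5672/nova_cell1
```

The resulting URL can then be applied with `nova-manage cell_v2 update_cell` rather than by editing the cell_mappings table by hand.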
> From amy at demarco.com Tue Jul 24 15:05:51 2018 From: amy at demarco.com (Amy Marrich) Date: Tue, 24 Jul 2018 10:05:51 -0500 Subject: [Openstack-operators] UC Election - Looking for Election Officials Message-ID: Hey Stackers, We are getting ready for the Summer UC election and we need to have at least two Election Officials. I was wondering if you would like to help us with that process. You can find all the details of the election at https://governance.openstack.org/uc/reference/uc-election-aug2018.html. I do want to point out to those who are new that Election Officials are unable to run in the election itself but can of course vote. The election dates will be: August 6 - August 17, 05:59 UTC: Open candidacy for UC positions August 20 - August 24, 11:59 UTC: UC elections (voting) Please reach out to any of the current UC members or simply reply to this email if you can help us in this community process. Thanks, OpenStack User Committee Amy, Leong, Matt, Melvin, and Saverio -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Tue Jul 24 16:39:18 2018 From: amy at demarco.com (Amy Marrich) Date: Tue, 24 Jul 2018 11:39:18 -0500 Subject: [Openstack-operators] UC Election - Looking for Election Officials In-Reply-To: References: Message-ID: Just wanted to say THANK you as we now have 3 officials! Please participate in the User Committee elections as a candidate and perhaps most importantly by voting! Thanks, Amy (spotz) On Tue, Jul 24, 2018 at 10:05 AM, Amy Marrich wrote: > Hey Stackers, > > > We are getting ready for the Summer UC election and we need to have at > least two Election Officials. I was wondering if you would like to help us > on that process. You can find all the details of the election at > https://governance.openstack.org/uc/reference/uc-election-aug2018.html. > > > I do want to point out to those who are new that Election Officials are > unable to run in the election itself but can of course vote.
> > > > The election dates will be: > > August 6 - August 17, 05:59 UTC: Open candidacy for UC positions > > August 20 - August 24, 11:59 UTC: UC elections (voting) > > > > Please, reach out to any of the current UC members or simple reply to this > email if you can help us in this community process. > > > > Thanks, > > > > OpenStack User Committee > > Amy, Leong, Matt, Melvin, and Saverio > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Tue Jul 24 19:24:25 2018 From: amy at demarco.com (Amy Marrich) Date: Tue, 24 Jul 2018 14:24:25 -0500 Subject: [Openstack-operators] UC Election - Looking for Election Officials In-Reply-To: References: Message-ID: And for those curious... our officials are..... Ed Leafe, Chandan Kumar and then Mohamed Elsakhawy Thanks, Amy (spotz) (Who's claiming lack of sleep for not including the names earlier) On Tue, Jul 24, 2018 at 11:39 AM, Amy Marrich wrote: > Just wanted to say THANK you as we now have 3 officials! Please > participate in the User Committee elections as a candidate and perhaps most > importantly by voting! > > Thanks, > > Amy (spotz) > > On Tue, Jul 24, 2018 at 10:05 AM, Amy Marrich wrote: > >> Hey Stackers, >> >> >> We are getting ready for the Summer UC election and we need to have at >> least two Election Officials. I was wondering if you would like to help us >> on that process. You can find all the details of the election at >> https://governance.openstack.org/uc/reference/uc-election-aug2018.html. >> >> >> I do want to point out to those who are new that Election Officials are >> unable to run in the election itself but can of course vote. >> >> >> >> The election dates will be: >> >> August 6 - August 17, 05:59 UTC: Open candidacy for UC positions >> >> August 20 - August 24, 11:59 UTC: UC elections (voting) >> >> >> >> Please, reach out to any of the current UC members or simple reply to >> this email if you can help us in this community process. 
>> >> >> >> Thanks, >> >> >> >> OpenStack User Committee >> >> Amy, Leong, Matt, Melvin, and Saverio >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashlee at openstack.org Thu Jul 26 21:12:19 2018 From: ashlee at openstack.org (Ashlee Ferguson) Date: Thu, 26 Jul 2018 16:12:19 -0500 Subject: [Openstack-operators] OpenStack Summit Berlin - Community Voting Closing Soon Message-ID: Hi everyone, Session voting for the Berlin Summit closes in less than 8 hours! Submit your votes by July 26 at 11:59pm Pacific Time (Friday, July 27 at 6:59 UTC). VOTE HERE The Programming Committees will ultimately determine the final schedule. Community votes are meant to help inform the decision, but are not considered to be the deciding factor. The Programming Committee members exercise judgment in their area of expertise and help ensure diversity. View full details of the session selection process here. Continue to visit https://www.openstack.org/summit/berlin-2018 for all Summit-related information. REGISTER Register for the Summit for $699 before prices increase after August 21 at 11:59pm Pacific Time (August 22 at 6:59am UTC). VISA APPLICATION PROCESS Make sure to secure your Visa soon. More information about the Visa application process. TRAVEL SUPPORT PROGRAM August 30 is the last day to submit applications. Please submit your applications by 11:59pm Pacific Time (August 31 at 6:59am UTC). If you have any questions, please email summit at openstack.org . Cheers, Ashlee Ashlee Ferguson OpenStack Foundation ashlee at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From gilles.mocellin at nuagelibre.org Fri Jul 27 08:34:50 2018 From: gilles.mocellin at nuagelibre.org (Gilles Mocellin) Date: Fri, 27 Jul 2018 10:34:50 +0200 Subject: [Openstack-operators] [openstack-ansible] How to manage system upgrades ? Message-ID: <618796e0e942dc5bd5b0824950565ea1@nuagelibre.org> Hello ! 
Would be great to have a playbook to upgrade system parts of an OpenStack Cloud ! With OpenStack Ansible : LXC containers and hosts. It would be awesome to do a controlled rolling reboot of hosts when needed... Different conditions to check : - for controllers : check galera status... - for compute nodes : disable compute node and live-evacuate instances... - for storage : with Ceph : set no out... I know, I can do it and contribute, but perhaps someone already has something similar ? It could be hosted in one of these projects : - https://git.openstack.org/cgit/openstack/openstack-ansible-ops - https://git.openstack.org/cgit/openstack/ansible-role-openstack-operations (Why are there already two projects for the same goal ?) From christian.zunker at codecentric.cloud Fri Jul 27 11:58:20 2018 From: christian.zunker at codecentric.cloud (Christian Zunker) Date: Fri, 27 Jul 2018 13:58:20 +0200 Subject: [Openstack-operators] [openstack-ansible] How to manage system upgrades ? In-Reply-To: <618796e0e942dc5bd5b0824950565ea1@nuagelibre.org> References: <618796e0e942dc5bd5b0824950565ea1@nuagelibre.org> Message-ID: Hi Gilles, sounds like a good idea. We've just written a script for live evacuate, which we can contribute after some refactoring. Gilles Mocellin schrieb am Fr., 27. Juli 2018 um 10:44 Uhr: > Hello ! > > Would be great to have a playbook to upgrade system parts of an > OpenStack Cloud ! > With OpenStack Ansible : LXC containers and hosts. > > It would be awesome to do a controlled rolling reboot of hosts when > needed... > > Different conditions to check : > - for controllers : check galera status... > - for compute nodes : disable compute node and live-evacuate > instances... > - for storage : with Ceph : set no out... > > I know, I can do it and contribute, but perhaps someone already has > something similar ? 
> It could be hosted in one of these projects : > - https://git.openstack.org/cgit/openstack/openstack-ansible-ops > - > https://git.openstack.org/cgit/openstack/ansible-role-openstack-operations > > (Why already to project for the same goal ?) > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Mon Jul 30 13:44:31 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 30 Jul 2018 08:44:31 -0500 Subject: [Openstack-operators] Reminder: User Committee @ 1800 UTC Message-ID: Hi everyone, UC meeting today in #openstack-uc Agenda: https://wiki.openstack.org/wiki/Governance/Foundation/UserCommittee -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alfredo.deluca at gmail.com Mon Jul 30 14:53:20 2018 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Mon, 30 Jul 2018 16:53:20 +0200 Subject: [Openstack-operators] swift question Message-ID: Hi all. I wonder if I can sync a directory on a server to the obj store (swift). What I do now is just a backup, but I'd like to implement a sort of file rotation locally and on the obj store. Any idea? -- *Alfredo* -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Mon Jul 30 15:33:10 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 30 Jul 2018 17:33:10 +0200 Subject: [Openstack-operators] dashboard show only project after upgrading Message-ID: Hello everyone, I upgraded openstack centos 7 from ocata to pike and the command line works fine, but the dashboard does not show any menu on the left. 
I am missing the following menus: Project Admin Identity You can find the image attached here. Could anyone help me? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot at 2018-07-30 17:31:50.png Type: image/png Size: 134429 bytes Desc: not available URL: From ignaziocassano at gmail.com Mon Jul 30 15:35:31 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 30 Jul 2018 17:35:31 +0200 Subject: [Openstack-operators] dashboard show only project after upgrading In-Reply-To: References: Message-ID: Sorry, I sent a wrong image. The correct screenshot is attached here. Regards 2018-07-30 17:33 GMT+02:00 Ignazio Cassano : > Hello everyone, > I upgraded openstack centos 7 from ocata to pike ad command line work fine > but dashboard does not show any menu on the left . > I missed the following menus: > > Project > Admin > Identity > > You can find the image attached here. > > Could anyone help me ? > Regards > Ignazio > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Jul 30 15:52:59 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 30 Jul 2018 10:52:59 -0500 Subject: [Openstack-operators] [openstack-ansible] How to manage system upgrades ? In-Reply-To: <618796e0e942dc5bd5b0824950565ea1@nuagelibre.org> References: <618796e0e942dc5bd5b0824950565ea1@nuagelibre.org> Message-ID: On 7/27/2018 3:34 AM, Gilles Mocellin wrote: > - for compute nodes : disable compute node and live-evacuate instances... To be clear, what do you mean exactly by "live-evacuate"? I assume you mean live migration of all instances off each (disabled) compute node *before* you upgrade it. 
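That disable-then-live-migrate drain can be sketched as a small dry-run helper. This is a hypothetical illustration, not code from any of the projects discussed in this thread: the helper name is invented, and the exact CLI flags are assumptions (for example, `--live-migration` only exists in newer python-openstackclient releases; older clients use `--live`):

```python
# Sketch: build the CLI commands that drain a compute node before a
# system upgrade -- disable the nova-compute service so the scheduler
# stops placing new instances there, then live-migrate each server off.
# Actually running the commands and polling migration status is left
# to the caller.

def build_drain_commands(host, server_ids):
    """Return the commands (as argv lists) that drain `host`."""
    cmds = [
        # Step 1: take the host out of scheduling.
        ["openstack", "compute", "service", "set",
         "--disable", "--disable-reason", "maintenance",
         host, "nova-compute"],
    ]
    for server_id in server_ids:
        # Step 2: live migration keeps each guest running; "evacuate"
        # would instead rebuild it on another host, which is disruptive.
        cmds.append(["openstack", "server", "migrate",
                     "--live-migration", server_id])
    return cmds

if __name__ == "__main__":
    for cmd in build_drain_commands("compute-01", ["vm-a", "vm-b"]):
        print(" ".join(cmd))
```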
I wanted to ask because "evacuate" as a server operation is something else entirely (it's rebuild on another host which is definitely disruptive to the workload on that server). http://www.danplanet.com/blog/2016/03/03/evacuate-in-nova-one-command-to-confuse-us-all/ -- Thanks, Matt From bitskrieg at bitskrieg.net Mon Jul 30 15:53:00 2018 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Mon, 30 Jul 2018 11:53:00 -0400 Subject: [Openstack-operators] dashboard show only project after upgrading In-Reply-To: References: Message-ID: <164ebe48760.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> Ignazio, Are your horizon instances in separate containers/VMS? If so, I'd highly recommend completely wiping them and rebuilding from scratch since horizon itself is stateless. I am not a fan of upgrades for reasons like this. If that's not possible, a purge of the horizon packages on your controller and a reinstallation should fix it. Chris On July 30, 2018 11:38:03 Ignazio Cassano wrote: > Sorry, I sent a wrong image. > The correct screenshot is attached here. > Regards > > 2018-07-30 17:33 GMT+02:00 Ignazio Cassano : > Hello everyone, > I upgraded openstack centos 7 from ocata to pike ad command line work fine > but dashboard does not show any menu on the left . > I missed the following menus: > > Project > Admin > Identity > > You can find the image attached here. > > Could anyone help me ? > Regards > Ignazio > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ignaziocassano at gmail.com Mon Jul 30 16:14:14 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 30 Jul 2018 18:14:14 +0200 Subject: [Openstack-operators] dashboard show only project after upgrading In-Reply-To: <164ebe48760.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> References: <164ebe48760.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> Message-ID: Hello Chris, I am not using containers yet. I will try to purge it. Many thanks. Ignazio Il Lun 30 Lug 2018 17:53 Chris Apsey ha scritto: > Ignazio, > > Are your horizon instances in separate containers/VMS? If so, I'd highly > recommend completely wiping them and rebuilding from scratch since horizon > itself is stateless. I am not a fan of upgrades for reasons like this. > > If that's not possible, a purge of the horizon packages on your controller > and a reinstallation should fix it. > > Chris > > On July 30, 2018 11:38:03 Ignazio Cassano > wrote: > >> Sorry, I sent a wrong image. >> The correct screenshot is attached here. >> Regards >> >> 2018-07-30 17:33 GMT+02:00 Ignazio Cassano : >> >>> Hello everyone, >>> I upgraded openstack centos 7 from ocata to pike ad command line work >>> fine >>> but dashboard does not show any menu on the left . >>> I missed the following menus: >>> >>> Project >>> Admin >>> Identity >>> >>> You can find the image attached here. >>> >>> Could anyone help me ? >>> Regards >>> Ignazio >>> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.gerrard at gmail.com Mon Jul 30 16:28:09 2018 From: clay.gerrard at gmail.com (Clay Gerrard) Date: Mon, 30 Jul 2018 11:28:09 -0500 Subject: [Openstack-operators] swift question In-Reply-To: References: Message-ID: Sure! 
python swiftclient's upload command has a --changed option: https://docs.openstack.org/python-swiftclient/latest/cli/index.html#swift-upload But you might be happier with something more sophisticated like rclone: https://rclone.org/ Nice thing about object storage is you can access it from anywhere via HTTP and PUT anything you want in there ;) -Clay On Mon, Jul 30, 2018 at 9:54 AM Alfredo De Luca wrote: > Hi all. > I wonder if i can sync a directory on a server to the obj store (swift). > What I do now is just a backup but I d like to implement a sort of file > rotate locally and on the obj store. > Any idea? > > > -- > *Alfredo* > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at crystone.com Mon Jul 30 17:08:50 2018 From: tobias.urdin at crystone.com (Tobias Urdin) Date: Mon, 30 Jul 2018 17:08:50 +0000 Subject: [Openstack-operators] [neutron] [neutron-dynamic-routing] bgp-dragent not sending BGP UPDATE messages Message-ID: Hello, I'm trying to get the neutron-bgp-dragent that is delivered by the neutron-dynamic-routing project to work. I've gotten it to open a BGP peer session without any issues but the no BGP UPDATE messages seems to be sent from the neutron-bgp-dragent daemon. I'm having a BGP peer with a machine running FreeBSD 11 with OpenBGPD, my goals is being able to announce IPv6 over IPv4 peers which should work but I'm unsure if python-ryu supports this. 
[root at controller ~]# openstack bgp speaker show bgp-speaker-ipv6 +-----------------------------------+-------------------------------------------+ | Field | Value | +-----------------------------------+-------------------------------------------+ | advertise_floating_ip_host_routes | False | | advertise_tenant_networks | True | | id | d22b30f2-50fe-49eb-9577-77cceb3fcc81 | | ip_version | 6 | | local_as | 64600 | | name | bgp-speaker-ipv6 | | networks | [u'fdcead67-8a12-42fe-a31d-8cb3a03d8ee0'] | | peers | [u'b42d808f-c2ef-41e7-93b5-859a51cf6a36'] | | project_id | 050c556faa5944a8953126c867313770 | | tenant_id | 050c556faa5944a8953126c867313770 | +-----------------------------------+-------------------------------------------+ [root at controller ~]# openstack bgp peer show b42d808f-c2ef-41e7-93b5-859a51cf6a36 +------------+--------------------------------------+ | Field | Value | +------------+--------------------------------------+ | auth_type | none | | id | b42d808f-c2ef-41e7-93b5-859a51cf6a36 | | name | bgp-peer-1 | | peer_ip | 172.20.x.y | | project_id | 050c556faa5944a8953126c867313770 | | remote_as | xxxx | | tenant_id | 050c556faa5944a8953126c867313770 | +------------+--------------------------------------+ [root at controller ~]# openstack bgp speaker list advertised routes bgp-speaker-ipv6 +----+--------------------+--------------+ | ID | Destination | Nexthop | +----+--------------------+--------------+ | | xxxx:xxxx:0:1::/64 | xxxx:xxxx::f | +----+--------------------+--------------+ 2018-07-30 19:00:57.302 2143006 INFO neutron_dynamic_routing.services.bgp.agent.driver.ryu.driver [-] Initializing Ryu driver for BGP Speaker functionality. 
2018-07-30 19:00:57.302 2143006 INFO neutron_dynamic_routing.services.bgp.agent.driver.ryu.driver [-] Initialized Ryu BGP Speaker driver interface with bgp_router_id=172.20.zz.yy 2018-07-30 19:00:57.351 2143006 INFO neutron_dynamic_routing.services.bgp.agent.bgp_dragent [-] BGP dynamic routing agent started 2018-07-30 19:00:57.450 2143006 INFO bgpspeaker.api.base [req-f15418e8-731b-4ebe-82a9-e2933e8df8b7 - - - - -] API method core.start called with args: {'router_id': '172.20.zz.yy', 'label_range': (100, 100000), 'waiter': , 'bgp_server_port': 0, 'local_as': 64600, 'allow_local_as_in_count': 0, 'refresh_stalepath_time': 0, 'cluster_id': None, 'local_pref': 100, 'refresh_max_eor_time': 0} 2018-07-30 19:00:57.455 2143006 INFO neutron_dynamic_routing.services.bgp.agent.driver.ryu.driver [req-f15418e8-731b-4ebe-82a9-e2933e8df8b7 - - - - -] Added BGP Speaker for local_as=64600 with router_id= 172.20.zz.yy. 2018-07-30 19:00:57.456 2143006 INFO bgpspeaker.api.base [req-f15418e8-731b-4ebe-82a9-e2933e8df8b7 - - - - -] API method neighbor.create called with args: {'connect_mode': 'active', 'cap_mbgp_evpn': False, 'remote_as': 35041, 'cap_mbgp_vpnv6': False, 'cap_mbgp_l2vpnfs': False, 'cap_four_octet_as_number': True, 'cap_mbgp_ipv6': False, 'is_next_hop_self': False, 'cap_mbgp_ipv4': True, 'cap_mbgp_ipv4fs': False, 'is_route_reflector_client': False, 'cap_mbgp_ipv6fs': False, 'is_route_server_client': False, 'cap_enhanced_refresh': False, 'peer_next_hop': None, 'password': None, 'ip_address': u'172.20.x.y', 'cap_mbgp_vpnv4fs': False, 'cap_mbgp_vpnv4': False, 'cap_mbgp_vpnv6fs': False} 2018-07-30 19:00:57.456 2143006 INFO neutron_dynamic_routing.services.bgp.agent.driver.ryu.driver [req-f15418e8-731b-4ebe-82a9-e2933e8df8b7 - - - - -] Added BGP Peer 172.20.x.y for remote_as=xxxx to BGP Speaker running for local_as=64600. 
2018-07-30 19:00:57.457 2143006 INFO bgpspeaker.api.base [req-f15418e8-731b-4ebe-82a9-e2933e8df8b7 - - - - -] API method network.add called with args: {'prefix': u'xxxx:xxxx:0:1::/64', 'next_hop': u'2a05:4545::f'} 2018-07-30 19:00:57.457 2143006 INFO neutron_dynamic_routing.services.bgp.agent.driver.ryu.driver [req-f15418e8-731b-4ebe-82a9-e2933e8df8b7 - - - - -] Route cidr=xxxx:xxxx:0:1::/64, nexthop=xxxx:xxxx::f is advertised for BGP Speaker running for local_as=64600. 2018-07-30 19:00:58.460 2143006 INFO bgpspeaker.peer [-] Connection to peer: 172.20.zz.yy established 2018-07-30 19:00:58.460 2143006 INFO neutron_dynamic_routing.services.bgp.agent.driver.ryu.driver [-] BGP Peer my.peer.id for remote_as=xxxx is UP. On the router side the peer is up but there is no BGP UPDATE messages so I don't get any prefixes. root at router:~ # bgpctl show sum Neighbor AS MsgRcvd MsgSent OutQ Up/Down State/PrfRcvd controllername 64600 488 491 0 00:02:20 0 root at dr20-1-sto1:~ # bgpctl show neighbor 172.20.104.192 BGP neighbor is 172.20.zz.yy, remote AS 64600 Description: controllername BGP version 4, remote router-id 172.20.zz.yy BGP state = Established, up for 00:03:03 Last read 00:00:01, holdtime 40s, keepalive interval 13s Neighbor capabilities: Multiprotocol extensions: IPv4 unicast Route Refresh 4-byte AS numbers Message statistics: Sent Received Opens 6 6 Notifications 0 0 Updates 0 0 Keepalives 489 486 Route Refresh 0 0 Total 495 492 Update statistics: Sent Received Updates 0 0 Withdraws 0 0 End-of-Rib 0 0 I'm wondering if this might be something related to the neighbor capabilities that is announces, see the output below and from the neutron-bgp-dragent log we can see this capabilities: 2018-07-30 19:00:57.456 2143006 INFO bgpspeaker.api.base [req-f15418e8-731b-4ebe-82a9-e2933e8df8b7 - - - - -] API method neighbor.create called with args: {'connect_mode': 'active', 'cap_mbgp_evpn': False, 'remote_as': 35041, 'cap_mbgp_vpnv6': False, 'cap_mbgp_l2vpnfs': False, 
'cap_four_octet_as_number': True, 'cap_mbgp_ipv6': False, 'is_next_hop_self': False, 'cap_mbgp_ipv4': True, 'cap_mbgp_ipv4fs': False, 'is_route_reflector_client': False, 'cap_mbgp_ipv6fs': False, 'is_route_server_client': False, 'cap_enhanced_refresh': False, 'peer_next_hop': None, 'password': None, 'ip_address': u'172.20.x.y', 'cap_mbgp_vpnv4fs': False, 'cap_mbgp_vpnv4': False, 'cap_mbgp_vpnv6fs': False} Here is an example on how the bgpd.conf looks like: group "peering AS64600" { remote-as 64600 softreconfig in yes transparent-as yes neighbor 172.20.zz.yy { announce none announce IPv6 unicast descr "controller" local-address 172.20.x.x depend on vlan10 } If I interpret the IPv6 section in this document correctly https://docs.openstack.org/mitaka/networking-guide/config-bgp-dynamic-routing.html it should work. Anybody have any ideas or know if it's supported? Appreciate any help or pointers. Best regards From tobias.urdin at crystone.com Mon Jul 30 19:16:44 2018 From: tobias.urdin at crystone.com (Tobias Urdin) Date: Mon, 30 Jul 2018 19:16:44 +0000 Subject: [Openstack-operators] [neutron] [neutron-dynamic-routing] bgp-dragent not sending BGP UPDATE messages In-Reply-To: References: Message-ID: <1532978205744.26838@crystone.com> So the real question is pretty much if Ryu supports MP-BGP and it does however it seems that neutron-dynamic-routing is disabling IPv6 if the peer IP is a IPv4 address :( So the link below answers my own question [1] [1] https://github.com/openstack/neutron-dynamic-routing/blob/98d3cf24d6d7b5eca55ca19eb19bdd2e7b1975ec/neutron_dynamic_routing/services/bgp/agent/driver/ryu/driver.py#L131 ________________________________________ From: Tobias Urdin Sent: Monday, July 30, 2018 7:08 PM To: openstack-operators at lists.openstack.org Subject: [Openstack-operators] [neutron] [neutron-dynamic-routing] bgp-dragent not sending BGP UPDATE messages _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From mgagne at calavera.ca Mon Jul 30 21:40:16 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Mon, 30 Jul 2018 17:40:16 -0400 Subject: [Openstack-operators] dashboard show only project after upgrading In-Reply-To: References: Message-ID: Try enabling DEBUG in local_settings.py. Some dashboard or panel might fail loading for some reasons. I had a similar behavior last week and enabling DEBUG should show the error. -- Mathieu On Mon, Jul 30, 2018 at 11:35 AM, Ignazio Cassano wrote: > Sorry, I sent a wrong image. > The correct screenshot is attached here. > Regards > > 2018-07-30 17:33 GMT+02:00 Ignazio Cassano : >> >> Hello everyone, >> I upgraded openstack centos 7 from ocata to pike ad command line work fine >> but dashboard does not show any menu on the left . 
>> I missed the following menus: >> >> Project >> Admin >> Identity >> >> You can find the image attached here. >> >> Could anyone help me ? >> Regards >> Ignazio > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From iain.macdonnell at oracle.com Tue Jul 31 00:09:56 2018 From: iain.macdonnell at oracle.com (iain MacDonnell) Date: Mon, 30 Jul 2018 17:09:56 -0700 Subject: [Openstack-operators] neutron-server memcached connections In-Reply-To: <9598665a-8748-9fa8-147d-e618db3f7b94@oracle.com> References: <9598665a-8748-9fa8-147d-e618db3f7b94@oracle.com> Message-ID: <59bb939e-7aa0-6de6-4f2a-61fd2f4650ae@oracle.com> Following up on my own question, in case it's useful to others.... Turns out that keystonemiddleware uses eventlet, and, by default, creates a connection to memcached from each green thread (and doesn't clean them up), and the green threads are essentially unlimited. There is a solution for this, which implements a shared connection pool. It's enabled via the keystone_authtoken.memcache_use_advanced_pool config option. Unfortunately it was broken in a few different ways (I guess this means that no one is using it?) I've worked with the keystone devs, and we were able to get a fix (in keystonemiddleware) in just in time for the Rocky release. Related fixes have also been backported to Queens (for the next update), and a couple needed for Pike are pending completion. With this in place, so-far I have not seen more than one connection to memcached for each neutron-api worker process, and everything seems to be working well. 
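For reference, a minimal sketch of what enabling the pooled client looks like in a service's config file (e.g. neutron.conf). The memcached address is a placeholder; the option names are the standard keystonemiddleware `[keystone_authtoken]` ones:

```ini
[keystone_authtoken]
# Servers the auth_token middleware caches validated tokens in
# (placeholder address).
memcached_servers = 192.0.2.10:11211
# Use the shared connection pool instead of one connection per
# green thread.
memcache_use_advanced_pool = true
# Cap on pooled connections per worker; 10 is the documented default,
# shown here explicitly.
memcache_pool_maxsize = 10
```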
Some relevant changes: master: https://review.openstack.org/#/c/583695/ Queens: https://review.openstack.org/#/c/583698/ https://review.openstack.org/#/c/583684/ Pike: https://review.openstack.org/#/c/583699/ https://review.openstack.org/#/c/583835/ I do wonder how others are managing memcached connections for larger deployments... ~iain On 06/26/2018 12:59 PM, iain MacDonnell wrote: > > In diagnosing a situation where a Pike deployment was intermittently > slower (in general), I discovered that it was (sometimes) exceeding > memcached's maximum connection limit, which is set to 4096. > > Looking closer, ~2750 of the connections are from 8 neutron-server > process. neutron-server is configured with 8 API workers, and those 8 > processes have a combined total of ~2750 connections to memcached: > > # lsof -i TCP:11211 | awk '/^neutron-s/ {print $2}' | sort | uniq -c >     245 2611 >     306 2612 >     228 2613 >     406 2614 >     407 2615 >     385 2616 >     369 2617 >     398 2618 > # > > > There doesn't seem to be much turnover - comparing samples of the > connections (incl. source port) 15 mins apart, two were dropped, and one > new one added. > > In neutron.conf, keystone_authtoken.memcached_servers is configured, but > nothing else pertaining to caching, so > keystone_authtoken.memcache_pool_maxsize should default to 10. > > Am I misunderstanding something, or shouldn't I see a maximum of 10 > connections from each of the neutron-server API workers, with this > configuration? > > Any known issues, or pointers to what I'm missing? > > TIA, > >     ~iain From ignaziocassano at gmail.com Tue Jul 31 05:04:06 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 31 Jul 2018 07:04:06 +0200 Subject: [Openstack-operators] dashboard show only project after upgrading In-Reply-To: References: Message-ID: Ok, I will check. thanks Il Lun 30 Lug 2018 23:40 Mathieu Gagné ha scritto: > Try enabling DEBUG in local_settings.py. 
Some dashboard or panel might > fail loading for some reasons. > I had a similar behavior last week and enabling DEBUG should show the > error. > > -- > Mathieu > > > On Mon, Jul 30, 2018 at 11:35 AM, Ignazio Cassano > wrote: > > Sorry, I sent a wrong image. > > The correct screenshot is attached here. > > Regards > > > > 2018-07-30 17:33 GMT+02:00 Ignazio Cassano : > >> > >> Hello everyone, > >> I upgraded openstack centos 7 from ocata to pike ad command line work > fine > >> but dashboard does not show any menu on the left . > >> I missed the following menus: > >> > >> Project > >> Admin > >> Identity > >> > >> You can find the image attached here. > >> > >> Could anyone help me ? > >> Regards > >> Ignazio > > > > > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sorrison at gmail.com Tue Jul 31 06:56:56 2018 From: sorrison at gmail.com (Sam Morrison) Date: Tue, 31 Jul 2018 16:56:56 +1000 Subject: [Openstack-operators] neutron-server memcached connections In-Reply-To: <59bb939e-7aa0-6de6-4f2a-61fd2f4650ae@oracle.com> References: <9598665a-8748-9fa8-147d-e618db3f7b94@oracle.com> <59bb939e-7aa0-6de6-4f2a-61fd2f4650ae@oracle.com> Message-ID: <23985AB0-B635-4311-BACF-0194D2306501@gmail.com> Great, yeah we have also seen these issues with nova-api with keystonemiddleware in newton and ocata. Thanks for the heads up as I was going to start digging deeper. Cheers, Sam > On 31 Jul 2018, at 10:09 am, iain MacDonnell wrote: > > > Following up on my own question, in case it's useful to others.... > > Turns out that keystonemiddleware uses eventlet, and, by default, creates a connection to memcached from each green thread (and doesn't clean them up), and the green threads are essentially unlimited. 
> > There is a solution for this, which implements a shared connection pool. It's enabled via the keystone_authtoken.memcache_use_advanced_pool config option. > > Unfortunately it was broken in a few different ways (I guess this means that no one is using it?) > > I've worked with the keystone devs, and we were able to get a fix (in keystonemiddleware) in just in time for the Rocky release. Related fixes have also been backported to Queens (for the next update), and a couple needed for Pike are pending completion. > > With this in place, so-far I have not seen more than one connection to memcached for each neutron-api worker process, and everything seems to be working well. > > Some relevant changes: > > master: > > https://review.openstack.org/#/c/583695/ > > > Queens: > > https://review.openstack.org/#/c/583698/ > https://review.openstack.org/#/c/583684/ > > > Pike: > > https://review.openstack.org/#/c/583699/ > https://review.openstack.org/#/c/583835/ > > > I do wonder how others are managing memcached connections for larger deployments... > > ~iain > > > > On 06/26/2018 12:59 PM, iain MacDonnell wrote: >> In diagnosing a situation where a Pike deployment was intermittently slower (in general), I discovered that it was (sometimes) exceeding memcached's maximum connection limit, which is set to 4096. >> Looking closer, ~2750 of the connections are from 8 neutron-server process. neutron-server is configured with 8 API workers, and those 8 processes have a combined total of ~2750 connections to memcached: >> # lsof -i TCP:11211 | awk '/^neutron-s/ {print $2}' | sort | uniq -c >> 245 2611 >> 306 2612 >> 228 2613 >> 406 2614 >> 407 2615 >> 385 2616 >> 369 2617 >> 398 2618 >> # >> There doesn't seem to be much turnover - comparing samples of the connections (incl. source port) 15 mins apart, two were dropped, and one new one added. 
>> In neutron.conf, keystone_authtoken.memcached_servers is configured, but nothing else pertaining to caching, so keystone_authtoken.memcache_pool_maxsize should default to 10. >> Am I misunderstanding something, or shouldn't I see a maximum of 10 connections from each of the neutron-server API workers, with this configuration? >> Any known issues, or pointers to what I'm missing? >> TIA, >> ~iain > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From alfredo.deluca at gmail.com Tue Jul 31 08:41:37 2018 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Tue, 31 Jul 2018 10:41:37 +0200 Subject: [Openstack-operators] swift question In-Reply-To: References: Message-ID: Thanks Clay. I've been using --changed already but it doesn't sync the content of the folder remotely. I'll have a look at rclone as you suggested. Thanks On Mon, Jul 30, 2018 at 6:28 PM Clay Gerrard wrote: > Sure! python swiftclient's upload command has a --changed option: > > > https://docs.openstack.org/python-swiftclient/latest/cli/index.html#swift-upload > > But you might be happier with something more sophisticated like rclone: > > https://rclone.org/ > > Nice thing about object storage is you can access it from anywhere via > HTTP and PUT anything you want in there ;) > > -Clay > > On Mon, Jul 30, 2018 at 9:54 AM Alfredo De Luca > wrote: > >> Hi all. >> I wonder if i can sync a directory on a server to the obj store (swift). >> What I do now is just a backup but I d like to implement a sort of file >> rotate locally and on the obj store. >> Any idea? 
>> >> >> -- >> *Alfredo* >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > -- *Alfredo* -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at gmail.com Tue Jul 31 08:59:56 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Tue, 31 Jul 2018 10:59:56 +0200 Subject: [Openstack-operators] [OCTAVIA][KOLLA] - Amphora to control plan communication question. Message-ID: Hi Folks, I'm currently deploying the Octavia component into our testing environment which is based on KOLLA. So far I'm quite enjoying it as it is pretty much straightforward (except for some documentation pitfalls), but I'm now facing a weird and hard to debug situation. I actually have a hard time understanding how Amphora are communicating back and forth with the Control Plane components. 
URL: From ignaziocassano at gmail.com Tue Jul 31 12:23:04 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 31 Jul 2018 14:23:04 +0200 Subject: [Openstack-operators] manila-ui does not work after upgrading from ocata to pike Message-ID: Hi everyone, I upgraded my CentOS 7 OpenStack from ocata to pike. The OpenStack dashboard works fine only if I remove the openstack-manila-ui package. With the manila ui it gives me an internal server error. In httpd error log I read: KeyError: From johnsomor at gmail.com Tue Jul 31 16:14:53 2018 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 31 Jul 2018 09:14:53 -0700 Subject: [Openstack-operators] [OCTAVIA][KOLLA] - Amphora to control plan communication question. In-Reply-To: References: Message-ID: Hi Flint, We don't have a logical network diagram at this time (it's still on the to-do list), but I can talk you through it. The Octavia worker, health manager, and housekeeping need to be able to reach the amphora (service VM at this point) over the lb-mgmt-net on TCP 9443. It knows the amphora IP addresses on the lb-mgmt-net via the database and the information we save from the compute driver (i.e. what IP was assigned to the instance). The Octavia API process does not need to be connected to the lb-mgmt-net at this time. It only connects to the messaging bus and the Octavia database. Provider drivers may have other connectivity requirements for the Octavia API. The amphorae also send UDP packets back to the health manager on port 5555. This is the heartbeat packet from the amphora. It contains the health and statistics from that amphora. It knows its list of health manager endpoints from the configuration file "controller_ip_port_list" (https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.controller_ip_port_list). Each amphora will rotate through that list of endpoints to reduce the chance of a network split impacting the heartbeat messages. 
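[Editor's note: as a concrete sketch, the health-manager side of this wiring in octavia.conf might look like the following (addresses are illustrative; the option names are from the Octavia configuration reference linked above):

```ini
[health_manager]
# UDP endpoints the amphorae rotate through when sending heartbeats
controller_ip_port_list = 192.0.2.10:5555,192.0.2.11:5555
# address/port this particular health manager listens on
bind_ip = 192.0.2.10
bind_port = 5555
```

Each health manager instance would use its own bind_ip, while the full controller_ip_port_list is what each amphora rotates through, as described above.]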
This is the only traffic that passes over this network. All of it is IP based and can be routed (it does not require L2 connectivity). Michael On Tue, Jul 31, 2018 at 2:00 AM Flint WALRUS wrote: > > Hi Folks, > > I'm currently deploying the Octavia component into our testing environment which is based on KOLLA. > > So far I'm quite enjoying it as it is pretty much straight forward (Except for some documentation pitfalls), but I'm now facing a weird and hard to debug situation. > > I actually have a hard time to understand how Amphora are communicating back and forth with the Control Plan components. > > From my understanding, as soon as I create a new LB, the Control Plan is spawning an instance using the configured Octavia Flavor and Image type, attach it to the LB-MGMT-NET and to the user provided subnet. > > What I think I'm misunderstanding is the discussion that follows between the amphora and the different components such as the HealthManager/HouseKeeper, the API and the Worker. > > How is the amphora agent able to found my control plan? Is the HealthManager or the Octavia Worker initiating the communication to the Amphora on port 9443 and so give the agent the API/Control plan internalURL? > > If anyone have a diagram of the workflow I would be more than happy ^^ > > Thanks a lot in advance to anyone willing to help :D > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From thierry at openstack.org Tue Jul 31 16:30:24 2018 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 31 Jul 2018 18:30:24 +0200 Subject: [Openstack-operators] [ptg] Self-healing SIG meeting moved to Thursday morning Message-ID: <3ee9cf15-4587-7884-f8fd-b00ec22549fc@openstack.org> Hi! 
Quick heads-up: Following a request[1] from Adam Spiers (SIG lead), we modified the PTG schedule to move the Self-Healing SIG meeting from Friday (all day) to Thursday morning (only morning). You can see the resulting schedule at: https://www.openstack.org/ptg#tab_schedule Sorry for any inconvenience this may cause. [1] http://lists.openstack.org/pipermail/openstack-dev/2018-July/132392.html -- Thierry Carrez (ttx) From aspiers at suse.com Tue Jul 31 16:57:56 2018 From: aspiers at suse.com (Adam Spiers) Date: Tue, 31 Jul 2018 17:57:56 +0100 Subject: [Openstack-operators] [openstack-dev] [ptg] Self-healing SIG meeting moved to Thursday morning In-Reply-To: <3ee9cf15-4587-7884-f8fd-b00ec22549fc@openstack.org> References: <3ee9cf15-4587-7884-f8fd-b00ec22549fc@openstack.org> Message-ID: <20180731165755.dqxgittuzao2sdhu@pacific.linksys.moosehall> Thierry Carrez wrote: >Hi! Quick heads-up: > >Following a request[1] from Adam Spiers (SIG lead), we modified the >PTG schedule to move the Self-Healing SIG meeting from Friday (all >day) to Thursday morning (only morning). You can see the resulting >schedule at: > >https://www.openstack.org/ptg#tab_schedule > >Sorry for any inconvenience this may cause. It's me who should be apologising - Thierry only deserves thanks for accommodating my request at late notice ;-) From gael.therond at gmail.com Tue Jul 31 17:05:11 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Tue, 31 Jul 2018 19:05:11 +0200 Subject: [Openstack-operators] [OCTAVIA][KOLLA] - Amphora to control plan communication question. In-Reply-To: References: Message-ID: Hi Michael, thanks a lot for that explanation, it’s actually how I envisioned the flow. I’ll have to produce a diagram for my peers’ understanding; maybe I can share it with you. There is still one point that seems a little bit odd to me. How does the amphora agent know where to find the health manager and worker services? 
Is that because the worker is sending the agent some catalog information, or because we set that at diskimage-create time? If so, I think the CentOS-based amphora is missing the agent.conf, because currently my VMs don't have any. Once again thanks for your help! On Tue, Jul 31, 2018 at 18:15, Michael Johnson wrote: > Hi Flint, > > We don't have a logical network diagram at this time (it's still on > the to-do list), but I can talk you through it. > > The Octavia worker, health manager, and housekeeping need to be able > to reach the amphora (service VM at this point) over the lb-mgmt-net > on TCP 9443. It knows the amphora IP addresses on the lb-mgmt-net via > the database and the information we save from the compute driver (I.e. > what IP was assigned to the instance). > > The Octavia API process does not need to be connected to the > lb-mgmt-net at this time. It only connects the the messaging bus and > the Octavia database. Provider drivers may have other connectivity > requirements for the Octavia API. > > The amphorae also send UDP packets back to the health manager on port > 5555. This is the heartbeat packet from the amphora. It contains the > health and statistics from that amphora. It know it's list of health > manager endpoints from the configuration file > "controller_ip_port_list" > ( > https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.controller_ip_port_list > ). > Each amphora will rotate through that list of endpoints to reduce the > chance of a network split impacting the heartbeat messages. > > This is the only traffic that passed over this network. All of it is > IP based and can be routed (it does not require L2 connectivity). > > Michael > > On Tue, Jul 31, 2018 at 2:00 AM Flint WALRUS > wrote: > > > > Hi Folks, > > > > I'm currently deploying the Octavia component into our testing > environment which is based on KOLLA. 
> > > > So far I'm quite enjoying it as it is pretty much straight forward > (Except for some documentation pitfalls), but I'm now facing a weird and > hard to debug situation. > > > > I actually have a hard time to understand how Amphora are communicating > back and forth with the Control Plan components. > > > > From my understanding, as soon as I create a new LB, the Control Plan is > spawning an instance using the configured Octavia Flavor and Image type, > attach it to the LB-MGMT-NET and to the user provided subnet. > > > > What I think I'm misunderstanding is the discussion that follows between > the amphora and the different components such as the > HealthManager/HouseKeeper, the API and the Worker. > > > > How is the amphora agent able to found my control plan? Is the > HealthManager or the Octavia Worker initiating the communication to the > Amphora on port 9443 and so give the agent the API/Control plan internalURL? > > > > If anyone have a diagram of the workflow I would be more than happy ^^ > > > > Thanks a lot in advance to anyone willing to help :D > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpb at dyncloud.net Tue Jul 31 17:12:17 2018 From: tpb at dyncloud.net (Tom Barron) Date: Tue, 31 Jul 2018 13:12:17 -0400 Subject: [Openstack-operators] manila-ui does not work after upgrading from ocata to pike In-Reply-To: References: Message-ID: <20180731171217.t47enqskb425zu3b@barron.net> On 31/07/18 14:23 +0200, Ignazio Cassano wrote: >Hi everyone, >I upgraded my centos 7 openstack from ocata to pike. >Openstack dashboard works fine only if I remove openstack manila ui package. >With the manila ui it gives me internal server error. 
> >In httpd error log I read: > > > > KeyError: > >Please, anyone has solved this issue yet ? > >Regards >Ignazio >_______________________________________________ >OpenStack-operators mailing list >OpenStack-operators at lists.openstack.org >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators Seems likely to be a packaging issue. Might be the issue that this (queens) patch [1] addressed. To get someone to work on the pike issue please file a BZ against RDO [2]. -- Tom Barron (tbarron) [1] https://review.rdoproject.org/r/#/c/14049/ [2] https://bugzilla.redhat.com/enter_bug.cgi?product=RDO&component=openstack-manila-ui From ignaziocassano at gmail.com Tue Jul 31 19:19:25 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 31 Jul 2018 21:19:25 +0200 Subject: [Openstack-operators] manila-ui does not work after upgrading from ocata to pike In-Reply-To: <20180731171217.t47enqskb425zu3b@barron.net> References: <20180731171217.t47enqskb425zu3b@barron.net> Message-ID: Hello Tom, I will upgrade from Pike to Queens asap. I upgraded from Ocata to Pike to go step by step, but if it is useful for the community I can open a bug. What do you think? Thanks Ignazio On Tue, Jul 31, 2018 at 19:12, Tom Barron wrote: > On 31/07/18 14:23 +0200, Ignazio Cassano wrote: > >Hi everyone, > >I upgraded my centos 7 openstack from ocata to pike. > >Openstack dashboard works fine only if I remove openstack manila ui > package. > >With the manila ui it gives me internal server error. > > > >In httpd error log I read: > > > > > > KeyError: > > > > >Please, anyone has solved this issue yet ? > > > >Regards > >Ignazio > > >_______________________________________________ > >OpenStack-operators mailing list > >OpenStack-operators at lists.openstack.org > >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > Seems likely to be a packaging issue. Might be the issue that this > (queens) patch [1] addressed. 
To get someone to work on the pike > issue please file a BZ against RDO [2]. > > -- Tom Barron (tbarron) > > [1] https://review.rdoproject.org/r/#/c/14049/ > > [2] > https://bugzilla.redhat.com/enter_bug.cgi?product=RDO&component=openstack-manila-ui > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpb at dyncloud.net Tue Jul 31 19:25:14 2018 From: tpb at dyncloud.net (Tom Barron) Date: Tue, 31 Jul 2018 15:25:14 -0400 Subject: [Openstack-operators] manila-ui does not work after upgrading from ocata to pike In-Reply-To: References: <20180731171217.t47enqskb425zu3b@barron.net> Message-ID: <20180731192514.6whouw4h5af7arj2@barron.net> On 31/07/18 21:19 +0200, Ignazio Cassano wrote: >Hello Tom, I wiil upgrade from pike to Queens asap. >I upgraded from ocata to pike to go on step by step, but it is useful dir >the community I can open a bug. >What do you think ? >Thanks >Ignazio Opening a bug would help others. manila-ui has been a bit of a neglected step-child so I'm glad you are checking it out! -- Tom > > >Il Mar 31 Lug 2018 19:12 Tom Barron ha scritto: > >> On 31/07/18 14:23 +0200, Ignazio Cassano wrote: >> >Hi everyone, >> >I upgraded my centos 7 openstack from ocata to pike. >> >Openstack dashboard works fine only if I remove openstack manila ui >> package. >> >With the manila ui it gives me internal server error. >> > >> >In httpd error log I read: >> > >> > >> > KeyError: > > >> > >> >Please, anyone has solved this issue yet ? >> > >> >Regards >> >Ignazio >> >> >_______________________________________________ >> >OpenStack-operators mailing list >> >OpenStack-operators at lists.openstack.org >> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> >> Seems likely to be a packaging issue. Might be the issue that this >> (queens) patch [1] addressed. To get someone to work on the pike >> issue please file a BZ against RDO [2]. 
>> >> -- Tom Barron (tbarron) >> >> [1] https://review.rdoproject.org/r/#/c/14049/ >> >> [2] >> https://bugzilla.redhat.com/enter_bug.cgi?product=RDO&component=openstack-manila-ui >> >> From ignaziocassano at gmail.com Tue Jul 31 19:51:04 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 31 Jul 2018 21:51:04 +0200 Subject: [Openstack-operators] dashboard show only project after upgrading In-Reply-To: <164ebe48760.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> References: <164ebe48760.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> Message-ID: Hello, purging and reinstalling the dashboard solved it. Thanks Ignazio On Mon, Jul 30, 2018 at 17:53, Chris Apsey wrote: > Ignazio, > > Are your horizon instances in separate containers/VMs? If so, I'd highly > recommend completely wiping them and rebuilding from scratch since horizon > itself is stateless. I am not a fan of upgrades for reasons like this. > > If that's not possible, a purge of the horizon packages on your controller > and a reinstallation should fix it. > > Chris > > On July 30, 2018 11:38:03 Ignazio Cassano > wrote: > >> Sorry, I sent a wrong image. >> The correct screenshot is attached here. >> Regards >> >> 2018-07-30 17:33 GMT+02:00 Ignazio Cassano : >> >>> Hello everyone, >>> I upgraded openstack centos 7 from ocata to pike ad command line work >>> fine >>> but dashboard does not show any menu on the left . >>> I missed the following menus: >>> >>> Project >>> Admin >>> Identity >>> >>> You can find the image attached here. >>> >>> Could anyone help me ? >>> Regards >>> Ignazio >>> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
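[Editor's note: a sketch of the purge-and-reinstall approach Chris describes, assuming an RDO/CentOS 7 controller. The package, path, and service names here are assumptions, and the script only echoes each step as a dry run; drop the wrapper to run it for real:

```shell
#!/bin/sh
# Dry-run sketch of purging and reinstalling horizon on an RDO controller.
# Assumed names: openstack-dashboard package, httpd service.
run() {
    # Print the command instead of executing it.
    echo "+ $*"
}

run yum -y remove openstack-dashboard
run rm -rf /usr/share/openstack-dashboard   # wipe leftovers, incl. stale static assets
run yum -y install openstack-dashboard
run systemctl restart httpd
```

Chris's stronger recommendation, rebuilding the container/VM from scratch, avoids this entirely, since horizon itself is stateless.]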