From cdent+os at anticdent.org Sat Jun 2 00:28:01 2018 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 1 Jun 2018 17:28:01 -0700 (PDT) Subject: [Openstack-operators] [cinder] [placement] cinder + placement forum session etherpad In-Reply-To: References: Message-ID: On Wed, 9 May 2018, Chris Dent wrote: > I've started an etherpad for the forum session in Vancouver devoted > to discussing the possibility of tracking and allocation resources > in Cinder using the Placement service. This is not a done deal. > Instead the session is to discuss if it could work and how to make > it happen if it seems like a good idea. > > The etherpad is at > > https://etherpad.openstack.org/p/YVR-cinder-placement The session went well. Some of the members of the cinder team who might have had more questions had not been able to be at summit so we were unable to get their input. We clarified some of the things that cinder wants to be able to accomplish (run multiple schedulers in active-active and avoid race conditions) and the fact that this is what placement is built for. We also made it clear that placement itself can be highly available (and scalable) because of its nature as a dead-simple web app over a database. The next steps are for the cinder team to talk amongst themselves and socialize the capabilities of placement (with the help of placement people) and see if it will be suitable. It is unlikely there will be much visible progress in this area before Stein. See the etherpad for a bit more detail. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From bitskrieg at bitskrieg.net Sat Jun 2 06:37:57 2018 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Sat, 02 Jun 2018 09:37:57 +0300 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <914e8aec-2a25-e7d0-270f-725b0aeba9d5@gmail.com> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> <914e8aec-2a25-e7d0-270f-725b0aeba9d5@gmail.com> Message-ID: <163bf378588.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> This is great. I would even go so far as to say the install docs should be updated to capture this as the default; as far as I know there is no negative impact when running in daemon mode, even on very small deployments. I would imagine that there are operators out there who have run into this issue but didn't know how to work through it - making stuff like this less painful is key to breaking the 'openstack is hard' stigma. On June 1, 2018 00:49:32 Matt Riedemann wrote: > On 5/30/2018 9:30 AM, Matt Riedemann wrote: >> >> I can start pushing some docs patches and report back here for review help. > > Here are the docs patches in both nova and neutron: > > https://review.openstack.org/#/q/topic:bug/1774217+(status:open+OR+status:merged) > > -- > > Thanks, > > Matt > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From mriedemos at gmail.com Sun Jun 3 14:54:44 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 3 Jun 2018 09:54:44 -0500 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <163bf378588.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> <914e8aec-2a25-e7d0-270f-725b0aeba9d5@gmail.com> <163bf378588.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> Message-ID: On 6/2/2018 1:37 AM, Chris Apsey wrote: > This is great.  I would even go so far as to say the install docs should > be updated to capture this as the default; as far as I know there is no > negative impact when running in daemon mode, even on very small > deployments.  I would imagine that there are operators out there who > have run into this issue but didn't know how to work through it - making > stuff like this less painful is key to breaking the 'openstack is hard' > stigma. I think changing the default on the root_helper_daemon option is a good idea if everyone is setting that anyway. There are some comments in the code next to the option that make me wonder if there are edge cases where it might not be a good idea, but I don't really know the details, someone from the neutron team that knows more about it would have to speak up. Also, I wonder if converting to privsep in the neutron agent would eliminate the need for this option altogether and still gain the performance benefits. -- Thanks, Matt From skaplons at redhat.com Sun Jun 3 19:09:50 2018 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Sun, 3 Jun 2018 21:09:50 +0200 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> <914e8aec-2a25-e7d0-270f-725b0aeba9d5@gmail.com> <163bf378588.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> Message-ID: <6DB37469-33DD-4182-9545-C3F0712886A5@redhat.com> Hi, > Wiadomość napisana przez Matt Riedemann w dniu 03.06.2018, o godz. 16:54: > > On 6/2/2018 1:37 AM, Chris Apsey wrote: >> This is great. I would even go so far as to say the install docs should be updated to capture this as the default; as far as I know there is no negative impact when running in daemon mode, even on very small deployments. I would imagine that there are operators out there who have run into this issue but didn't know how to work through it - making stuff like this less painful is key to breaking the 'openstack is hard' stigma. > > I think changing the default on the root_helper_daemon option is a good idea if everyone is setting that anyway. There are some comments in the code next to the option that make me wonder if there are edge cases where it might not be a good idea, but I don't really know the details, someone from the neutron team that knows more about it would have to speak up. > > Also, I wonder if converting to privsep in the neutron agent would eliminate the need for this option altogether and still gain the performance benefits. Converting L2 agents to privsep is ongoing process but it’s very slow. There is switch of ip_lib to privsep in progress: https://bugs.launchpad.net/neutron/+bug/1492714 But to completely drop rootwrap there is also tc_lib to switch to privsep for QoS, iptables module for security groups and probably also some other modules. So I would not consider it as possibly done soon :) > > -- > > Thanks, > > Matt > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators — Slawek Kaplonski Senior software engineer Red Hat From tobias.urdin at crystone.com Mon Jun 4 11:43:48 2018 From: tobias.urdin at crystone.com (Tobias Urdin) Date: Mon, 4 Jun 2018 11:43:48 +0000 Subject: [Openstack-operators] [nova] isolate hypervisor to project Message-ID: <56fae8af7e184a7c833efc2c5e0d1440@mb01.staff.ognet.se> Hello, I have received a question about a more specialized use case where we need to isolate several hypervisors to a specific project. My first thinking was using nova flavors for only that project and add extra specs properties to use a specific host aggregate but this means I need to assign values to all other flavors to not use those which seems weird. How could I go about solving this the easies/best way or from the history of the mailing lists, the most supported way since there is a lot of changes to scheduler/placement part right now? Best regards From mriedemos at gmail.com Mon Jun 4 12:47:21 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 4 Jun 2018 07:47:21 -0500 Subject: [Openstack-operators] [nova] isolate hypervisor to project In-Reply-To: <56fae8af7e184a7c833efc2c5e0d1440@mb01.staff.ognet.se> References: <56fae8af7e184a7c833efc2c5e0d1440@mb01.staff.ognet.se> Message-ID: <9431f4e5-1a2d-e327-6ea8-98448ec8695c@gmail.com> On 6/4/2018 6:43 AM, Tobias Urdin wrote: > I have received a question about a more specialized use case where we > need to isolate several hypervisors > > to a specific project. My first thinking was using nova flavors for only > that project and add extra specs properties to use a specific host > aggregate but this > > means I need to assign values to all other flavors to not use those > which seems weird. > > > How could I go about solving this the easies/best way or from the > history of the mailing lists, the most supported way since there is a > lot of changes > > to scheduler/placement part right now? Depending on which release you're on, it sounds like you want to use this: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation In Rocky we have a replacement for that filter which does pre-filtering in Placement which should give you a performance gain when it comes time to do the host filtering: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement Note that even if you use AggregateMultiTenancyIsolation for the one project, other projects can still randomly land on the hosts in that aggregate unless you also assign those to their own aggregates. It sounds like you're might be looking for a dedicated hosts feature? There is an RFE from the public cloud work group for that: https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523 -- Thanks, Matt From thierry at openstack.org Mon Jun 4 13:13:47 2018 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 4 Jun 2018 15:13:47 +0200 Subject: [Openstack-operators] [stable][EM] Summary of forum session(s) on extended maintenance Message-ID: <9401d333-64a1-6a5c-3f0a-04fb1880f5bd@openstack.org> Hi! We had a double session on extended maintenance at the Forum in Vancouver, here is a late summary of it. Feel free to add to it if you remember extra things. The first part of the session was to present the Extended Maintenance process as implemented after the discussion at the PTG in Dublin, and answer questions around it. The process was generally well received, with question on how to sign up (no real sign up required, just start helping and join #openstack-stable). There were also a number of questions around the need to maintain all releases up to an old maintained release, with explanation of the FFU process and the need to avoid regressions from release to release. The second part of the session was taking a step back and discuss extended maintenance in the context of release cycles and upgrade pain. A summary of the Dublin discussion was given. Some questions were raised on the need for fast-forward upgrades (vs. skip-level upgrades), as well as a bit of a brainstorm around how to encourage people to gather around popular EM releases (a wiki page was considered a good trade-off). The EM process mandates that no releases would be tagged after the end of the 18-month official "maintenance" period. There was a standing question on the need to still release libraries (since tests of HEAD changes are by default run against released versions of libraries). The consensus in the room was that when extended maintenance starts, we should switch to testing stable/$foo HEAD changes against stable/$foo HEAD of libraries. This should be first done when Ocata switches to extended maintenance in August. The discussion then switched to how to further ease upgrade pain, with reports of progress on the Upgrades SIG on better documenting the Fast Forward Upgrade process. We discussed how minimal cold upgrades capabilities should be considered the minimum to be considered an official OpenStack component, and whether we could use the Goals mechanism to push it. We also discussed testing database migrations with real production data (what turbo-hipster did) and the challenges to share deidentified data to that purpose. Cheers, -- Thierry Carrez (ttx) From tobias.urdin at crystone.com Mon Jun 4 13:30:30 2018 From: tobias.urdin at crystone.com (Tobias Urdin) Date: Mon, 4 Jun 2018 13:30:30 +0000 Subject: [Openstack-operators] [nova] isolate hypervisor to project References: <56fae8af7e184a7c833efc2c5e0d1440@mb01.staff.ognet.se> <9431f4e5-1a2d-e327-6ea8-98448ec8695c@gmail.com> Message-ID: Hello, Thanks for the reply Matt. The hard thing here is that I have to ensure it the other way around as well i.e other instances cannot be allowed landing on those "reserved" hypervisors. I assume I could do something like in [1] and also set key-value metadata on all flavors to select a host aggregate that is not the "reserved" hypervisors. openstack aggregate create fast-cpu --property fast-cpu=true --property other=true openstack aggregate create normal-cpu --property normal-cpu=true --property other=true openstack aggregate create dedicated --property dedicated=true openstack aggregate add host fast-cpu compute1 openstack aggregate add host normal-cpu compute2 openstack aggregate add host dedicated compute3 openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property aggregate_instance_extra_specs:fast-cpu=true --property aggregate_instance_extra_specs:other=true fast-cpu.medium openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property aggregate_instance_extra_specs:normal-cpu=true --property aggregate_instance_extra_specs:other=true normal-cpu.medium openstack flavor create --vcpus 4 --ram 4096 --disk 50 --private --project --property aggregate_instance_extra_specs:dedicated=true dedicated.medium It's seems very messy, would that be an supported approach? We are on Queens, doing it in a way that is not removed in the future would be optimal. Best regards [1] https://www.brad-x.com/2016/01/01/dedicate-compute-hosts-to-projects/ On 06/04/2018 02:50 PM, Matt Riedemann wrote: > On 6/4/2018 6:43 AM, Tobias Urdin wrote: >> I have received a question about a more specialized use case where we >> need to isolate several hypervisors >> >> to a specific project. My first thinking was using nova flavors for only >> that project and add extra specs properties to use a specific host >> aggregate but this >> >> means I need to assign values to all other flavors to not use those >> which seems weird. >> >> >> How could I go about solving this the easies/best way or from the >> history of the mailing lists, the most supported way since there is a >> lot of changes >> >> to scheduler/placement part right now? > Depending on which release you're on, it sounds like you want to use this: > > https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation > > In Rocky we have a replacement for that filter which does pre-filtering > in Placement which should give you a performance gain when it comes time > to do the host filtering: > > https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement > > Note that even if you use AggregateMultiTenancyIsolation for the one > project, other projects can still randomly land on the hosts in that > aggregate unless you also assign those to their own aggregates. > > It sounds like you're might be looking for a dedicated hosts feature? > There is an RFE from the public cloud work group for that: > > https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523 > From tobias.urdin at crystone.com Mon Jun 4 13:36:54 2018 From: tobias.urdin at crystone.com (Tobias Urdin) Date: Mon, 4 Jun 2018 13:36:54 +0000 Subject: [Openstack-operators] [nova] isolate hypervisor to project References: <56fae8af7e184a7c833efc2c5e0d1440@mb01.staff.ognet.se> <9431f4e5-1a2d-e327-6ea8-98448ec8695c@gmail.com> Message-ID: Saw now in the docs that multiple aggregate_instance_extra_specs keys should be a comma-separated list. But other than that, would the below do what I'm looking for? Has a very high maintenance level when having a lot of hypervisors and steadily adding new ones, but I can't see any other way to fully isolate it. Would've been cool if the RFE you mentioned [1] could be researched and if it qualifies implemented. Best regards [1] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523 On 06/04/2018 03:32 PM, Tobias Urdin wrote: > Hello, > Thanks for the reply Matt. > > The hard thing here is that I have to ensure it the other way around as > well i.e other instances cannot be allowed landing on those "reserved" > hypervisors. > I assume I could do something like in [1] and also set key-value > metadata on all flavors to select a host aggregate that is not the > "reserved" hypervisors. > > openstack aggregate create fast-cpu --property fast-cpu=true --property > other=true > openstack aggregate create normal-cpu --property normal-cpu=true > --property other=true > openstack aggregate create dedicated --property dedicated=true > openstack aggregate add host fast-cpu compute1 > openstack aggregate add host normal-cpu compute2 > openstack aggregate add host dedicated compute3 > openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property > aggregate_instance_extra_specs:fast-cpu=true --property > aggregate_instance_extra_specs:other=true fast-cpu.medium > openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property > aggregate_instance_extra_specs:normal-cpu=true --property > aggregate_instance_extra_specs:other=true normal-cpu.medium > openstack flavor create --vcpus 4 --ram 4096 --disk 50 --private > --project --property > aggregate_instance_extra_specs:dedicated=true dedicated.medium > > It's seems very messy, would that be an supported approach? > We are on Queens, doing it in a way that is not removed in the future > would be optimal. > > Best regards > > [1] https://www.brad-x.com/2016/01/01/dedicate-compute-hosts-to-projects/ > > > On 06/04/2018 02:50 PM, Matt Riedemann wrote: >> On 6/4/2018 6:43 AM, Tobias Urdin wrote: >>> I have received a question about a more specialized use case where we >>> need to isolate several hypervisors >>> >>> to a specific project. My first thinking was using nova flavors for only >>> that project and add extra specs properties to use a specific host >>> aggregate but this >>> >>> means I need to assign values to all other flavors to not use those >>> which seems weird. >>> >>> >>> How could I go about solving this the easies/best way or from the >>> history of the mailing lists, the most supported way since there is a >>> lot of changes >>> >>> to scheduler/placement part right now? >> Depending on which release you're on, it sounds like you want to use this: >> >> https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation >> >> In Rocky we have a replacement for that filter which does pre-filtering >> in Placement which should give you a performance gain when it comes time >> to do the host filtering: >> >> https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement >> >> Note that even if you use AggregateMultiTenancyIsolation for the one >> project, other projects can still randomly land on the hosts in that >> aggregate unless you also assign those to their own aggregates. >> >> It sounds like you're might be looking for a dedicated hosts feature? >> There is an RFE from the public cloud work group for that: >> >> https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523 >> > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From JohnMP at cardiff.ac.uk Mon Jun 4 13:37:49 2018 From: JohnMP at cardiff.ac.uk (Matthew John) Date: Mon, 4 Jun 2018 13:37:49 +0000 Subject: [Openstack-operators] Switching from Fuel to OpenStack-Ansible Message-ID: Hi, Apologies if this has been asked before but Google didn't turn up anything useful. We are currently using Fuel to deploy OpenStack but given that it seems to be unmaintained have choose to switch to OpenStack-Ansible. The current setup routes all external traffic i.e. for floating IPs through the control nodes which is ideal as we have a limited number of external 10Gb connections available. Is it possible to mirror this setup in the Ansible deployment? The configuration would be three control nodes each with three 10Gb NICs: Bonded 10Gb NICs for storage, management networks 10Gb NIC for external network connectivity and access to Horizon dashboard All external traffic from the compute nodes would then be routed through the controllers and vice versa Cheers, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.friesen at windriver.com Mon Jun 4 15:07:31 2018 From: chris.friesen at windriver.com (Chris Friesen) Date: Mon, 4 Jun 2018 09:07:31 -0600 Subject: [Openstack-operators] [nova] isolate hypervisor to project In-Reply-To: <56fae8af7e184a7c833efc2c5e0d1440@mb01.staff.ognet.se> References: <56fae8af7e184a7c833efc2c5e0d1440@mb01.staff.ognet.se> Message-ID: <5B1555B3.7030100@windriver.com> On 06/04/2018 05:43 AM, Tobias Urdin wrote: > Hello, > > I have received a question about a more specialized use case where we need to > isolate several hypervisors to a specific project. My first thinking was > using nova flavors for only that project and add extra specs properties to > use a specific host aggregate but this means I need to assign values to all > other flavors to not use those which seems weird. > > How could I go about solving this the easies/best way or from the > history of the mailing lists, the most supported way since there is a > lot of changes to scheduler/placement part right now? There was a "Strict isolation of group of hosts for images" spec that was proposed for a number of releases but never got accepted: https://review.openstack.org/#/c/381912/ The idea was to have special metadata on a host aggregate and a new scheduler filter such that only instances with images having a property matching the metadata would be allowed to land on that host aggregate. In the end the spec was abandoned (see the final comment in the review) because it was expected that a combination of other accepted features would enable the desired behaviour. It might be worth checking out the links in the final comment. Chris From mriedemos at gmail.com Mon Jun 4 15:53:43 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 4 Jun 2018 10:53:43 -0500 Subject: [Openstack-operators] [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point In-Reply-To: <6992a8851a8349eeb194664c267a1a63@garmin.com> References: <6992a8851a8349eeb194664c267a1a63@garmin.com> Message-ID: +openstack-operators to see if others have the same use case On 5/31/2018 5:14 PM, Moore, Curt wrote: > We recently upgraded from Liberty to Pike and looking ahead to the code > in Queens, noticed the image download deprecation notice with > instructions to post here if this interface was in use.  As such, I’d > like to explain our use case and see if there is a better way of > accomplishing our goal or lobby for the "un-deprecation" of this > extension point. Thanks for speaking up - this is much easier *before* code is removed. > > As with many installations, we are using Ceph for both our Glance image > store and VM instance disks.  In a normal workflow when both Glance and > libvirt are configured to use Ceph, libvirt reacts to the direct_url > field on the Glance image and performs an in-place clone of the RAW disk > image from the images pool into the vms pool all within Ceph.  The > snapshot creation process is very fast and is thinly provisioned as it’s > a COW snapshot. > > This underlying workflow itself works great, the issue is with > performance of the VM’s disk within Ceph, especially as the number of > nodes within the cluster grows.  We have found, especially with Windows > VMs (largely as a result of I/O for the Windows pagefile), that the > performance of the Ceph cluster as a whole takes a very large hit in > keeping up with all of this I/O thrashing, especially when Windows is > booting.  This is not the case with Linux VMs as they do not use swap as > frequently as do Windows nodes with their pagefiles.  Windows can be run > without a pagefile but that leads to other odditites within Windows. > > I should also mention that in our case, the nodes themselves are > ephemeral and we do not care about live migration, etc., we just want > raw performance. > > As an aside on our Ceph setup without getting into too many details, we > have very fast SSD based Ceph nodes for this pool (separate crush root, > SSDs for both OSD and journals, 2 replicas), interconnected on the same > switch backplane, each with bonded 10GB uplinks to the switch.  Our Nova > nodes are within the same datacenter (also have bonded 10GB uplinks to > their switches) but are distributed across different switches.  We could > move the Nova nodes to the same switch as the Ceph nodes but that is a > larger logistical challenge to rearrange many servers to make space. > > Back to our use case, in order to isolate this heavy I/O, a subset of > our compute nodes have a local SSD and are set to use qcow2 images > instead of rbd so that libvirt will pull the image down from Glance into > the node’s local image cache and run the VM from the local SSD.  This > allows Windows VMs to boot and perform their initial cloudbase-init > setup/reboot within ~20 sec vs 4-5 min, regardless of overall Ceph > cluster load.  Additionally, this prevents us from "wasting" IOPS and > instead keep them local to the Nova node, reclaiming the network > bandwidth and Ceph IOPS for use by Cinder volumes.  This is essentially > the use case outlined here in the "Do designate some non-Ceph compute > hosts with low-latency local storage" section: > > https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/ > > The challenge is that transferring the Glance image transfer is > _glacially slow_ when using the Glance HTTP API (~30 min for a 50GB > Windows image (It’s Windows, it’s huge with all of the necessary tools > installed)).  If libvirt can instead perform an RBD export on the image > using the image download functionality, it is able to download the same > image in ~30 sec.  We have code that is performing the direct download > from Glance over RBD and it works great in our use case which is very > similar to the code in this older patch: > > https://review.openstack.org/#/c/44321/ It looks like at the time this had general approval (i.e. it wasn't considered crazy) but was blocked simply due to the Havana feature freeze. That's good to know. > > We could look at attaching an additional ephemeral disk to the instance > and have cloudbase-init use it as the pagefile but it appears that if > libvirt is using rbd for its images_type, _all_ disks must then come > from Ceph, there is no way at present to allow the VM image to run from > Ceph and have an ephemeral disk mapped in from node-local storage.  Even > still, this would have the effect of "wasting" Ceph IOPS for the VM disk > itself which could be better used for other purposes. When you mentioned the swap above I was thinking similar to this, attaching a swap device but as you've pointed out, all disks local to the compute host are going to use the same image type backend, so you can't have the root disk and swap/ephemeral disks using different image backends. > > Based on what I have explained about our use case, is there a > better/different way to accomplish the same goal without using the > deprecated image download functionality?  If not, can we work to > "un-deprecate" the download extension point? Should I work to get the > code for this RBD download into the upstream repository? > I think you should propose your changes upstream with a blueprint, the docs for the blueprint process are here: https://docs.openstack.org/nova/latest/contributor/blueprints.html Since it's not an API change, this might just be a specless blueprint, but you'd need to write up the blueprint and probably post the PoC code to Gerrit and then bring it up during the "Open Discussion" section of the weekly nova meeting. Once we can take a look at the code change, we can go from there on whether or not to add that in-tree or go some alternative route. Until that happens, I think we'll just say we won't remove that deprecated image download extension code, but that's not going to be an unlimited amount of time if you don't propose your changes upstream. Is there going to be anything blocking or slowing you down on your end with regard to contributing this change, like legal approval, license agreements, etc? If so, please be up front about that. -- Thanks, Matt From mriedemos at gmail.com Mon Jun 4 20:41:17 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 4 Jun 2018 15:41:17 -0500 Subject: [Openstack-operators] [openstack-dev] [TC] Stein Goal Selection In-Reply-To: References: <20180604180742.GA6404@sm-xps> Message-ID: <34fa9615-2add-4a93-e9fc-2823340357a1@gmail.com> +openstack-operators since we need to have more operator feedback in our community-wide goals decisions. +Melvin as my elected user committee person for the same reasons as adding operators into the discussion. On 6/4/2018 3:38 PM, Matt Riedemann wrote: > On 6/4/2018 1:07 PM, Sean McGinnis wrote: >> Python 3 First >> ============== >> >> One of the things brought up in the session was picking things that bring >> excitement and are obvious benefits to deployers and users of OpenStack >> services. While this one is maybe not as immediately obvious, I think >> this >> is something that will end up helping deployers and also falls into >> the tech >> debt reduction category that will help us move quicker long term. >> >> Python 2 is going away soon, so I think we need something to help >> compel folks >> to work on making sure we are ready to transition. This will also be a >> good >> point to help switch the mindset over to Python 3 being the default used >> everywhere, with our Python 2 compatibility being just to continue legacy >> support. > > I still don't really know what this goal means - we have python 3 > support across the projects for the most part don't we? Based on that, > this doesn't seem like much to take an entire "goal slot" for the release. > >> >> Cold Upgrade Support >> ==================== >> >> The other suggestion in the Forum session related to upgrades was the >> addition >> of "upgrade check" CLIs for each project, and I was tempted to suggest >> that as >> my second strawman choice. For some projects that would be a very >> minimal or >> NOOP check, so it would probably be easy to complete the goal. But >> ultimately >> what I think would bring the most value would be the work on >> supporting cold >> upgrade, even if it will be more of a stretch for some projects to >> accomplish. > > I think you might be mixing two concepts here. > > The cold upgrade support, per my understanding, is about getting the > assert:supports-upgrade tag: > > https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html > > > Which to me basically means the project runs a grenade job. There was > discussion in the room about grenade not being a great tool for all > projects, but no one is working on a replacement for that, so I don't > think it's really justification at this point for *not* making it a goal. > > The "upgrade check" CLIs is a different thing though, which is more > about automating as much of the upgrade release notes as possible. See > the nova docs for examples on how we have used it: > > https://docs.openstack.org/nova/latest/cli/nova-status.html > > I'm not sure what projects you had in mind when you said, "For some > projects that would be a very minimal or NOOP check, so it would > probably be easy to complete the goal." I would expect that projects > aren't meeting the goal if they are noop'ing everything. But what can be > automated like this isn't necessarily black and white either. > >> >> Upgrades have been a major focus of discussion lately, especially as our >> operators have been trying to get closer to the latest work upstream. >> This has >> been an ongoing challenge. >> >> There has also been a lot of talk about LTS releases. We've landed on >> fast >> forward upgrade to get between several releases, but I think improving >> upgrades >> eases the way both for easier and more frequent upgrades and also >> getting to >> the point some day where maybe we can think about upgrading over several >> releases to be able to do something like an LTS to LTS upgrade. >> >> Neither one of these upgrade goals really has a clearly defined plan that >> projects can pick up now and start working on, but I think with those >> involved >> in these areas we should be able to come up with a perscriptive plan for >> projects to follow. >> >> And it would really move our fast forward upgrade story forward. > > Agreed. In the FFU Forum session at the summit I mentioned the > 'nova-status upgrade check' CLI and a lot of people in the room had > never heard of it because they are still on Mitaka before we added that > CLI (new in Ocata). But they sounded really interested in it and said > they wished other projects were doing that to help ease upgrades so they > won't be stuck on older unmaintained releases for so long. So anything > we can do to improve upgrades, including our testing for them, will help > make FFU better. > >> >> Next Steps >> ========== >> >> I'm hoping with a strawman proposal we have a basis for debating the >> merits of >> these and getting closer to being able to officially select Stein >> goals. We >> still have some time, but I would like to avoid making late-cycle >> selections so >> teams can start planning ahead for what will need to be done in Stein. >> >> Please feel free to promote other ideas for goals. That would be a >> good way for >> us to weigh the pro's and con's between these and whatever else you >> have in >> mind. Then hopefully we can come to some consensus and work towards >> clearly >> defining what needs to be done and getting things well documented for >> teams to >> pick up as soon as they wrap up Rocky (or sooner). > > I still want to lobby for a push to move off the old per-project CLIs > and close the gap on using python-openstackclient CLI for everything, > but I'm unclear on what the roadmap is for the major refactor with the > SDK Monty was talking about in Vancouver. From a new user perspective, > the 2000 individual CLIs to get anything done in OpenStack has to be a > major turn off so we should make this a higher priority - including > modernizing our per-project documentation to give OSC examples instead > of per-project (e.g. nova boot) examples. > -- Thanks, Matt From sean.mcginnis at gmx.com Mon Jun 4 22:19:55 2018 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Mon, 4 Jun 2018 17:19:55 -0500 Subject: [Openstack-operators] [openstack-dev] [TC] Stein Goal Selection In-Reply-To: <8aade74e-7eeb-7d31-8331-e2a1e6be7b64@gmx.com> References: <20180604180742.GA6404@sm-xps> <1528146144-sup-2183@lrrr.local> <8aade74e-7eeb-7d31-8331-e2a1e6be7b64@gmx.com> Message-ID: <240736b2-d066-829f-55c0-3e46ffdebc0b@gmx.com> Adding back the openstack-operators list that Matt added. On 06/04/2018 05:13 PM, Sean McGinnis wrote: > On 06/04/2018 04:17 PM, Doug Hellmann wrote: >> Excerpts from Matt Riedemann's message of 2018-06-04 15:38:48 -0500: >>> On 6/4/2018 1:07 PM, Sean McGinnis wrote: >>>> Python 3 First >>>> ============== >>>> >>>> One of the things brought up in the session was picking things that >>>> bring >>>> excitement and are obvious benefits to deployers and users of >>>> OpenStack >>>> services. While this one is maybe not as immediately obvious, I >>>> think this >>>> is something that will end up helping deployers and also falls into >>>> the tech >>>> debt reduction category that will help us move quicker long term. >>>> >>>> Python 2 is going away soon, so I think we need something to help >>>> compel folks >>>> to work on making sure we are ready to transition. This will also >>>> be a good >>>> point to help switch the mindset over to Python 3 being the default >>>> used >>>> everywhere, with our Python 2 compatibility being just to continue >>>> legacy >>>> support. >>> I still don't really know what this goal means - we have python 3 >>> support across the projects for the most part don't we? Based on that, >>> this doesn't seem like much to take an entire "goal slot" for the >>> release. >> We still run docs, linters, functional tests, and other jobs under >> python 2 by default. Perhaps a better framing would be to call this >> "Python 3 by default", because the point is to change all of those jobs >> to use Python 3, and to set up all future jobs using Python 3 unless we >> specifically need to run them under Python 2. >> >> This seems like a small thing, but when we did it for Oslo we did find >> code issues because the linters apply different rules and we did find >> documentation build issues. The fixes were all straightforward, so I >> don't expect it to mean a lot of work, but it's more than a single patch >> per project. I also think using a goal is a good way to start shifting >> the mindset of the contributor base into this new perspective. > Yes, that's probably a better way to word it to properly convey the goal. > Basically, all things running under Python3, project code and tooling, as > the default unless specifically geared towards Python2. > >>>> Cold Upgrade Support >>>> ==================== >>>> >>>> The other suggestion in the Forum session related to upgrades was >>>> the addition >>>> of "upgrade check" CLIs for each project, and I was tempted to >>>> suggest that as >>>> my second strawman choice. For some projects that would be a very >>>> minimal or >>>> NOOP check, so it would probably be easy to complete the goal. But >>>> ultimately >>>> what I think would bring the most value would be the work on >>>> supporting cold >>>> upgrade, even if it will be more of a stretch for some projects to >>>> accomplish. >>> I think you might be mixing two concepts here. > Not so much mixing as discussing the two and the reason why I > personally thought > the one was a better goal, if you read through what was said about it. >>> >>> The cold upgrade support, per my understanding, is about getting the >>> assert:supports-upgrade tag: >>> >>> https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html >>> >>> >>> Which to me basically means the project runs a grenade job. There was >>> discussion in the room about grenade not being a great tool for all >>> projects, but no one is working on a replacement for that, so I don't >>> think it's really justification at this point for *not* making it a >>> goal. >>> >>> The "upgrade check" CLIs is a different thing though, which is more >>> about automating as much of the upgrade release notes as possible. See >>> the nova docs for examples on how we have used it: >>> >>> https://docs.openstack.org/nova/latest/cli/nova-status.html >>> >>> I'm not sure what projects you had in mind when you said, "For some >>> projects that would be a very minimal or NOOP check, so it would >>> probably be easy to complete the goal." I would expect that projects >>> aren't meeting the goal if they are noop'ing everything. But what >>> can be >>> automated like this isn't necessarily black and white either. >> What I remember from the discussion in the room was that not all >> projects are going to have anything to do by hand that would block >> an upgrade, but we still want all projects to have the test command. >> That means many of those commands could potentially be no-ops, >> right? Unless they're all going to do something like verify the >> schema has been updated somehow? > > Yes, exactly what I meant by the NOOP. I'm not sure what Cinder would > check here. We don't have to see if placement has been set up or if cell0 > has been configured. Maybe once we have the facility in place we would > find some things worth checking, but at present I don't know what that > would be. > > Which also makes me wonder, should this be an oslo thing that projects > just plug in to for their specific checks? > >>>> Upgrades have been a major focus of discussion lately, especially >>>> as our >>>> operators have been trying to get closer to the latest work >>>> upstream. This has >>>> been an ongoing challenge. >>>> >>>> There has also been a lot of talk about LTS releases. We've landed >>>> on fast >>>> forward upgrade to get between several releases, but I think >>>> improving upgrades >>>> eases the way both for easier and more frequent upgrades and also >>>> getting to >>>> the point some day where maybe we can think about upgrading over >>>> several >>>> releases to be able to do something like an LTS to LTS upgrade. >>>> >>>> Neither one of these upgrade goals really has a clearly defined >>>> plan that >>>> projects can pick up now and start working on, but I think with >>>> those involved >>>> in these areas we should be able to come up with a perscriptive >>>> plan for >>>> projects to follow. >>>> >>>> And it would really move our fast forward upgrade story forward. >>> Agreed. In the FFU Forum session at the summit I mentioned the >>> 'nova-status upgrade check' CLI and a lot of people in the room had >>> never heard of it because they are still on Mitaka before we added that >>> CLI (new in Ocata). But they sounded really interested in it and said >>> they wished other projects were doing that to help ease upgrades so >>> they >>> won't be stuck on older unmaintained releases for so long. So anything >>> we can do to improve upgrades, including our testing for them, will >>> help >>> make FFU better. >>> >>>> Next Steps >>>> ========== >>>> >>>> I'm hoping with a strawman proposal we have a basis for debating >>>> the merits of >>>> these and getting closer to being able to officially select Stein >>>> goals. We >>>> still have some time, but I would like to avoid making late-cycle >>>> selections so >>>> teams can start planning ahead for what will need to be done in Stein. >>>> >>>> Please feel free to promote other ideas for goals. That would be a >>>> good way for >>>> us to weigh the pro's and con's between these and whatever else you >>>> have in >>>> mind. Then hopefully we can come to some consensus and work towards >>>> clearly >>>> defining what needs to be done and getting things well documented >>>> for teams to >>>> pick up as soon as they wrap up Rocky (or sooner). >>> I still want to lobby for a push to move off the old per-project CLIs >>> and close the gap on using python-openstackclient CLI for everything, >>> but I'm unclear on what the roadmap is for the major refactor with the >>> SDK Monty was talking about in Vancouver. From a new user perspective, >>> the 2000 individual CLIs to get anything done in OpenStack has to be a >>> major turn off so we should make this a higher priority - including >>> modernizing our per-project documentation to give OSC examples instead >>> of per-project (e.g. nova boot) examples. >> I support this one, too. We're going to need more contributors >> working on the CLI team, I think, to make it happen, though. Dean >> is way over his capacity, I'm basically not present, and we've lost >> Steve. That leaves Akihiro and Rui to do most of the review work, >> which isn't enough. >> >> Doug > I was tempted to go with the OSC one too, but I was afraid resource > constraints would make that unlikely. I haven't checked lately, but > last I heard neither Cinder v3 nor microversions were supported yet. > > Maybe this has changed, but my impression is that a lot of work needs > to be done before we can reasonably expect this to be a goal that we > have a chance of getting near completion in a cycle. > > > __________________________________________________________________________ > > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: > OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev From jean-philippe at evrard.me Tue Jun 5 08:14:12 2018 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Tue, 5 Jun 2018 10:14:12 +0200 Subject: [Openstack-operators] [openstack-ansible][releases][governance] Change in OSA roles tagging Message-ID: Hello, *TL:DR;* If you are an openstack-ansible user, consuming our roles directly, with tags, without using openstack-ansible plays or integrated repo, then things will change for you. Start using git shas instead of tags. All other openstack-ansible users should not see a difference, even if they use openstack-ansible tags. During the summit, I had a discussion with dhellman (and smcginnis) to change how openstack-ansible does its releases. Currently, we tag openstack-ansible + many roles under our umbrella every two weeks. As far as I know, nobody requested to have roles tagged every two weeks. Only OpenStack-Ansible need to be tagged for consumption. Even people using our roles directly outside openstack-ansible generally use sha for roles. We don't rely on ansible galaxy. Because there is no need to tag the roles, there is no need to make them part of the "openstack-ansible" deliverable [1][2]. I will therefore clarify the governance repo for that, separating the roles, each of them with their own deliverable, instead of grouping some roles within openstack-ansible, and some others outside it. With this done, a release of openstack-ansible becomes straightforward using the standard release tooling. The releases of openstack-ansible becomes far simpler to request, review, and will not have timeouts anymore :p There are a few issues I see from the change. Still according to the discussion, it seems we can overcome those. 1. As this will be applied on all the branches, we may reach some issues with releasing in the next days. While the validate tooling of releases has shown me that it wouldn't be a problem (just warning) to not have all the repos in the deliverable, I would expect a governance change could be impactful. However, that is only impacting openstack-ansible, releases, and governance team: Keep in mind, openstack-ansible will not change for its users, and will still be tagged as you know it. 2. We will keep branching our roles the same way we do now. What we liked for roles being part of this deliverable, is the ability of having them automatically branched and their files adapted. To what I heard, it is still possible to do so, by having a devstack-like behavior, which branches on a sha, instead of branching on tag. So I guess it means all our roles will now be part of release files like this one [3], or even on a single release file, similar to it. What I would like to have, from this email, is: 1. Raise awareness to all the involved parties; 2. Confirmation we can go ahead, from a governance standpoint; 3. Confirmation we can still benefit from this automatic branch tooling. Thank you in advance. Jean-Philippe Evrard (evrardjp) [1]: https://github.com/openstack/governance/blob/8215c5fd9b464b332b310bbb767812fefc5d9174/reference/projects.yaml#L2493-L2540 [2]: https://github.com/openstack/releases/blob/9db5991707458bbf26a4dd9f55c2a01fee96a45d/deliverables/queens/openstack-ansible.yaml#L768-L851 [3]: https://github.com/openstack/releases/blob/9db5991707458bbf26a4dd9f55c2a01fee96a45d/deliverables/queens/devstack.yaml From jean-philippe at evrard.me Tue Jun 5 08:14:52 2018 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Tue, 5 Jun 2018 10:14:52 +0200 Subject: [Openstack-operators] Switching from Fuel to OpenStack-Ansible In-Reply-To: References: Message-ID: Hello, That's doable indeed! (In fact it's exactly what I did during Kilo timeframe! :p) Have you looked at the openstack-ansible deploy guide? It should help you understand the process. On the way it should link you to the reference architecture, where you can have more details about the network flows. Don't hesitate to join us on our IRC channel for more detailed questions, on freenode #openstack-ansible. If you want to continue by email, don't hesitate to put [openstack-ansible] in the email title :) Best regards, Jean-Philippe Evrard (evrardjp) From JohnMP at cardiff.ac.uk Tue Jun 5 11:36:50 2018 From: JohnMP at cardiff.ac.uk (Matthew John) Date: Tue, 5 Jun 2018 11:36:50 +0000 Subject: [Openstack-operators] Switching from Fuel to OpenStack-Ansible In-Reply-To: References: , Message-ID: Hi both, Thanks for the info. Yep, I've had a look at the reference architecture and will probably end up running three controllers with six compute nodes using Ceph for storage. Initial thoughts are to put the Ceph OSDs on the compute nodes which each have 4x400GB SSD drives spare and use the controllers as the ceph monitors. It is a month or two until I start the migration so will hop on to IRC if I have any issues or questions! Cheers, Matt --- Dr Matt John Engineer (Service Delivery - COMSC) School of Computer Science & Informatics Cardiff University, 5 The Parade, Cardiff, CF24 3AA Tel: +44 2920 876536 JohnMP at cardiff.ac.uk The University welcomes correspondence in Welsh or English. Corresponding in Welsh will not lead to any delay. Dr Matt John Peiriannydd (Cyflwyno Gwasanaeth - COMSC) Ysgol Cyfrifiadureg a Gwybodeg Prifysgol Caerdydd, 5 The Parade, Caerdydd, CF24 3AA Ffôn : +44 2920 876536 JohnMP at caerdydd.ac.uk Mae'r Brifysgol yn croesawu gohebiaeth yn Gymraeg neu'n Saesneg. Ni fydd gohebu yn Gymraeg yn creu unrhyw oedi. ________________________________ From: Jean-Philippe Evrard Sent: 05 June 2018 09:14:52 To: Matthew John Cc: openstack-operators at lists.openstack.org Subject: Re: [Openstack-operators] Switching from Fuel to OpenStack-Ansible Hello, That's doable indeed! (In fact it's exactly what I did during Kilo timeframe! :p) Have you looked at the openstack-ansible deploy guide? It should help you understand the process. On the way it should link you to the reference architecture, where you can have more details about the network flows. Don't hesitate to join us on our IRC channel for more detailed questions, on freenode #openstack-ansible. If you want to continue by email, don't hesitate to put [openstack-ansible] in the email title :) Best regards, Jean-Philippe Evrard (evrardjp) -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Tue Jun 5 14:36:57 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 05 Jun 2018 10:36:57 -0400 Subject: [Openstack-operators] [openstack-ansible][releases][governance] Change in OSA roles tagging In-Reply-To: References: Message-ID: <1528209045-sup-7456@lrrr.local> Excerpts from Jean-Philippe Evrard's message of 2018-06-05 10:14:12 +0200: > Hello, > > *TL:DR;* If you are an openstack-ansible user, consuming our roles directly, > with tags, without using openstack-ansible plays or integrated repo, > then things will change for you. Start using git shas instead of tags. > All other openstack-ansible users should not see a difference, even if > they use openstack-ansible tags. > > > During the summit, I had a discussion with dhellman (and smcginnis) > to change how openstack-ansible does its releases. > > Currently, we tag openstack-ansible + many roles under our umbrella > every two weeks. As far as I know, nobody requested to have roles > tagged every two weeks. Only OpenStack-Ansible need to be tagged > for consumption. Even people using our roles directly outside > openstack-ansible generally use sha for roles. We don't rely on > ansible galaxy. > > Because there is no need to tag the roles, there is no need to make them > part of the "openstack-ansible" deliverable [1][2]. I will therefore > clarify the governance repo for that, separating the roles, each of them > with their own deliverable, instead of grouping some roles within > openstack-ansible, and some others outside it. > > With this done, a release of openstack-ansible becomes straightforward > using the standard release tooling. The releases of openstack-ansible > becomes far simpler to request, review, and will not have timeouts > anymore :p > > There are a few issues I see from the change. Still according to the > discussion, it seems we can overcome those. > > 1. As this will be applied on all the branches, we may reach some > issues with releasing in the next days. While the validate tooling > of releases has shown me that it wouldn't be a problem (just > warning) to not have all the repos in the deliverable, I would > expect a governance change could be impactful. > However, that is only impacting openstack-ansible, releases, > and governance team: Keep in mind, openstack-ansible will not > change for its users, and will still be tagged as you know it. > > 2. We will keep branching our roles the same way we do now. What > we liked for roles being part of this deliverable, is the ability > of having them automatically branched and their files adapted. > To what I heard, it is still possible to do so, by having a > devstack-like behavior, which branches on a sha, instead of > branching on tag. So I guess it means all our roles will now be > part of release files like this one [3], or even on a single release > file, similar to it. Right, you can set the stable-branch-type field to 'tagless' (see http://git.openstack.org/cgit/openstack/releases/tree/README.rst#n462) and then set the branch location field to the SHA you want to use. If you would be ready to branch all of the roles at one time, you could put all of them into 1 deliverable file. Otherwise, you will want to split them up into their own files. And since you have so many, I will point out that we're really into automation over here on the release team, and if you wanted to work on making the edit-deliverable command smart enough to determine the SHA for you I could walk you through that code to get you started. Doug > > What I would like to have, from this email, is: > 1. Raise awareness to all the involved parties; > 2. Confirmation we can go ahead, from a governance standpoint; > 3. Confirmation we can still benefit from this automatic branch > tooling. > > Thank you in advance. > > Jean-Philippe Evrard (evrardjp) > > > [1]: https://github.com/openstack/governance/blob/8215c5fd9b464b332b310bbb767812fefc5d9174/reference/projects.yaml#L2493-L2540 > [2]: https://github.com/openstack/releases/blob/9db5991707458bbf26a4dd9f55c2a01fee96a45d/deliverables/queens/openstack-ansible.yaml#L768-L851 > [3]: https://github.com/openstack/releases/blob/9db5991707458bbf26a4dd9f55c2a01fee96a45d/deliverables/queens/devstack.yaml > From lebre.adrien at free.fr Tue Jun 5 14:48:55 2018 From: lebre.adrien at free.fr (lebre.adrien at free.fr) Date: Tue, 5 Jun 2018 16:48:55 +0200 (CEST) Subject: [Openstack-operators] [FEMDC] meetings suspended until further notice In-Reply-To: <67878378.160787292.1528209018026.JavaMail.root@zimbra29-e5.priv.proxad.net> Message-ID: <596449522.160890842.1528210135736.JavaMail.root@zimbra29-e5.priv.proxad.net> Dear all, Following the exchanges we had during the Vancouver summit, in particular the non-sense to maintain/animate two groups targeting similar challenges (ie., the FEMDC SIG [1] and the new Edge Computing Working group [2]), FEMDC meetings are suspended until further notice. If you are interested by Edge Computing discussions, please see information available on the new edge wiki page [2]. Thanks ad_ri3n_ [1]https://wiki.openstack.org/wiki/Fog_Edge_Massively_Distributed_Clouds [2]https://wiki.openstack.org/wiki/Edge_Computing_Group From mihalis68 at gmail.com Tue Jun 5 14:54:56 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 5 Jun 2018 10:54:56 -0400 Subject: [Openstack-operators] Ops Meetups Team - meeting minutes from today's IRC meeting Message-ID: We had a brief meeting today, minutes linked below. Topic included restarting the operator-related docs, prep for Denver PTG and some organisational things for the team. We'll meet bi-weekly at 10am EST for the time being, so next meeting is 2018-6-19 at 10 am EST Minutes: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-05-14.25.html 10:52 AM Minutes (text): http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-05-14.25.txt 10:52 AM Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-05-14.25.log.html -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Wed Jun 6 11:33:01 2018 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Wed, 6 Jun 2018 19:33:01 +0800 Subject: [Openstack-operators] [openstack-dev][heat] Heat Summit summary and project status Message-ID: Hi all Summit is over for weeks. Would like to share with team on what we got in Summit *Heat Onboarding Session* We didn't get many people shows up in Onboarding session this time, but we do get much more view in our video. Slide: https://www.slideshare.net/GuanYuLin1/openinfra-summit-2018-vancouver-heat-onboarding Video: https://www.youtube.com/watch?v=8rMkxdx5YKE (You can find videos from previous Summits in Slide) *Project Update Session* Slide: https://www.slideshare.net/GuanYuLin1/openinfra-summit-2018-vancouver-heat-project-update Video: https://www.youtube.com/watch?v=h4UXBRo948k (You can find videos from previous Summits in Slide) *User feedback Session* Etherpad: https://etherpad.openstack.org/p/2018-Vancouver-Summit-heat-ops-and-users-feedback (You can find Etherpad from the last Summit in Etherpad) Apparently, we got a lot of users which includes a lot of different domains (at least that's what I felt during summit). And according to feedbacks, I think our plans mostly match with what requirements from users.(if not, it still not too late to provide us feedbacks https://etherpad.openstack.org/p/2018-Vancouver-Summit-heat-ops-and-users-feedback ) *Project Status* Also, we're about to release Rocky-2, so would like to share current project status: We got less bug reported than the last cycle. For features, we seem got less implemented or WIP. We do get few WIP or under planned features: Blazar resource support(review in progress) Etcd support(work in progress) Multi-Cloud support (work in progress) Swift store for heat template (can input heat template from Swift) We do need more reviewer and people willing to help with features. For rocky release(about to release rocky-2) we got around 700 reviews Commits: 216 Filed Bugs: 56 Resolved Bugs: 34 (For reference. Here's Queens cycle number: around 1700 reviews, Commits: 417, Filed Bugs: 166, Resolved Bugs: 122 ) -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at medberry.net Wed Jun 6 15:18:18 2018 From: openstack at medberry.net (David Medberry) Date: Wed, 6 Jun 2018 09:18:18 -0600 Subject: [Openstack-operators] Ops Meetups Team - meeting minutes from today's IRC meeting In-Reply-To: References: Message-ID: I was unable to attend, thanks for the minutes. I wanted to note that there is an EARLY CALL FOR PRESENTATIONS DEADLINE for Berlin which is the end of June. If you plan a presentation (not an operator meeting as per se) you need to get your CFP response in by the end of June per: https://www.openstack.org/summit-login/login?BackURL=%2Fsummit%2Fberlin-2018%2Fcall-for-presentations%2F This seems much earlier than normal, so do note the date. On Tue, Jun 5, 2018 at 8:54 AM, Chris Morgan wrote: > We had a brief meeting today, minutes linked below. Topic included > restarting the operator-related docs, prep for Denver PTG and some > organisational things for the team. > > We'll meet bi-weekly at 10am EST for the time being, so next meeting is > 2018-6-19 at 10 am EST > > Minutes: http://eavesdrop.openstack.org/meetings/ops_meetup_team/ > 2018/ops_meetup_team.2018-06-05-14.25.html > 10:52 AM Minutes (text): http://eavesdrop.openstack. > org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-05-14.25.txt > 10:52 AM Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/ > 2018/ops_meetup_team.2018-06-05-14.25.log.html > > -- > Chris Morgan > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openstack.org Wed Jun 6 16:58:17 2018 From: jimmy at openstack.org (Jimmy McArthur) Date: Wed, 06 Jun 2018 11:58:17 -0500 Subject: [Openstack-operators] Ops Meetups Team - meeting minutes from today's IRC meeting In-Reply-To: References: Message-ID: <5B1812A9.3080906@openstack.org> We are actually still finalizing the due date. We've heard the feedback on the quick turn around and we're planning to push to July. Date TBD, but should be announced very soon. I know it seems like an early deadline, but part of what we're dealing with is trying to get the the date before the Travel Support and Early Bird pricing deadline. This should allow us to give as many people as possible an opportunity to attend. Thanks for the feedback and stay tuned... Cheers, Jimmy David Medberry wrote: > I was unable to attend, thanks for the minutes. > > I wanted to note that there is an EARLY CALL FOR PRESENTATIONS > DEADLINE for Berlin which is the end of June. If you plan a > presentation (not an operator meeting as per se) you need to get your > CFP response in by the end of June per: > > https://www.openstack.org/summit-login/login?BackURL=%2Fsummit%2Fberlin-2018%2Fcall-for-presentations%2F > > This seems much earlier than normal, so do note the date. > > On Tue, Jun 5, 2018 at 8:54 AM, Chris Morgan > wrote: > > We had a brief meeting today, minutes linked below. Topic included > restarting the operator-related docs, prep for Denver PTG and some > organisational things for the team. > > We'll meet bi-weekly at 10am EST for the time being, so next > meeting is 2018-6-19 at 10 am EST > > Minutes: > http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-05-14.25.html > > 10:52 AM Minutes (text): > http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-05-14.25.txt > > 10:52 AM Log: > http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-05-14.25.log.html > > > -- > Chris Morgan > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Jun 6 17:32:06 2018 From: melwittt at gmail.com (melanie witt) Date: Wed, 6 Jun 2018 10:32:06 -0700 Subject: [Openstack-operators] [openstack-dev] [nova] proposal to postpone nova-network core functionality removal to Stein In-Reply-To: <1391ee64-90f7-9414-9168-3a4caf495555@gmail.com> References: <29873b6f-8a3c-ae6e-0756-c90d2c52a306@gmail.com> <1391ee64-90f7-9414-9168-3a4caf495555@gmail.com> Message-ID: <29096a1c-493d-2ba3-8ff4-2d0a15731916@gmail.com> On Thu, 31 May 2018 15:04:53 -0500, Matt Riedemann wrote: > On 5/31/2018 1:35 PM, melanie witt wrote: >> >> This cycle at the PTG, we had decided to start making some progress >> toward removing nova-network [1] (thanks to those who have helped!) and >> so far, we've landed some patches to extract common network utilities >> from nova-network core functionality into separate utility modules. And >> we've started proposing removal of nova-network REST APIs [2]. >> >> At the cells v2 sync with operators forum session at the summit [3], we >> learned that CERN is in the middle of migrating from nova-network to >> neutron and that holding off on removal of nova-network core >> functionality until Stein would help them out a lot to have a safety net >> as they continue progressing through the migration. >> >> If we recall correctly, they did say that removal of the nova-network >> REST APIs would not impact their migration and Surya Seetharaman is >> double-checking about that and will get back to us. If so, we were >> thinking we can go ahead and work on nova-network REST API removals this >> cycle to make some progress while holding off on removing the core >> functionality of nova-network until Stein. >> >> I wanted to send this to the ML to let everyone know what we were >> thinking about this and to receive any additional feedback folks might >> have about this plan. >> >> Thanks, >> -melanie >> >> [1] https://etherpad.openstack.org/p/nova-ptg-rocky L301 >> [2] https://review.openstack.org/567682 >> [3] >> https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators >> L30 > > As a reminder, this is the etherpad I started to document the nova-net > specific compute REST APIs which are candidates for removal: > > https://etherpad.openstack.org/p/nova-network-removal-rocky Update: In the cells meeting today [4], Surya confirmed that CERN is okay with nova-network REST API pieces being removed this cycle while leaving the core functionality of nova-network intact, as they continue their migration from nova-network to neutron. We're tracking the nova-net REST API removal candidates on the aforementioned nova-network-removal etherpad. -melanie [4] http://eavesdrop.openstack.org/meetings/nova_cells/2018/nova_cells.2018-06-06-17.00.html From jean-philippe at evrard.me Thu Jun 7 07:54:48 2018 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Thu, 7 Jun 2018 09:54:48 +0200 Subject: [Openstack-operators] [openstack-ansible][releases][governance] Change in OSA roles tagging In-Reply-To: <1528209045-sup-7456@lrrr.local> References: <1528209045-sup-7456@lrrr.local> Message-ID: > Right, you can set the stable-branch-type field to 'tagless' (see > http://git.openstack.org/cgit/openstack/releases/tree/README.rst#n462) and > then set the branch location field to the SHA you want to use. Exactly what I thought. > If you would be ready to branch all of the roles at one time, you could > put all of them into 1 deliverable file. Otherwise, you will want to > split them up into their own files. Same. > > And since you have so many, I will point out that we're really into > automation over here on the release team, and if you wanted to work on > making the edit-deliverable command smart enough to determine the SHA > for you I could walk you through that code to get you started. Cool. From tobias.rydberg at citynetwork.eu Thu Jun 7 10:57:46 2018 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 7 Jun 2018 12:57:46 +0200 Subject: [Openstack-operators] [publiccloud-wg] Meeting this afternoon for Public Cloud WG Message-ID: <7efaf226-f610-6a2f-33ac-0d581e66ae21@citynetwork.eu> Hi folks, Time for a new meeting for the Public Cloud WG.  Agenda can be found at https://etherpad.openstack.org/p/publiccloud-wg See you all at IRC 1400 UTC in #openstack-publiccloud Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From mriedemos at gmail.com Thu Jun 7 14:02:15 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 7 Jun 2018 09:02:15 -0500 Subject: [Openstack-operators] [nova] Need feedback on spec for handling down cells in the API Message-ID: We have a nova spec [1] which is at the point that it needs some API user (and operator) feedback on what nova API should be doing when listing servers and there are down cells (unable to reach the cell DB or it times out). tl;dr: the spec proposes to return "shell" instances which have the server uuid and created_at fields set, and maybe some other fields we can set, but otherwise a bunch of fields in the server response would be set to UNKNOWN sentinel values. This would be unversioned, and therefore could wreak havoc on existing client side code that expects fields like 'config_drive' and 'updated' to be of a certain format. There are alternatives listed in the spec so please read this over and provide feedback since this is a pretty major UX change. Oh, and no pressure, but today is the spec freeze deadline for Rocky. [1] https://review.openstack.org/#/c/557369/ -- Thanks, Matt From mriedemos at gmail.com Thu Jun 7 14:32:20 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 7 Jun 2018 09:32:20 -0500 Subject: [Openstack-operators] [nova] nova-compute automatically disabling itself? In-Reply-To: <37d7ad5e-055f-991a-47da-5f263a898055@gmail.com> References: <6dcf6baa4104c1923fc8e954dbf2737a@bitskrieg.net> <172e32f3-e23e-15ce-33fa-6cd2af93eb73@gmail.com> <3a0453550ab5402d711f3ff06624b270@bitskrieg.net> <37d7ad5e-055f-991a-47da-5f263a898055@gmail.com> Message-ID: <8f338d4b-b831-f195-97b0-3dd617db8b6d@gmail.com> On 2/6/2018 6:44 PM, Matt Riedemann wrote: > On 2/6/2018 2:14 PM, Chris Apsey wrote: >> but we would rather have intermittent build failures rather than >> compute nodes falling over in the future. > > Note that once a compute has a successful build, the consecutive build > failures counter is reset. So if your limit is the default (10) and you > have 10 failures in a row, the compute service is auto-disabled. But if > you have say 5 failures and then a pass, it's reset to 0 failures. > > Obviously if you're doing a pack-first scheduling strategy rather than > spreading instances across the deployment, a burst of failures could > easily disable a compute, especially if that host is overloaded like you > saw. I'm not sure if rescheduling is helping you or not - that would be > useful information since we consider the need to reschedule off a failed > compute host as a bad thing. At the Forum in Boston when this idea came > up, it was specifically for the case that operators in the room didn't > want a bad compute to become a "black hole" in their deployment causing > lots of reschedules until they get that one fixed. Just an update on this. There is a change merged in Rocky [1] which is also going through backports to Queens and Pike. If you've already disabled the "consecutive_build_service_disable_threshold" config option then it's a no-op. If you haven't, "consecutive_build_service_disable_threshold" is now used to count build failures but no longer auto-disable the compute service on the configured threshold is met (10 by default). The build failure count is then used by a new weigher (enabled by default) to sort hosts with build failures to the back of the list of candidate hosts for new builds. Once there is a successful build on a given host, the failure count is reset. The idea here is that hosts which are failing are given lower priority during scheduling. [1] https://review.openstack.org/#/c/572195/ -- Thanks, Matt From ashlee at openstack.org Thu Jun 7 16:13:37 2018 From: ashlee at openstack.org (Ashlee Ferguson) Date: Thu, 7 Jun 2018 11:13:37 -0500 Subject: [Openstack-operators] Berlin Summit CFP Deadline July 17 Message-ID: <558EFA52-A6F3-4C7E-A509-301B9B17953E@openstack.org> Hi everyone, The Call for Presentations is open for the Berlin Summit , November 13-15. The deadline to submit your presentation has been extended to July 17. At the Vancouver Summit, we focused on open infrastructure integration as the Summit has evolved over the years to cover more than just OpenStack. We had over 30 different projects from the open infrastructure community join, including Kubernetes, Docker, Ansible, OpenShift and many more. The Tracks were organized around specific use cases and will remain the same for Berlin with the addition of Hands on Workshops as its own dedicated Track. We encourage you to submit presentations covering the open infrastructure tools you’re using, as well as the integration work needed to address these use cases. We also encourage you to invite peers from other open source communities to speak and collaborate. The Tracks are: • CI/CD • Container Infrastructure • Edge Computing • Hands on Workshops • HPC / GPU / AI • Open Source Community • Private & Hybrid Cloud • Public Cloud • Telecom & NFV Community voting, the first step in building the Summit schedule, will open in mid July. Once community voting concludes, a Programming Committee for each Track will build the schedule. Programming Committees are made up of individuals from many different open source communities working in open infrastructure, in addition to people who have participated in the past. If you’re interested in nominating yourself or someone else to be a member of the Summit Programming Committee for a specific Track, please fill out the nomination form . Nominations will close on June 28. Again, the deadline to submit proposals is July 17. Please note topic submissions for the OpenStack Forum (planning/working sessions with OpenStack devs and operators) will open at a later date. The Early Bird registration deadline will be in mid August. We’re working hard to make it the best Summit yet, and look forward to bringing together different open infrastructure communities to solve these hard problems together. Want to provide feedback on this process? Please focus discussion on the openstack-community mailing list, or contact the Summit Team directly at summit at openstack.org. See you in Berlin! Ashlee Ashlee Ferguson OpenStack Foundation ashlee at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Thu Jun 7 17:56:53 2018 From: melwittt at gmail.com (melanie witt) Date: Thu, 7 Jun 2018 10:56:53 -0700 Subject: [Openstack-operators] [nova] increasing the number of allowed volumes attached per instance > 26 Message-ID: <4bf3536e-0e3b-0fc4-2894-fabd32ef23dc@gmail.com> Hello Stackers, Recently, we've received interest about increasing the maximum number of allowed volumes to attach to a single instance > 26. The limit of 26 is because of a historical limitation in libvirt (if I remember correctly) and is no longer limited at the libvirt level in the present day. So, we're looking at providing a way to attach more than 26 volumes to a single instance and we want your feedback. We'd like to hear from operators and users about their use cases for wanting to be able to attach a large number of volumes to a single instance. If you could share your use cases, it would help us greatly in moving forward with an approach for increasing the maximum. Some ideas that have been discussed so far include: A) Selecting a new, higher maximum that still yields reasonable performance on a single compute host (64 or 128, for example). Pros: helps prevent the potential for poor performance on a compute host from attaching too many volumes. Cons: doesn't let anyone opt-in to a higher maximum if their environment can handle it. B) Creating a config option to let operators choose how many volumes allowed to attach to a single instance. Pros: lets operators opt-in to a maximum that works in their environment. Cons: it's not discoverable for those calling the API. C) Create a configurable API limit for maximum number of volumes to attach to a single instance that is either a quota or similar to a quota. Pros: lets operators opt-in to a maximum that works in their environment. Cons: it's yet another quota? Please chime in with your use cases and/or thoughts on the different approaches. Thanks for your help, -melanie From mriedemos at gmail.com Thu Jun 7 18:08:56 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 7 Jun 2018 13:08:56 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26 In-Reply-To: <4254211e-7f4e-31c8-89f6-0338d6c7464f@gmail.com> References: <4bf3536e-0e3b-0fc4-2894-fabd32ef23dc@gmail.com> <4254211e-7f4e-31c8-89f6-0338d6c7464f@gmail.com> Message-ID: <2b0c02b2-0194-4f86-4719-058d08384e5b@gmail.com> +operators (I forgot) On 6/7/2018 1:07 PM, Matt Riedemann wrote: > On 6/7/2018 12:56 PM, melanie witt wrote: >> Recently, we've received interest about increasing the maximum number >> of allowed volumes to attach to a single instance > 26. The limit of >> 26 is because of a historical limitation in libvirt (if I remember >> correctly) and is no longer limited at the libvirt level in the >> present day. So, we're looking at providing a way to attach more than >> 26 volumes to a single instance and we want your feedback. > > The 26 volumes thing is a libvirt driver restriction. > > There was a bug at one point because powervm (or powervc) was capping > out at 80 volumes per instance because of restrictions in the > build_requests table in the API DB: > > https://bugs.launchpad.net/nova/+bug/1621138 > > They wanted to get to 128, because that's how power rolls. > >> >> We'd like to hear from operators and users about their use cases for >> wanting to be able to attach a large number of volumes to a single >> instance. If you could share your use cases, it would help us greatly >> in moving forward with an approach for increasing the maximum. >> >> Some ideas that have been discussed so far include: >> >> A) Selecting a new, higher maximum that still yields reasonable >> performance on a single compute host (64 or 128, for example). Pros: >> helps prevent the potential for poor performance on a compute host >> from attaching too many volumes. Cons: doesn't let anyone opt-in to a >> higher maximum if their environment can handle it. >> >> B) Creating a config option to let operators choose how many volumes >> allowed to attach to a single instance. Pros: lets operators opt-in to >> a maximum that works in their environment. Cons: it's not discoverable >> for those calling the API. > > I'm not a fan of a non-discoverable config option which will impact API > behavior indirectly, i.e. on cloud A I can boot from volume with 64 > volumes but not on cloud B. > >> >> C) Create a configurable API limit for maximum number of volumes to >> attach to a single instance that is either a quota or similar to a >> quota. Pros: lets operators opt-in to a maximum that works in their >> environment. Cons: it's yet another quota? > > This seems the most reasonable to me if we're going to do this, but I'm > probably in the minority. Yes more quota limits sucks, but it's (1) > discoverable by API users and therefore (2) interoperable. > > If we did the quota thing, I'd probably default to unlimited and let the > cinder volume quota cap it for the project as it does today. Then admins > can tune it as needed. > -- Thanks, Matt From jaypipes at gmail.com Thu Jun 7 18:54:54 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Thu, 7 Jun 2018 14:54:54 -0400 Subject: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26 In-Reply-To: <4bf3536e-0e3b-0fc4-2894-fabd32ef23dc@gmail.com> References: <4bf3536e-0e3b-0fc4-2894-fabd32ef23dc@gmail.com> Message-ID: <99d90de9-74b3-76d4-4320-5ce10a411234@gmail.com> On 06/07/2018 01:56 PM, melanie witt wrote: > Hello Stackers, > > Recently, we've received interest about increasing the maximum number of > allowed volumes to attach to a single instance > 26. The limit of 26 is > because of a historical limitation in libvirt (if I remember correctly) > and is no longer limited at the libvirt level in the present day. So, > we're looking at providing a way to attach more than 26 volumes to a > single instance and we want your feedback. > > We'd like to hear from operators and users about their use cases for > wanting to be able to attach a large number of volumes to a single > instance. If you could share your use cases, it would help us greatly in > moving forward with an approach for increasing the maximum. > > Some ideas that have been discussed so far include: > > A) Selecting a new, higher maximum that still yields reasonable > performance on a single compute host (64 or 128, for example). Pros: > helps prevent the potential for poor performance on a compute host from > attaching too many volumes. Cons: doesn't let anyone opt-in to a higher > maximum if their environment can handle it. > > B) Creating a config option to let operators choose how many volumes > allowed to attach to a single instance. Pros: lets operators opt-in to a > maximum that works in their environment. Cons: it's not discoverable for > those calling the API. > > C) Create a configurable API limit for maximum number of volumes to > attach to a single instance that is either a quota or similar to a > quota. Pros: lets operators opt-in to a maximum that works in their > environment. Cons: it's yet another quota? If Cinder tracks volume attachments as consumable resources, then this would be my preference. Best, -jay From mriedemos at gmail.com Thu Jun 7 21:17:06 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 7 Jun 2018 16:17:06 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26 In-Reply-To: <99d90de9-74b3-76d4-4320-5ce10a411234@gmail.com> References: <4bf3536e-0e3b-0fc4-2894-fabd32ef23dc@gmail.com> <99d90de9-74b3-76d4-4320-5ce10a411234@gmail.com> Message-ID: <41e61eee-c2f5-589f-6f1a-89e82e1eb6c6@gmail.com> On 6/7/2018 1:54 PM, Jay Pipes wrote: > > If Cinder tracks volume attachments as consumable resources, then this > would be my preference. Cinder does: https://developer.openstack.org/api-ref/block-storage/v3/#attachments However, there is no limit in Cinder on those as far as I know. -- Thanks, Matt From rico.lin.guanyu at gmail.com Fri Jun 8 07:40:28 2018 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Fri, 8 Jun 2018 15:40:28 +0800 Subject: [Openstack-operators] [openstack-dev] [TC] Stein Goal Selection In-Reply-To: References: <20180604180742.GA6404@sm-xps> Message-ID: Kendall Nelson 於 2018年6月8日 週五 上午4:26寫道: > > I think that these two goals definitely fit the criteria we discussed in Vancouver during the S Release Goal Forum Session. I know Storyboard Migration was also mentioned after I had to dip out to another session so I wanted to follow up on that. > +1. To migrate to StoryBoard do seems like a good way to go. Heat just moved to StoryBoard, so there is no much long-term running experiences to share about, but it does look like a good way to target the piece which we been missing of. A workflow to connect users, ops, and developers (within Launchpad, we only care about bugs, and what generate that bug? well...we don't care). With Story + Task-oriented things can change (To me this is shiny). For migrate experience, the migration is quick, so if there is no project really really only can survive with Launchpad, I think there is no blocker for this goal. Also, it's quite convenient to target your story with your old bug, since your story id is your bug id. Since it might be difficult for all project directly migrated to it, IMO we should at least have a potential goal for T release (or a long-term goal for Stein?). Or we can directly set this as a Stein goal as well. Why? Because of the very first Story ID actually started from 2000000(and as I mentioned, after migrating, your story id is exactly your bug id ). So once we generate bug with ID 2000000, things will become interesting (and hard to migrate). Current is 1775759, so one or two years I guess? To interpreted `might be difficult` above, The overall experience is great, some small things should get improve: - I can't tell if current story is already reported or not. There is no way to filter stories and checking conflict if there is. - Things going slow if we try to use Board in StoryBoard to filter out a great number of stories (like when I need to see all `High Priority` tagged stories) - Needs better documentation, In Heat we create an Etherpad to describe and collect Q&A on how people can better adopt StoryBoard. It will be great if teams can directly get this information. Overall, I think this is a nice goal, and it's actually painless to migrate. -- May The Force of OpenStack Be With You, Rico Lin irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Fri Jun 8 07:44:10 2018 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Fri, 8 Jun 2018 15:44:10 +0800 Subject: [Openstack-operators] [openstack-dev] [TC] Stein Goal Selection In-Reply-To: <5ac1a7b4-51a7-e9c0-d6cc-2670561f3424@gmail.com> References: <20180604180742.GA6404@sm-xps> <5ac1a7b4-51a7-e9c0-d6cc-2670561f3424@gmail.com> Message-ID: Matt Riedemann 於 2018年6月8日 週五 上午6:49寫道: > I haven't used it much, but it would be really nice if someone could > record a modern 'how to storyboard' video for just basic usage/flows > since most people are used to launchpad by now so dealing with an > entirely new task tracker is not trivial (or at least, not something I > want to spend a lot of time figuring out). > > I found: > > https://www.youtube.com/watch?v=b2vJ9G5pNb4 > > https://www.youtube.com/watch?v=n_PaKuN4Skk > > But those are a bit old. > I create an Etherpad to collect Q&A on Migrate from Launchpad to StoryBoard for Heat (most information were general). Hope this helps https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info > -- > > Thanks, > > Matt > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- May The Force of OpenStack Be With You, Rico Lin irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Fri Jun 8 13:46:01 2018 From: dms at danplanet.com (Dan Smith) Date: Fri, 08 Jun 2018 06:46:01 -0700 Subject: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26 In-Reply-To: <4bf3536e-0e3b-0fc4-2894-fabd32ef23dc@gmail.com> (melanie witt's message of "Thu, 7 Jun 2018 10:56:53 -0700") References: <4bf3536e-0e3b-0fc4-2894-fabd32ef23dc@gmail.com> Message-ID: > Some ideas that have been discussed so far include: FYI, these are already in my order of preference. > A) Selecting a new, higher maximum that still yields reasonable > performance on a single compute host (64 or 128, for example). Pros: > helps prevent the potential for poor performance on a compute host > from attaching too many volumes. Cons: doesn't let anyone opt-in to a > higher maximum if their environment can handle it. I prefer this because I think it can be done per virt driver, for whatever actually makes sense there. If powervm can handle 500 volumes in a meaningful way on one instance, then that's cool. I think libvirt's limit should likely be 64ish. > B) Creating a config option to let operators choose how many volumes > allowed to attach to a single instance. Pros: lets operators opt-in to > a maximum that works in their environment. Cons: it's not discoverable > for those calling the API. This is a fine compromise, IMHO, as it lets operators tune it per compute node based on the virt driver and the hardware. If one compute is using nothing but iSCSI over a single 10g link, then they may need to clamp that down to something more sane. Like the per virt driver restriction above, it's not discoverable via the API, but if it varies based on compute node and other factors in a single deployment, then making it discoverable isn't going to be very easy anyway. > C) Create a configurable API limit for maximum number of volumes to > attach to a single instance that is either a quota or similar to a > quota. Pros: lets operators opt-in to a maximum that works in their > environment. Cons: it's yet another quota? Do we have any other quota limits that are per-instance like this would be? If not, then this would likely be weird, but if so, then this would also be an option, IMHO. However, it's too much work for what is really not a hugely important problem, IMHO, and both of the above are lighter-weight ways to solve this and move on. --Dan From mrhillsman at gmail.com Fri Jun 8 19:42:55 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Fri, 8 Jun 2018 14:42:55 -0500 Subject: [Openstack-operators] Reminder: UC Meeting Monday 1400UTC Message-ID: Hey everyone, Please see https://wiki.openstack.org/wiki/Governance/ Foundation/UserCommittee for UC meeting info and add additional agenda items if needed. -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at crystone.com Mon Jun 11 03:20:47 2018 From: tobias.urdin at crystone.com (Tobias Urdin) Date: Mon, 11 Jun 2018 03:20:47 +0000 Subject: [Openstack-operators] [ovs] [neutron] openvswitch flows firewall driver Message-ID: <72e1c6c5254c43638f8a67cb8fa10f0e@mb01.staff.ognet.se> Hello everybody, I'm cross-posting this with operators list. The openvswitch flows-based stateful firewall driver which uses the conntrack support in Linux kernel >= 4.3 (iirc) has been marked as experimental for several releases now, is there any information about flaws in this and why it should not be used in production? It's still marked as experimental or missing documentation in the networking guide [1]. And to operators; is anybody running the OVS stateful firewall in production? (firewall_driver = openvswitch) Appreciate any feedback :) Best regards [1] https://docs.openstack.org/neutron/queens/admin/config-ovsfwdriver.html From mrhillsman at gmail.com Mon Jun 11 12:37:34 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 11 Jun 2018 07:37:34 -0500 Subject: [Openstack-operators] Reminder: UC Meeting Today 1400UTC / 0900CST Message-ID: Hey everyone, Please see https://wiki.openstack.org/wiki/Governance/Foundation/ UserCommittee for UC meeting info and add additional agenda items if needed. -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jun 11 13:23:02 2018 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 11 Jun 2018 15:23:02 +0200 Subject: [Openstack-operators] [openstack-dev] [ovs] [neutron] openvswitch flows firewall driver In-Reply-To: <72e1c6c5254c43638f8a67cb8fa10f0e@mb01.staff.ognet.se> References: <72e1c6c5254c43638f8a67cb8fa10f0e@mb01.staff.ognet.se> Message-ID: <2CCDDC5F-BD4C-4722-BAA0-80A97A016E37@redhat.com> Hi, I’m not sure about Queens but recently with [1] we switched default security group driver in devstack to „openvswitch”. Since at least month we have scenario gate job with this SG driver running as voting and gating. Currently, after switch devstack default driver to openvswitch it’s tested in many jobs in Neutron. [1] https://review.openstack.org/#/c/568297/ > Wiadomość napisana przez Tobias Urdin w dniu 11.06.2018, o godz. 05:20: > > Hello everybody, > I'm cross-posting this with operators list. > > The openvswitch flows-based stateful firewall driver which uses the > conntrack support in Linux kernel >= 4.3 (iirc) has been > marked as experimental for several releases now, is there any > information about flaws in this and why it should not be used in production? > > It's still marked as experimental or missing documentation in the > networking guide [1]. > > And to operators; is anybody running the OVS stateful firewall in > production? (firewall_driver = openvswitch) > > Appreciate any feedback :) > Best regards > > [1] https://docs.openstack.org/neutron/queens/admin/config-ovsfwdriver.html > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev — Slawek Kaplonski Senior software engineer Red Hat From ignaziocassano at gmail.com Mon Jun 11 13:27:19 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 11 Jun 2018 15:27:19 +0200 Subject: [Openstack-operators] openstack ocata swift 404 Message-ID: Hi averyone, I have jist installed openstack ocata with 3 swift server for testing purposes. When all swift servers are up I can create containers. If one server is down, I can create containers also if try a lot of times. Probably I attemps to create on the shutted sown server and sometimes I try on the other servers. Is it the correct beahviour ? Or I must modify my configuration to exclude e server from the cluster ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Tue Jun 12 06:27:00 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 12 Jun 2018 08:27:00 +0200 Subject: [Openstack-operators] openstack ocata swift Message-ID: Hello everyone, I configured ocata with three vm backends and one proxy when a storage backends goes down, proxy continues to redirect write requests to it. Is is the default behaviour ? Please, could anyone help me ? I expect proxy does not redirect write request to a shutdown node automatically. On documentation I read I must exclude it manually. Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jun 12 14:09:08 2018 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 12 Jun 2018 14:09:08 +0000 Subject: [Openstack-operators] Organizational diversity tag Message-ID: <20180612140908.mo3rnhd2zyuyk2fq@yuggoth.org> Just a heads up that there's a summary of the "Organizational diversity tag" thread for the openstack-dev ML starting here: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131399.html Among other things, the follow-up discussion of the summary asks whether operators/deployers find the tag useful at all. If you or your organization relies on this tag in any way, the OpenStack Technical Committee would like to hear about it so they can know whether continuing to track it is in any way an effective use of time. Thanks! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From quickconvey at gmail.com Tue Jun 12 14:49:01 2018 From: quickconvey at gmail.com (Quick Convey) Date: Tue, 12 Jun 2018 20:19:01 +0530 Subject: [Openstack-operators] [neutron][ovs] How to Backup and Restore OVSDB Message-ID: HI, I am using OpenvSwitch 2.6.1 I would like to know, How we can backup and restore ovsdb in the controller and compute nodes. I think, ovsdb not in cluster mode, So we have to take backup from all controller and compute nodes. Please let me know if you have any idea. $ ovsdb-server -V ovsdb-server (Open vSwitch) 2.6.1 I also tried with following files, but don't know how to backup and restore it /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema If I copy above files back to same location and try to start ovs, then it is not listening to port 6640.I also observed that in ovsdb IDs of the Port and Interfaces are chaining when I do this steps. Thanks, -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 13 07:36:37 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 13 Jun 2018 09:36:37 +0200 Subject: [Openstack-operators] openstack ocata gnocchi statsd errors Message-ID: Hello everyone, I installed centos 7 openstack ocata with gnocchi. The gnocchi backand is "file" and /var/lib/gnocchi is on netapp nfs. Sometime a lot of locks are created on nfs and the statsd.log reports the following: 2018-03-05 08:56:34.743 58931 ERROR trollius [-] Exception in callback _flush() at /usr/lib/python2.7/site-packages/gnocchi/statsd.py:179 handle: 2018-03-05 08:56:34.743 58931 ERROR trollius Traceback (most recent call last): 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/trollius/events.py", line 136, in _run 2018-03-05 08:56:34.743 58931 ERROR trollius self._callback(*self._args) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 181, in _flush 2018-03-05 08:56:34.743 58931 ERROR trollius stats.flush() 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 93, in flush 2018-03-05 08:56:34.743 58931 ERROR trollius with_metrics=True) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 151, in wrapper 2018-03-05 08:56:34.743 58931 ERROR trollius ectxt.value = e.inner_exc 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2018-03-05 08:56:34.743 58931 ERROR trollius self.force_reraise() 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2018-03-05 08:56:34.743 58931 ERROR trollius six.reraise(self.type_, self.value, self.tb) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 139, in wrapper 2018-03-05 08:56:34.743 58931 ERROR trollius return f(*args, **kwargs) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py", line 932, in get_resource 2018-03-05 08:56:34.743 58931 ERROR trollius session, resource_type)['resource'] 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py", line 574, in _resource_type_to_mappers Anyone could help me, please? What's happening ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-philippe at evrard.me Wed Jun 13 09:53:52 2018 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Wed, 13 Jun 2018 11:53:52 +0200 Subject: [Openstack-operators] [openstack-ansible] Restarting our very own "SIG" teams Message-ID: Hello, TL:DR; If you have spare cycles, join one of our interest groups! In the Queens cycle, I have formalised the "liaisons" work, making them an integral part of the Thursday's meeting agenda. Sadly, that initiative didn't work, as almost no liaison worked/reported on those meetings, and I stopped the initiative. Upon common agreement that we now need to change how we scale the team, we will now start our "liaison 2.0" work. So, I have started an etherpad [1], where you could see all the "groups" of people that OSA need, and where you could help. Please don't hesitate to edit this etherpad, adding your new special interest group, or simply joining an existing one if you have spare cycles! Thank you! Jean-Philippe Evrard (evrardjp) [1]: https://etherpad.openstack.org/p/osa-liaisons From radu.popescu at emag.ro Wed Jun 13 10:45:50 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Wed, 13 Jun 2018 10:45:50 +0000 Subject: [Openstack-operators] Neutron not adding iptables rules for metadata agent Message-ID: <5f716e6218597640ec4c5aaa9a80518639088680.camel@emag.ro> Hi all, So, I'm having the following issue. I'm creating a VM with floating IP. Everything is fine, namespace is there, postrouting and prerouting from the internal IP to the floating IP are there. The only rules missing are the rules to access metadata service: -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697 -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff (this is taken from another working namespace with iptables-save) Forgot to mention, VM is booting ok, I have both the default route and the one for the metadata service (cloud-init is running at boot time): [ 57.150766] cloud-init[892]: ci-info: +--------+------+--------------+---------------+-------+-------------------+ [ 57.150997] cloud-init[892]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address | [ 57.151219] cloud-init[892]: ci-info: +--------+------+--------------+---------------+-------+-------------------+ [ 57.151431] cloud-init[892]: ci-info: | lo: | True | 127.0.0.1 | 255.0.0.0 | . | . | [ 57.151627] cloud-init[892]: ci-info: | eth0: | True | 10.240.9.186 | 255.255.252.0 | . | fa:16:3e:43:d1:c2 | [ 57.151815] cloud-init[892]: ci-info: +--------+------+--------------+---------------+-------+-------------------+ [ 57.152018] cloud-init[892]: ci-info: +++++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++++ [ 57.152225] cloud-init[892]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ [ 57.152426] cloud-init[892]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags | [ 57.152621] cloud-init[892]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ [ 57.152813] cloud-init[892]: ci-info: | 0 | 0.0.0.0 | 10.240.8.1 | 0.0.0.0 | eth0 | UG | [ 57.153013] cloud-init[892]: ci-info: | 1 | 10.240.1.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U | [ 57.153202] cloud-init[892]: ci-info: | 2 | 10.240.8.0 | 0.0.0.0 | 255.255.252.0 | eth0 | U | [ 57.153397] cloud-init[892]: ci-info: | 3 | 169.254.169.254 | 10.240.8.1 | 255.255.255.255 | eth0 | UGH | [ 57.153579] cloud-init[892]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ The extra route is there because the tenant has 2 subnets. Before adding those 2 rules manually, I had this coming from cloud-init: [ 192.451801] cloud-init[892]: 2018-06-13 12:29:26,179 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] [ 193.456805] cloud-init[892]: 2018-06-13 12:29:27,184 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] [ 194.461592] cloud-init[892]: 2018-06-13 12:29:28,189 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] [ 195.466441] cloud-init[892]: 2018-06-13 12:29:29,194 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] I can see no errors in neither nova or neutron services. In the mean time, I've searched all our nova servers for this kind of behavior and we have 1 random namespace missing those rules on 6 of our 66 novas. Any ideas would be greatly appreciated. Thanks, Radu -------------- next part -------------- An HTML attachment was scrubbed... URL: From blair.bethwaite at gmail.com Wed Jun 13 13:58:34 2018 From: blair.bethwaite at gmail.com (Blair Bethwaite) Date: Wed, 13 Jun 2018 23:58:34 +1000 Subject: [Openstack-operators] large high-performance ephemeral storage Message-ID: Hi all, Wondering if anyone can share experience with architecting Nova KVM boxes for large capacity high-performance storage? We have some particular use-cases that want both high-IOPs and large capacity local storage. In the past we have used bcache with an SSD based RAID0 write-through caching for a hardware (PERC) backed RAID volume. This seemed to work ok, but we never really gave it a hard time. I guess if we followed a similar pattern today we would use lvmcache (or are people still using bcache with confidence?) with a few TB of NVMe and a NL-SAS array with write cache. Is the collective wisdom to use LVM based instances for these use-cases? Putting a host filesystem with qcow2 based disk images on it can't help performance-wise... Though we have not used LVM based instance storage before, are there any significant gotchas? And furthermore, is it possible to use set IO QoS limits on these? -- Cheers, ~Blairo -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaypipes at gmail.com Wed Jun 13 14:02:59 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Wed, 13 Jun 2018 10:02:59 -0400 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: References: Message-ID: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> On 06/13/2018 09:58 AM, Blair Bethwaite wrote: > Hi all, > > Wondering if anyone can share experience with architecting Nova KVM > boxes for large capacity high-performance storage? We have some > particular use-cases that want both high-IOPs and large capacity local > storage. > > In the past we have used bcache with an SSD based RAID0 write-through > caching for a hardware (PERC) backed RAID volume. This seemed to work > ok, but we never really gave it a hard time. I guess if we followed a > similar pattern today we would use lvmcache (or are people still using > bcache with confidence?) with a few TB of NVMe and a NL-SAS array with > write cache. > > Is the collective wisdom to use LVM based instances for these use-cases? > Putting a host filesystem with qcow2 based disk images on it can't help > performance-wise... Though we have not used LVM based instance storage > before, are there any significant gotchas? And furthermore, is it > possible to use set IO QoS limits on these? I've found /dev/null to be the fastest ephemeral storage system, bar none. Not sure if you can set QoS limits on it though. Best, -jay From blair.bethwaite at gmail.com Wed Jun 13 14:18:11 2018 From: blair.bethwaite at gmail.com (Blair Bethwaite) Date: Thu, 14 Jun 2018 00:18:11 +1000 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> References: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> Message-ID: Hi Jay, Ha, I'm sure there's some wisdom hidden behind the trolling here? Believe me, I have tried to push these sorts of use-cases toward volume or share storage, but in the research/science domain there is often more accessible funding available to throw at infrastructure stop-gaps than software engineering (parallelism is hard). PS: when I say ephemeral I don't necessarily mean they aren't doing backups and otherwise caring that they have 100+TB of data on a stand alone host. PS: I imagine you can set QoS limits on /dev/null these days via CPU cgroups... Cheers, On Thu., 14 Jun. 2018, 00:03 Jay Pipes, wrote: > On 06/13/2018 09:58 AM, Blair Bethwaite wrote: > > Hi all, > > > > Wondering if anyone can share experience with architecting Nova KVM > > boxes for large capacity high-performance storage? We have some > > particular use-cases that want both high-IOPs and large capacity local > > storage. > > > > In the past we have used bcache with an SSD based RAID0 write-through > > caching for a hardware (PERC) backed RAID volume. This seemed to work > > ok, but we never really gave it a hard time. I guess if we followed a > > similar pattern today we would use lvmcache (or are people still using > > bcache with confidence?) with a few TB of NVMe and a NL-SAS array with > > write cache. > > > > Is the collective wisdom to use LVM based instances for these use-cases? > > Putting a host filesystem with qcow2 based disk images on it can't help > > performance-wise... Though we have not used LVM based instance storage > > before, are there any significant gotchas? And furthermore, is it > > possible to use set IO QoS limits on these? > > I've found /dev/null to be the fastest ephemeral storage system, bar none. > > Not sure if you can set QoS limits on it though. > > Best, > -jay > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaypipes at gmail.com Wed Jun 13 14:24:43 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Wed, 13 Jun 2018 10:24:43 -0400 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: References: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> Message-ID: <92facba8-7903-b619-d4f1-1b7284f7a157@gmail.com> On 06/13/2018 10:18 AM, Blair Bethwaite wrote: > Hi Jay, > > Ha, I'm sure there's some wisdom hidden behind the trolling here? I wasn't trolling at all. I was trying to be funny. Attempt failed I guess :) Best, -jay From joe at topjian.net Wed Jun 13 14:24:44 2018 From: joe at topjian.net (Joe Topjian) Date: Wed, 13 Jun 2018 08:24:44 -0600 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: References: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> Message-ID: Yes, you can! The kernel documentation for read/write limits actually uses /dev/null in the examples :) But more seriously: while we have not architected specifically for high performance, for the past few years, we have used a zpool of cheap spindle disks and 1-2 SSD disks for caching. We have ZFS configured for deduplication which helps for the base images but not so much for ephemeral. If you have a standard benchmark command in mind to run, I'd be happy to post the results. Maybe others could do the same to create some type of matrix? On Wed, Jun 13, 2018 at 8:18 AM, Blair Bethwaite wrote: > Hi Jay, > > Ha, I'm sure there's some wisdom hidden behind the trolling here? > > Believe me, I have tried to push these sorts of use-cases toward volume or > share storage, but in the research/science domain there is often more > accessible funding available to throw at infrastructure stop-gaps than > software engineering (parallelism is hard). PS: when I say ephemeral I > don't necessarily mean they aren't doing backups and otherwise caring that > they have 100+TB of data on a stand alone host. > > PS: I imagine you can set QoS limits on /dev/null these days via CPU > cgroups... > > Cheers, > > > On Thu., 14 Jun. 2018, 00:03 Jay Pipes, wrote: > >> On 06/13/2018 09:58 AM, Blair Bethwaite wrote: >> > Hi all, >> > >> > Wondering if anyone can share experience with architecting Nova KVM >> > boxes for large capacity high-performance storage? We have some >> > particular use-cases that want both high-IOPs and large capacity local >> > storage. >> > >> > In the past we have used bcache with an SSD based RAID0 write-through >> > caching for a hardware (PERC) backed RAID volume. This seemed to work >> > ok, but we never really gave it a hard time. I guess if we followed a >> > similar pattern today we would use lvmcache (or are people still using >> > bcache with confidence?) with a few TB of NVMe and a NL-SAS array with >> > write cache. >> > >> > Is the collective wisdom to use LVM based instances for these >> use-cases? >> > Putting a host filesystem with qcow2 based disk images on it can't help >> > performance-wise... Though we have not used LVM based instance storage >> > before, are there any significant gotchas? And furthermore, is it >> > possible to use set IO QoS limits on these? >> >> I've found /dev/null to be the fastest ephemeral storage system, bar none. >> >> Not sure if you can set QoS limits on it though. >> >> Best, >> -jay >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From blair.bethwaite at gmail.com Wed Jun 13 14:32:33 2018 From: blair.bethwaite at gmail.com (Blair Bethwaite) Date: Thu, 14 Jun 2018 00:32:33 +1000 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: <92facba8-7903-b619-d4f1-1b7284f7a157@gmail.com> References: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> <92facba8-7903-b619-d4f1-1b7284f7a157@gmail.com> Message-ID: Lol! Ok, forgive me, I wasn't sure if I had regular or existential Jay on the line :-). On Thu., 14 Jun. 2018, 00:24 Jay Pipes, wrote: > On 06/13/2018 10:18 AM, Blair Bethwaite wrote: > > Hi Jay, > > > > Ha, I'm sure there's some wisdom hidden behind the trolling here? > > I wasn't trolling at all. I was trying to be funny. Attempt failed I > guess :) > > Best, > -jay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From blair.bethwaite at gmail.com Wed Jun 13 14:45:37 2018 From: blair.bethwaite at gmail.com (Blair Bethwaite) Date: Thu, 14 Jun 2018 00:45:37 +1000 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: References: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> Message-ID: Hey Joe, Thanks! So shall we settle on fio as a standard IO micro benchmarking tool? Seems to me the minimum we want is throughput and IOPs oriented tests for both the guest OS workload profile and the some sort of large working set application workload. For the latter it is probably best to ignore multiple files and focus solely on queue depth for parallelism, some sort of mixed block size profile/s, and some sort of r/w mix (where write <=50% to acknowledge this is ephemeral storage so hopefully something is using it soon after storing). Thoughts? Cheers, Blair On Thu., 14 Jun. 2018, 00:24 Joe Topjian, wrote: > Yes, you can! The kernel documentation for read/write limits actually uses > /dev/null in the examples :) > > But more seriously: while we have not architected specifically for high > performance, for the past few years, we have used a zpool of cheap spindle > disks and 1-2 SSD disks for caching. We have ZFS configured for > deduplication which helps for the base images but not so much for ephemeral. > > If you have a standard benchmark command in mind to run, I'd be happy to > post the results. Maybe others could do the same to create some type of > matrix? > > On Wed, Jun 13, 2018 at 8:18 AM, Blair Bethwaite < > blair.bethwaite at gmail.com> wrote: > >> Hi Jay, >> >> Ha, I'm sure there's some wisdom hidden behind the trolling here? >> >> Believe me, I have tried to push these sorts of use-cases toward volume >> or share storage, but in the research/science domain there is often more >> accessible funding available to throw at infrastructure stop-gaps than >> software engineering (parallelism is hard). PS: when I say ephemeral I >> don't necessarily mean they aren't doing backups and otherwise caring that >> they have 100+TB of data on a stand alone host. >> >> PS: I imagine you can set QoS limits on /dev/null these days via CPU >> cgroups... >> >> Cheers, >> >> >> On Thu., 14 Jun. 2018, 00:03 Jay Pipes, wrote: >> >>> On 06/13/2018 09:58 AM, Blair Bethwaite wrote: >>> > Hi all, >>> > >>> > Wondering if anyone can share experience with architecting Nova KVM >>> > boxes for large capacity high-performance storage? We have some >>> > particular use-cases that want both high-IOPs and large capacity local >>> > storage. >>> > >>> > In the past we have used bcache with an SSD based RAID0 write-through >>> > caching for a hardware (PERC) backed RAID volume. This seemed to work >>> > ok, but we never really gave it a hard time. I guess if we followed a >>> > similar pattern today we would use lvmcache (or are people still using >>> > bcache with confidence?) with a few TB of NVMe and a NL-SAS array with >>> > write cache. >>> > >>> > Is the collective wisdom to use LVM based instances for these >>> use-cases? >>> > Putting a host filesystem with qcow2 based disk images on it can't >>> help >>> > performance-wise... Though we have not used LVM based instance storage >>> > before, are there any significant gotchas? And furthermore, is it >>> > possible to use set IO QoS limits on these? >>> >>> I've found /dev/null to be the fastest ephemeral storage system, bar >>> none. >>> >>> Not sure if you can set QoS limits on it though. >>> >>> Best, >>> -jay >>> >>> _______________________________________________ >>> OpenStack-operators mailing list >>> OpenStack-operators at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joe at topjian.net Wed Jun 13 14:59:07 2018 From: joe at topjian.net (Joe Topjian) Date: Wed, 13 Jun 2018 08:59:07 -0600 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: References: <4d145846-2771-ae21-44bb-c69f320ccae5@gmail.com> Message-ID: fio is fine with me. I'll lazily defer to your expertise on the right fio commands to run for each case. :) If we're going to test within the guest, that's going to introduce a new set of variables, right? Should we settle on a standard flavor (maybe two if we wanted to include both virtio and virtio-scsi) or should the results make note of what local configuration was used? On Wed, Jun 13, 2018 at 8:45 AM, Blair Bethwaite wrote: > Hey Joe, > > Thanks! So shall we settle on fio as a standard IO micro benchmarking > tool? Seems to me the minimum we want is throughput and IOPs oriented tests > for both the guest OS workload profile and the some sort of large working > set application workload. For the latter it is probably best to ignore > multiple files and focus solely on queue depth for parallelism, some sort > of mixed block size profile/s, and some sort of r/w mix (where write <=50% > to acknowledge this is ephemeral storage so hopefully something is using it > soon after storing). Thoughts? > > Cheers, > Blair > > On Thu., 14 Jun. 2018, 00:24 Joe Topjian, wrote: > >> Yes, you can! The kernel documentation for read/write limits actually >> uses /dev/null in the examples :) >> >> But more seriously: while we have not architected specifically for high >> performance, for the past few years, we have used a zpool of cheap spindle >> disks and 1-2 SSD disks for caching. We have ZFS configured for >> deduplication which helps for the base images but not so much for ephemeral. >> >> If you have a standard benchmark command in mind to run, I'd be happy to >> post the results. Maybe others could do the same to create some type of >> matrix? >> >> On Wed, Jun 13, 2018 at 8:18 AM, Blair Bethwaite < >> blair.bethwaite at gmail.com> wrote: >> >>> Hi Jay, >>> >>> Ha, I'm sure there's some wisdom hidden behind the trolling here? >>> >>> Believe me, I have tried to push these sorts of use-cases toward volume >>> or share storage, but in the research/science domain there is often more >>> accessible funding available to throw at infrastructure stop-gaps than >>> software engineering (parallelism is hard). PS: when I say ephemeral I >>> don't necessarily mean they aren't doing backups and otherwise caring that >>> they have 100+TB of data on a stand alone host. >>> >>> PS: I imagine you can set QoS limits on /dev/null these days via CPU >>> cgroups... >>> >>> Cheers, >>> >>> >>> On Thu., 14 Jun. 2018, 00:03 Jay Pipes, wrote: >>> >>>> On 06/13/2018 09:58 AM, Blair Bethwaite wrote: >>>> > Hi all, >>>> > >>>> > Wondering if anyone can share experience with architecting Nova KVM >>>> > boxes for large capacity high-performance storage? We have some >>>> > particular use-cases that want both high-IOPs and large capacity >>>> local >>>> > storage. >>>> > >>>> > In the past we have used bcache with an SSD based RAID0 write-through >>>> > caching for a hardware (PERC) backed RAID volume. This seemed to work >>>> > ok, but we never really gave it a hard time. I guess if we followed a >>>> > similar pattern today we would use lvmcache (or are people still >>>> using >>>> > bcache with confidence?) with a few TB of NVMe and a NL-SAS array >>>> with >>>> > write cache. >>>> > >>>> > Is the collective wisdom to use LVM based instances for these >>>> use-cases? >>>> > Putting a host filesystem with qcow2 based disk images on it can't >>>> help >>>> > performance-wise... Though we have not used LVM based instance >>>> storage >>>> > before, are there any significant gotchas? And furthermore, is it >>>> > possible to use set IO QoS limits on these? >>>> >>>> I've found /dev/null to be the fastest ephemeral storage system, bar >>>> none. >>>> >>>> Not sure if you can set QoS limits on it though. >>>> >>>> Best, >>>> -jay >>>> >>>> _______________________________________________ >>>> OpenStack-operators mailing list >>>> OpenStack-operators at lists.openstack.org >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>>> >>> >>> _______________________________________________ >>> OpenStack-operators mailing list >>> OpenStack-operators at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Jun 13 15:14:32 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Jun 2018 10:14:32 -0500 Subject: [Openstack-operators] Reminder to add "nova-status upgrade check" to deployment tooling Message-ID: I was going through some recently reported nova bugs and came across [1] which I opened at the Summit during one of the FFU sessions where I realized the nova upgrade docs don't mention the nova-status upgrade check CLI [2] (added in Ocata). As a result, I was wondering how many deployment tools out there support upgrades and from those, which are actually integrating that upgrade status check command. I'm not really familiar with most of them, but I've dabbled in OSA enough to know where the code lived for nova upgrades, so I posted a patch [3]. I'm hoping this can serve as a template for other deployment projects to integrate similar checks into their upgrade (and install verification) flows. [1] https://bugs.launchpad.net/nova/+bug/1772973 [2] https://docs.openstack.org/nova/latest/cli/nova-status.html [3] https://review.openstack.org/#/c/575125/ -- Thanks, Matt From chris.friesen at windriver.com Wed Jun 13 15:54:40 2018 From: chris.friesen at windriver.com (Chris Friesen) Date: Wed, 13 Jun 2018 09:54:40 -0600 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: References: Message-ID: <5B213E40.4030901@windriver.com> On 06/13/2018 07:58 AM, Blair Bethwaite wrote: > Is the collective wisdom to use LVM based instances for these use-cases? Putting > a host filesystem with qcow2 based disk images on it can't help > performance-wise... Though we have not used LVM based instance storage before, > are there any significant gotchas? And furthermore, is it possible to use set IO > QoS limits on these? LVM has the drawback that deleting instances results in significant disk traffic while the volume is scrubbed with zeros. If you don't care about security you can set a config option to turn this off. Also, while this is happening I think your disk resource tracking will be wrong because nova assumes the space is available. (At least it used to be this way, I haven't checked that code recently.) Also, migration and resize are not supported for LVM-backed instances. I proposed a patch to support them (https://review.openstack.org/#/c/337334/) but hit issues and never got around to fixing them up. Chris From kendall at openstack.org Wed Jun 13 18:29:54 2018 From: kendall at openstack.org (Kendall Waters) Date: Wed, 13 Jun 2018 13:29:54 -0500 Subject: [Openstack-operators] PTG Denver 2018 Registration & Hotel Info Message-ID: The fourth Project Teams Gathering will be held September 10-14th back at the Renaissance Stapleton Hotel in Denver, Colorado (3801 Quebec Street, Denver, Colorado 80207). REGISTRATION AND HOTEL Registration is now available here: https://denver2018ptg.eventbrite.com The price is currently USD $399 until August 23 at 6:59 UTC. After that date, the price will be USD $599 so buy your pass before the price increases! We've reserved a very limited block of discounted hotel rooms at $149/night USD (does not include breakfast) with the Renaissance Denver Stapleton Hotel where the event will be held. Please move quickly to reserve a room by August 20th or until they sell out! TRAIN NEAR HOTEL The hotel has informed us that the RTD is anticipating the area near the Renaissance Denver Stapleton Hotel being deemed a quiet zone by end of July, with a more realistic completion date of August 15th. This means there should not be any train horns during the week of the PTG! HELPFUL LINKS: Registration: https://denver2018ptg.eventbrite.com Visa Invitation Letter (deadline August 24): https://openstackfoundation.formstack.com/forms/visa_form_denver_2018_ptg Travel Support Program (first round deadline July 1): https://openstackfoundation.formstack.com/forms/travelsupportptg_denver_2018 Sponsorship: https://www.openstack.org/ptg#tab_sponsor Book a Hotel Room (deadline August 20): https://www.marriott.com/meeting-event-hotels/group-corporate-travel/groupCorp.mi?resLinkData=Project%20Teams%20Gathering%2C%20Openstack%5Edensa%60opnopna%7Copnopnb%60149.00%60USD%60false%604%609/5/18%609/18/18%608/20/18&app=resvlink&stop_mobi=yes Feel free to reach out to me directly with any questions, looking forward to seeing everyone in Denver! Cheers, Kendall -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at gmail.com Wed Jun 13 18:42:26 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Wed, 13 Jun 2018 20:42:26 +0200 Subject: [Openstack-operators] [openstack-client] - missing commands? Message-ID: Hi guys, I use the «new» openstack-client command as much as possible since a couple of years now, but yet I had a hard time recently to find equivalent command of the following: nova force-delete & The command on swift that permit to recursively upload the content of a directory and automatically creating the same directory structure using pseudo-folders. Did I miss something somewhere or are those commands missing? On the nova part I think it’s not that important as a classic openstack server delete seems to do the same, but not quite sure. Thanks guys! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Jun 13 21:25:49 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Jun 2018 16:25:49 -0500 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: References: Message-ID: <3a2836f8-384b-227b-c457-4adef5a8c30d@gmail.com> On 6/13/2018 8:58 AM, Blair Bethwaite wrote: > Though we have not used LVM based instance storage before, are there any > significant gotchas? I know you can't resize/cold migrate lvm-backed ephemeral root disk instances: https://github.com/openstack/nova/blob/343c2bee234568855fd9e6ba075a05c2e70f3388/nova/virt/libvirt/driver.py#L8136 However, StarlingX has a patch for that (pretty sure anyway, I know WindRiver had one): https://review.openstack.org/#/c/337334/ -- Thanks, Matt From mriedemos at gmail.com Wed Jun 13 21:35:10 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Jun 2018 16:35:10 -0500 Subject: [Openstack-operators] large high-performance ephemeral storage In-Reply-To: <5B213E40.4030901@windriver.com> References: <5B213E40.4030901@windriver.com> Message-ID: <6fc5dd19-13aa-eeef-dbc6-9f8e97e4c07a@gmail.com> On 6/13/2018 10:54 AM, Chris Friesen wrote: > Also, migration and resize are not supported for LVM-backed instances. > I proposed a patch to support them > (https://review.openstack.org/#/c/337334/) but hit issues and never got > around to fixing them up. Yup, I guess I should have read the entire thread first. -- Thanks, Matt From mriedemos at gmail.com Wed Jun 13 21:38:02 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Jun 2018 16:38:02 -0500 Subject: [Openstack-operators] [openstack-client] - missing commands? In-Reply-To: References: Message-ID: On 6/13/2018 1:42 PM, Flint WALRUS wrote: > Hi guys, I use the «new» openstack-client command as much as possible > since a couple of years now, but yet I had a hard time recently to find > equivalent command of the following: > > nova force-delete > & > The command on swift that permit to recursively upload the content of a > directory and automatically creating the same directory structure using > pseudo-folders. > > Did I miss something somewhere or are those commands missing? > > On the nova part I think it’s not that important as a classic openstack > server delete seems to do the same, but not quite sure. Oh wow, great timing: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131308.html I've also queued that up for the upcoming bug smash in China next week. -- Thanks, Matt From gord at live.ca Wed Jun 13 22:51:12 2018 From: gord at live.ca (gordon chung) Date: Wed, 13 Jun 2018 22:51:12 +0000 Subject: [Openstack-operators] openstack ocata gnocchi statsd errors In-Reply-To: References: Message-ID: i would probably ask this question on gnocchi[1] as i don't know how many devs read this. you should also add what numerical version of gnocchi you're using and what coordination service you're using. if i were to quickly guess, i'm going to to assume you're not using a dedicated coordination service and are relying on the db based on your comment. if so, i would add the version of tooz you have as there are certain versions of tooz that do not handle locking with SQL well. (don't have the exact versions of tooz that are broken handy) [1] https://github.com/gnocchixyz/gnocchi/issues -- gord ________________________________________ From: Ignazio Cassano Sent: June 13, 2018 7:36 AM To: OpenStack Operators Subject: [Openstack-operators] openstack ocata gnocchi statsd errors Hello everyone, I installed centos 7 openstack ocata with gnocchi. The gnocchi backand is "file" and /var/lib/gnocchi is on netapp nfs. Sometime a lot of locks are created on nfs and the statsd.log reports the following: 2018-03-05 08:56:34.743 58931 ERROR trollius [-] Exception in callback _flush() at /usr/lib/python2.7/site-packages/gnocchi/statsd.py:179 handle: 2018-03-05 08:56:34.743 58931 ERROR trollius Traceback (most recent call last): 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/trollius/events.py", line 136, in _run 2018-03-05 08:56:34.743 58931 ERROR trollius self._callback(*self._args) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 181, in _flush 2018-03-05 08:56:34.743 58931 ERROR trollius stats.flush() 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 93, in flush 2018-03-05 08:56:34.743 58931 ERROR trollius with_metrics=True) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 151, in wrapper 2018-03-05 08:56:34.743 58931 ERROR trollius ectxt.value = e.inner_exc 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2018-03-05 08:56:34.743 58931 ERROR trollius self.force_reraise() 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2018-03-05 08:56:34.743 58931 ERROR trollius six.reraise(self.type_, self.value, self.tb) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 139, in wrapper 2018-03-05 08:56:34.743 58931 ERROR trollius return f(*args, **kwargs) 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py", line 932, in get_resource 2018-03-05 08:56:34.743 58931 ERROR trollius session, resource_type)['resource'] 2018-03-05 08:56:34.743 58931 ERROR trollius File "/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py", line 574, in _resource_type_to_mappers Anyone could help me, please? What's happening ? Regards Ignazio From ignaziocassano at gmail.com Thu Jun 14 05:30:55 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 14 Jun 2018 07:30:55 +0200 Subject: [Openstack-operators] openstack ocata gnocchi statsd errors In-Reply-To: References: Message-ID: Hello Gordon, what do you mean with "coordination service" ? I installed ceilometer and gnocchi following ocata community documentation. Versions are the followings : openstack-gnocchi-statsd-3.1.16-1.el7.noarch python2-gnocchiclient-3.1.0-1.el7.noarch python-gnocchi-3.1.16-1.el7.noarch openstack-gnocchi-api-3.1.16-1.el7.noarch openstack-gnocchi-indexer-sqlalchemy-3.1.16-1.el7.noarch openstack-gnocchi-common-3.1.16-1.el7.noarch openstack-gnocchi-metricd-3.1.16-1.el7.noarch The indexer is on mariadb. Thanks and Regards Ignazio 2018-06-14 0:51 GMT+02:00 gordon chung : > i would probably ask this question on gnocchi[1] as i don't know how many > devs read this. you should also add what numerical version of gnocchi > you're using and what coordination service you're using. > > if i were to quickly guess, i'm going to to assume you're not using a > dedicated coordination service and are relying on the db based on your > comment. if so, i would add the version of tooz you have as there are > certain versions of tooz that do not handle locking with SQL well. (don't > have the exact versions of tooz that are broken handy) > > [1] https://github.com/gnocchixyz/gnocchi/issues > -- > gord > > ________________________________________ > From: Ignazio Cassano > Sent: June 13, 2018 7:36 AM > To: OpenStack Operators > Subject: [Openstack-operators] openstack ocata gnocchi statsd errors > > Hello everyone, > I installed centos 7 openstack ocata with gnocchi. > The gnocchi backand is "file" and /var/lib/gnocchi is on netapp nfs. > Sometime a lot of locks are created on nfs and the statsd.log reports > the following: > > 2018-03-05 08:56:34.743 58931 ERROR trollius [-] Exception in callback > _flush() at /usr/lib/python2.7/site-packages/gnocchi/statsd.py:179 > handle: /usr/lib/python2.7/site-packages/gnocchi/statsd.py:179> > 2018-03-05 08:56:34.743 58931 ERROR trollius Traceback (most recent call > last): > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/trollius/events.py", line 136, in _run > 2018-03-05 08:56:34.743 58931 ERROR trollius > self._callback(*self._args) > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 181, in _flush > 2018-03-05 08:56:34.743 58931 ERROR trollius stats.flush() > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 93, in flush > 2018-03-05 08:56:34.743 58931 ERROR trollius with_metrics=True) > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 151, in wrapper > 2018-03-05 08:56:34.743 58931 ERROR trollius ectxt.value = e.inner_exc > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in > __exit__ > 2018-03-05 08:56:34.743 58931 ERROR trollius self.force_reraise() > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in > force_reraise > 2018-03-05 08:56:34.743 58931 ERROR trollius six.reraise(self.type_, > self.value, self.tb) > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 139, in wrapper > 2018-03-05 08:56:34.743 58931 ERROR trollius return f(*args, **kwargs) > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py", line > 932, in get_resource > 2018-03-05 08:56:34.743 58931 ERROR trollius session, > resource_type)['resource'] > 2018-03-05 08:56:34.743 58931 ERROR trollius File > "/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py", line > 574, in _resource_type_to_mappers > > > Anyone could help me, please? > What's happening ? > > Regards > Ignazio > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at gmail.com Thu Jun 14 07:33:04 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Thu, 14 Jun 2018 09:33:04 +0200 Subject: [Openstack-operators] [openstack-client] - missing commands? In-Reply-To: References: Message-ID: Oh sweet! Indeed great timing and planetary alignment ! I’ll had a look at the etherpad and participate on it. I’ll ask my teammates to give me their own missing features etc. I’m glad someone is working on it! Le mer. 13 juin 2018 à 23:38, Matt Riedemann a écrit : > On 6/13/2018 1:42 PM, Flint WALRUS wrote: > > Hi guys, I use the «new» openstack-client command as much as possible > > since a couple of years now, but yet I had a hard time recently to find > > equivalent command of the following: > > > > nova force-delete > > & > > The command on swift that permit to recursively upload the content of a > > directory and automatically creating the same directory structure using > > pseudo-folders. > > > > Did I miss something somewhere or are those commands missing? > > > > On the nova part I think it’s not that important as a classic openstack > > server delete seems to do the same, but not quite sure. > > Oh wow, great timing: > > http://lists.openstack.org/pipermail/openstack-dev/2018-June/131308.html > > I've also queued that up for the upcoming bug smash in China next week. > > -- > > Thanks, > > Matt > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gord at live.ca Thu Jun 14 15:35:59 2018 From: gord at live.ca (gordon chung) Date: Thu, 14 Jun 2018 15:35:59 +0000 Subject: [Openstack-operators] openstack ocata gnocchi statsd errors In-Reply-To: References: Message-ID: On 2018-06-14 1:30 AM, Ignazio Cassano wrote: > Hello Gordon, what do you mean with "coordination service" ? it is what handles locking/coordination across multiple workers. you configure it via a config option in your gnocchi.conf file. as you didn't define one it is using your indexer by default and creates locks on, in your case, mariadb. i would upgrade your tooz as there was a bug which caused it to create a lot of open connections. you can also increase your mariadb connection limit. cheers, -- gord From dave at opensourcesolutions.co.uk Thu Jun 14 18:59:22 2018 From: dave at opensourcesolutions.co.uk (Dave Williams) Date: Thu, 14 Jun 2018 19:59:22 +0100 Subject: [Openstack-operators] [kolla][nova] Upgrade leaves nova_compute at earlier version Message-ID: <20180614185922.GB3725@opensourcesolutions.co.uk> I am using kolla-ansible 6.0.0 with openstack_release set to master in globals.yml in a production environment. I am trying to fix an oslo_messaging.rpc.client.RemoteError when undertaking server add volume (https://bugs.launchpad.net/nova/+bug/1773393) and appear to have tracked it down to an inconsistency of container versions. kollo-ansible upgrade runs to completion without error but leaves the running nova_compute containers (and possibly others) at an earlier version. The version running is CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 944f620a445c 7dc2d4695962 "kolla_start" 4 weeks ago Up 4 hours nova_compute whereas docker images -a shows REPOSITORY TAG IMAGE ID CREATED SIZE kolla/ubuntu-source-nova-compute master f2df8187f14e 15 hours ago 1.29GB kolla/ubuntu-source-nova-compute 582561ac010f 39 hours ago 1.29GB kolla/ubuntu-source-nova-compute 7dc2d4695962 3 months ago 1.22GB This implies f2df8187f14e is the one I should be using. The image 582561ac010f was after I tried to switch to queens from master but without success due to a bootstrap_cinder problem: Error during database migration: "Database schema file with version 122 doesn't exist." I tried to investigate this but without any obvious resolution. All compute nodes show the same issue. As per the notes on https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html I have checked and virt_type is set to kvm in nova.conf and so I cannot see why the upgrade shouldnt have been successful. How do I get kolla to use the latest version pulled? Given I have running instances I am a little nervous of doing a deploy or reconfigure. Thanks for help. Dave From dabarren at gmail.com Thu Jun 14 19:09:45 2018 From: dabarren at gmail.com (Eduardo Gonzalez) Date: Thu, 14 Jun 2018 21:09:45 +0200 Subject: [Openstack-operators] [kolla][nova] Upgrade leaves nova_compute at earlier version In-Reply-To: <20180614185922.GB3725@opensourcesolutions.co.uk> References: <20180614185922.GB3725@opensourcesolutions.co.uk> Message-ID: Hi, could you share a your globals file without secrets, a docker ps -a on all compute hosts and images too. If you are able to get n upgrade log will be helpful too. By the way, using master is not really recommended, many changes from other projects and kolla may break the deployment. Regards On Thu, Jun 14, 2018, 9:00 PM Dave Williams wrote: > I am using kolla-ansible 6.0.0 with openstack_release set to master in > globals.yml in a production environment. > > I am trying to fix an oslo_messaging.rpc.client.RemoteError when > undertaking server add volume > (https://bugs.launchpad.net/nova/+bug/1773393) and appear to have > tracked it down to an inconsistency of container versions. > > kollo-ansible upgrade runs to completion without error but leaves the > running nova_compute containers (and possibly others) at an earlier > version. > > The version running is > CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES > 944f620a445c 7dc2d4695962 "kolla_start" 4 weeks ago Up 4 hours nova_compute > > whereas docker images -a shows > REPOSITORY TAG IMAGE ID CREATED SIZE > kolla/ubuntu-source-nova-compute master f2df8187f14e 15 hours ago 1.29GB > kolla/ubuntu-source-nova-compute 582561ac010f 39 hours ago 1.29GB > kolla/ubuntu-source-nova-compute 7dc2d4695962 3 months ago 1.22GB > > This implies f2df8187f14e is the one I should be using. > The image 582561ac010f was after I tried to switch to queens from master > but without success due to a bootstrap_cinder problem: > Error during database migration: > "Database schema file with version 122 doesn't exist." > I tried to investigate this but without any obvious resolution. > > All compute nodes show the same issue. > > As per the notes on > https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html > I have checked and virt_type is set to kvm in nova.conf and so I cannot > see why the upgrade shouldnt have been successful. > > How do I get kolla to use the latest version pulled? > > Given I have running instances I am a little nervous of doing a deploy or > reconfigure. > > Thanks for help. > > Dave > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Jun 15 06:17:08 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 15 Jun 2018 08:17:08 +0200 Subject: [Openstack-operators] openstack ocata gnocchi statsd errors In-Reply-To: References: Message-ID: Many thanks for your explanation. Regards Ignazio 2018-06-14 17:35 GMT+02:00 gordon chung : > > > On 2018-06-14 1:30 AM, Ignazio Cassano wrote: > > Hello Gordon, what do you mean with "coordination service" ? > > it is what handles locking/coordination across multiple workers. you > configure it via a config option in your gnocchi.conf file. > > as you didn't define one it is using your indexer by default and creates > locks on, in your case, mariadb. > > > i would upgrade your tooz as there was a bug which caused it to create a > lot of open connections. you can also increase your mariadb connection > limit. > > cheers, > > > -- > gord > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Fri Jun 15 06:28:29 2018 From: lyarwood at redhat.com (Lee Yarwood) Date: Fri, 15 Jun 2018 07:28:29 +0100 Subject: [Openstack-operators] [openstack-dev] Reminder to add "nova-status upgrade check" to deployment tooling In-Reply-To: References: Message-ID: <20180615062829.q6axidhwunw7xofy@lyarwood.usersys.redhat.com> On 13-06-18 10:14:32, Matt Riedemann wrote: > I was going through some recently reported nova bugs and came across [1] > which I opened at the Summit during one of the FFU sessions where I realized > the nova upgrade docs don't mention the nova-status upgrade check CLI [2] > (added in Ocata). > > As a result, I was wondering how many deployment tools out there support > upgrades and from those, which are actually integrating that upgrade status > check command. TripleO doesn't at present but like OSA it looks trivial to add: https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/nova-api.yaml I've created the following bug to track this: https://bugs.launchpad.net/tripleo/+bug/1777060 > I'm not really familiar with most of them, but I've dabbled in OSA enough to > know where the code lived for nova upgrades, so I posted a patch [3]. > > I'm hoping this can serve as a template for other deployment projects to > integrate similar checks into their upgrade (and install verification) > flows. > > [1] https://bugs.launchpad.net/nova/+bug/1772973 > [2] https://docs.openstack.org/nova/latest/cli/nova-status.html > [3] https://review.openstack.org/#/c/575125/ Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 From zioproto at gmail.com Fri Jun 15 15:11:21 2018 From: zioproto at gmail.com (Saverio Proto) Date: Fri, 15 Jun 2018 17:11:21 +0200 Subject: [Openstack-operators] Neutron not adding iptables rules for metadata agent In-Reply-To: <5f716e6218597640ec4c5aaa9a80518639088680.camel@emag.ro> References: <5f716e6218597640ec4c5aaa9a80518639088680.camel@emag.ro> Message-ID: Hello Radu, yours look more or less like a bug report. This you check existing open bugs for neutron ? Also what version of openstack are you running ? how did you configure enable_isolated_metadata and enable_metadata_network options ? Saverio 2018-06-13 12:45 GMT+02:00 Radu Popescu | eMAG, Technology : > Hi all, > > So, I'm having the following issue. I'm creating a VM with floating IP. > Everything is fine, namespace is there, postrouting and prerouting from the > internal IP to the floating IP are there. The only rules missing are the > rules to access metadata service: > > -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp > --dport 80 -j REDIRECT --to-ports 9697 > -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp > --dport 80 -j MARK --set-xmark 0x1/0xffff > > (this is taken from another working namespace with iptables-save) > > Forgot to mention, VM is booting ok, I have both the default route and the > one for the metadata service (cloud-init is running at boot time): > [ 57.150766] cloud-init[892]: ci-info: > +--------+------+--------------+---------------+-------+-------------------+ > [ 57.150997] cloud-init[892]: ci-info: | Device | Up | Address | > Mask | Scope | Hw-Address | > [ 57.151219] cloud-init[892]: ci-info: > +--------+------+--------------+---------------+-------+-------------------+ > [ 57.151431] cloud-init[892]: ci-info: | lo: | True | 127.0.0.1 | > 255.0.0.0 | . | . | > [ 57.151627] cloud-init[892]: ci-info: | eth0: | True | 10.240.9.186 | > 255.255.252.0 | . | fa:16:3e:43:d1:c2 | > [ 57.151815] cloud-init[892]: ci-info: > +--------+------+--------------+---------------+-------+-------------------+ > [ 57.152018] cloud-init[892]: ci-info: > +++++++++++++++++++++++++++++++Route IPv4 > info++++++++++++++++++++++++++++++++ > [ 57.152225] cloud-init[892]: ci-info: > +-------+-----------------+------------+-----------------+-----------+-------+ > [ 57.152426] cloud-init[892]: ci-info: | Route | Destination | > Gateway | Genmask | Interface | Flags | > [ 57.152621] cloud-init[892]: ci-info: > +-------+-----------------+------------+-----------------+-----------+-------+ > [ 57.152813] cloud-init[892]: ci-info: | 0 | 0.0.0.0 | > 10.240.8.1 | 0.0.0.0 | eth0 | UG | > [ 57.153013] cloud-init[892]: ci-info: | 1 | 10.240.1.0 | > 0.0.0.0 | 255.255.255.0 | eth0 | U | > [ 57.153202] cloud-init[892]: ci-info: | 2 | 10.240.8.0 | > 0.0.0.0 | 255.255.252.0 | eth0 | U | > [ 57.153397] cloud-init[892]: ci-info: | 3 | 169.254.169.254 | > 10.240.8.1 | 255.255.255.255 | eth0 | UGH | > [ 57.153579] cloud-init[892]: ci-info: > +-------+-----------------+------------+-----------------+-----------+-------+ > > The extra route is there because the tenant has 2 subnets. > > Before adding those 2 rules manually, I had this coming from cloud-init: > > [ 192.451801] cloud-init[892]: 2018-06-13 12:29:26,179 - > url_helper.py[WARNING]: Calling > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: > request error [('Connection aborted.', error(113, 'No route to host'))] > [ 193.456805] cloud-init[892]: 2018-06-13 12:29:27,184 - > url_helper.py[WARNING]: Calling > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: > request error [('Connection aborted.', error(113, 'No route to host'))] > [ 194.461592] cloud-init[892]: 2018-06-13 12:29:28,189 - > url_helper.py[WARNING]: Calling > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: > request error [('Connection aborted.', error(113, 'No route to host'))] > [ 195.466441] cloud-init[892]: 2018-06-13 12:29:29,194 - > url_helper.py[WARNING]: Calling > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: > request error [('Connection aborted.', error(113, 'No route to host'))] > > I can see no errors in neither nova or neutron services. > In the mean time, I've searched all our nova servers for this kind of > behavior and we have 1 random namespace missing those rules on 6 of our 66 > novas. > > Any ideas would be greatly appreciated. > > Thanks, > Radu > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From dave at opensourcesolutions.co.uk Fri Jun 15 20:46:50 2018 From: dave at opensourcesolutions.co.uk (Dave Williams) Date: Fri, 15 Jun 2018 21:46:50 +0100 Subject: [Openstack-operators] [kolla][nova] Upgrade leaves nova_compute at earlier version] Message-ID: <20180615204650.GA7334@opensourcesolutions.co.uk> Sure: Apologies for a longer post than I would like. globals.yml with all commented out (default) values removed: kolla_base_distro: "ubuntu" kolla_install_type: "source" openstack_release: "master" kolla_internal_vip_address: "10.10.66.138" kolla_internal_fqdn: "openstackint.uklab.local" kolla_external_vip_address: "10.10.66.134" kolla_external_fqdn: "openstackext.uklab.local" kolla_enable_tls_external: "yes" kolla_external_fqdn_cert: "{{ node_config_directory }}/certificates/haproxy.pem" enable_central_logging: "yes" enable_ceph: "yes" enable_cinder: "yes" enable_cinder_backend_netapp_iscsi: "no" enable_neutron_provider_networks: "yes" ceph_pool_type: "replicated" ceph_pool_pg_num: 128 ceph_pool_pgp_num: 128 cinder_backup_driver: "ceph" tempest_image_id: tempest_flavor_ref_id: tempest_public_network_id: tempest_floating_network_name: inventory file again only section with changes (rest default): [control] mclaren ansible_connection=local network_interface=enp5s0 neutron_external_interface=enp3s0 api_interface=enp5s0 [network] mclaren ansible_connection=local network_interface=enp5s0 neutron_external_interface=enp3s0 api_interface=enp5s0 [inner-compute] [external-compute] mclaren ansible_connection=local network_interface=enp5s0 neutron_external_interface=enp3s0 api_interface=enp5s0 aston network_interface=enp3s0 neutron_external_interface=enp5s0 corvette2 network_interface=enp3s0 neutron_external_interface=enp5s0 lambo2 network_interface=enp3s0 neutron_external_interface=enp5s0 porsche network_interface=enp3s0 neutron_external_interface=enp5s0 ferrari network_interface=enp3s0 neutron_external_interface=enp5s0 lotus network_interface=enp3s0 neutron_external_interface=enp5s0 maserati network_interface=eth0 neutron_external_interface=eth1 chevi2 network_interface=enp4s0f1 neutron_external_interface=enp4s0f0 audi network_interface=em1 neutron_external_interface=em2 saab network_interface=em1 neutron_external_interface=em2 radical network_interface=eth0 neutron_external_interface=eth1 [compute:children] inner-compute external-compute [storage] mclaren ansible_connection=local network_interface=enp5s0 neutron_external_interface=enp3s0 api_interface=enp5s0 lambo2 network_interface=enp3s0 neutron_external_interface=enp5s0 porsche network_interface=enp3s0 neutron_external_interface=enp5s0 ferrari network_interface=enp3s0 neutron_external_interface=enp5s0 lotus network_interface=enp3s0 neutron_external_interface=enp5s0 maserati network_interface=eth0 neutron_external_interface=eth1 chevi2 network_interface=enp4s0f1 neutron_external_interface=enp4s0f0 audi network_interface=em1 neutron_external_interface=em2 saab network_interface=em1 neutron_external_interface=em2 radical network_interface=eth0 neutron_external_interface=eth1 [monitoring] mclaren ansible_connection=local network_interface=enp5s0 neutron_external_interface=enp3s0 api_interface=enp5s0 radical network_interface=eth0 neutron_external_interface=eth1 aston network_interface=enp3s0 neutron_external_interface=enp5s0 [deployment] mclaren ansible_connection=local network_interface=enp5s0 neutron_external_interface=enp3s0 api_interface=enp5s0 docker ps -a (sorry for lack of line wrap) SERVER aston CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 24e87677d14c 0a1d3587ae5c "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent 22e7e820eb96 760e48f2527b "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd e6cb63995d02 c07ab7bd1910 "kolla_start" 9 days ago Up 9 days openvswitch_db 71ffd7f32eaa c9af2493c9e1 "kolla_start" 9 days ago Up 9 days nova_libvirt 8870b6ebaf61 e9d2082aa62f "kolla_start" 9 days ago Up 9 days nova_ssh 042510be7389 72e8b021c493 "kolla_start" 9 days ago Up 9 days cron 83e50da14764 ac0c8b1aeef2 "kolla_start" 9 days ago Up 9 days kolla_toolbox 5e8efb880a44 afa0d8636841 "kolla_start" 9 days ago Up 9 days fluentd 76a7b24d339f 7dc2d4695962 "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER chevi2 CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 98c58e80ed71 kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_14 1d94ef829a90 kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron af6023a2e642 kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox 1d5c05737740 kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd e9bfa550a752 af91cf6fca37 "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent 56e598f5c245 75bfdbba3121 "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd 2dfb4a5c35c4 4a45aebbe6db "kolla_start" 9 days ago Up 9 days openvswitch_db 2ae36973b03d b78203d88001 "kolla_start" 9 days ago Up 9 days nova_libvirt 3e563c57b923 5e7b9c7218e4 "kolla_start" 9 days ago Up 9 days nova_ssh 55212992abe8 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup b40a438fc967 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume 8a3d52181f3e e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 ea8173dce027 1327db800fb5 "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER corvette2 CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 354861c2f74d 0a1d3587ae5c "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent 405a83c73095 760e48f2527b "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd afa5dc83b65c c07ab7bd1910 "kolla_start" 9 days ago Up 9 days openvswitch_db 727bd00a3b4b c9af2493c9e1 "kolla_start" 9 days ago Up 9 days nova_libvirt 54958199183d e9d2082aa62f "kolla_start" 9 days ago Up 9 days nova_ssh 64dde7c9e54d 72e8b021c493 "kolla_start" 9 days ago Up 9 days cron 32b82bc71b64 ac0c8b1aeef2 "kolla_start" 9 days ago Up 9 days kolla_toolbox 08bd92b84784 afa0d8636841 "kolla_start" 9 days ago Up 9 days fluentd 87b358a2a203 7dc2d4695962 "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER ferrari CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 646d8fb9a744 kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_21 24f5837553e6 kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_20 20fe3af4d8ea kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron a31a42e1fff1 kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox e42348f61baf kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd 6bea5724925a 0a1d3587ae5c "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent f7a0fbf9fa9c 760e48f2527b "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd f0ecec45e6c1 c07ab7bd1910 "kolla_start" 9 days ago Up 9 days openvswitch_db 798d1afb8152 c9af2493c9e1 "kolla_start" 9 days ago Up 9 days nova_libvirt cd10ad9ab165 e9d2082aa62f "kolla_start" 9 days ago Up 9 days nova_ssh 229d12b12b37 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup 587ad9422bae 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume e70eafcb4e0d e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_1 513eea410e7e e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 1e7276bc8e8e 7dc2d4695962 "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER lambo2 CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES b4e61110c83e kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_23 bff1cac171ce kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_24 b1d7048a09a3 kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron ce0b01b2bb8f kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox 0e9852fb3df7 kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd 92edd06b97f2 0a1d3587ae5c "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent a859a5c03ec2 760e48f2527b "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd c57a565acfb8 c07ab7bd1910 "kolla_start" 9 days ago Up 9 days openvswitch_db 218c6f0475b3 c9af2493c9e1 "kolla_start" 9 days ago Up 9 days nova_libvirt 92a046c72fc5 e9d2082aa62f "kolla_start" 9 days ago Up 9 days nova_ssh cf19a92fa746 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup 84e4a01476a0 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume e56862ea2485 e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_1 440a7b5dff76 e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 42b7e96c20d4 7dc2d4695962 "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER lotus CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 9b20fe802f5c kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_25 f3c5cfa65f05 kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_26 e091b7075b9e kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron 24c18619c360 kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox be83c0455758 kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd 78619cba1bbf 0a1d3587ae5c "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent c363d620525a 760e48f2527b "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd e10eebf667d2 c07ab7bd1910 "kolla_start" 9 days ago Up 9 days openvswitch_db 9afe512e474a c9af2493c9e1 "kolla_start" 9 days ago Up 9 days nova_libvirt 7742334d9854 e9d2082aa62f "kolla_start" 9 days ago Up 9 days nova_ssh bc49ce4ec413 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup 82ce2ee5f951 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume 48a88eef1323 e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_1 49c7c759f007 e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 944f620a445c 7dc2d4695962 "kolla_start" 6 weeks ago Up 8 days nova_compute SERVER maserati CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 22421a9ab59d kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_15 be37172c2d29 kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron 6d36d1aa301c kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox e4878b5ccc62 kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd 919a1630fbc4 af91cf6fca37 "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent 9f120c14f742 75bfdbba3121 "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd 9aeb74d824a0 4a45aebbe6db "kolla_start" 9 days ago Up 9 days openvswitch_db a49d120be4e7 b78203d88001 "kolla_start" 9 days ago Up 9 days nova_libvirt 71342a9de120 5e7b9c7218e4 "kolla_start" 9 days ago Up 9 days nova_ssh c92b64ead946 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup e2c4c99cfa94 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume bbc1a10e0720 e2ef2ac054f0 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 40fbab40f6af 1327db800fb5 "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER mclaren CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 2030bbc9cff2 kolla/ubuntu-source-cinder-api:queens "kolla_start" 8 days ago Exited (1) 8 days ago bootstrap_cinder 6e276a11497b kolla/ubuntu-source-cinder-api:queens "bash" 8 days ago Exited (0) 8 days ago stupefied_agnesi 2ac0ee19add0 kolla/ubuntu-source-cinder-api:queens "bash" 8 days ago Exited (0) 8 days ago backstabbing_ride 9fcb9681a490 kolla/ubuntu-source-glance-registry:queens "kolla_start" 8 days ago Up 8 days glance_registry 1cd803c9702b kolla/ubuntu-source-glance-api:queens "kolla_start" 8 days ago Up 8 days glance_api 4e5674eecd6b kolla/ubuntu-source-ceph-mgr:queens "kolla_start" 8 days ago Up 8 days ceph_mgr 4b20de375731 kolla/ubuntu-source-ceph-mon:queens "kolla_start" 8 days ago Up 8 days ceph_mon a5f87a81f742 kolla/ubuntu-source-keystone:queens "kolla_start" 8 days ago Up 8 days keystone 973400381ca9 kolla/ubuntu-source-rabbitmq:queens "kolla_start" 8 days ago Up 8 days rabbitmq 8b67f06d16fd kolla/ubuntu-source-mariadb:queens "kolla_start" 8 days ago Up 8 days mariadb 6b75d38f8c75 kolla/ubuntu-source-memcached:queens "kolla_start" 8 days ago Up 8 days memcached dd7988346ba3 kolla/ubuntu-source-kibana:queens "kolla_start" 8 days ago Up 8 days kibana 1de09ab2d131 kolla/ubuntu-source-keepalived:queens "kolla_start" 8 days ago Up 8 days keepalived d114b971eaa7 kolla/ubuntu-source-haproxy:queens "kolla_start" 8 days ago Up 8 days haproxy 1ddd175d825b kolla/ubuntu-source-elasticsearch:queens "kolla_start" 8 days ago Up 8 days elasticsearch 593760dfdffd kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron 250cf2571aad kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox 5d6c88226710 kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd 308c8719b804 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup f111db9b57fd 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume afb246359b0e bef10157d50e "kolla_start" 9 days ago Up 9 days cinder_scheduler 37390f582dd7 598a51396443 "kolla_start" 9 days ago Up 9 days cinder_api 564c44963116 6b55bb6a68cc "kolla_start" 5 weeks ago Up 9 days horizon 819b9f1a2a15 6e277850ea95 "kolla_start" 5 weeks ago Up 9 days heat_engine 386e62406ada 21ad7a0470a2 "kolla_start" 5 weeks ago Up 9 days heat_api_cfn e66aa44f47b8 ca6a916cfd05 "kolla_start" 5 weeks ago Up 9 days heat_api 9ab6392ef1b2 8d58993a23ab "kolla_start" 5 weeks ago Up 3 weeks neutron_metadata_agent 569f1354a87c 14e8c85e8ef8 "kolla_start" 5 weeks ago Up 3 weeks neutron_l3_agent e9099c5310c0 cd0739afdb29 "kolla_start" 5 weeks ago Up 3 weeks neutron_dhcp_agent 11eb7cacd3b3 aa895811b26e "kolla_start" 5 weeks ago Up 3 weeks neutron_openvswitch_agent 785cf3e744ea 7ab18cfbe4f5 "kolla_start" 5 weeks ago Up 3 weeks neutron_server 427567cc1180 760e48f2527b "kolla_start" 6 weeks ago Up 3 weeks openvswitch_vswitchd 8dfcd4fd96d4 c07ab7bd1910 "kolla_start" 6 weeks ago Up 3 weeks openvswitch_db ab270aac1f9c b8af432eacfb "kolla_start" 6 weeks ago Up 9 days nova_compute b5a9c0a1b4fb 42b0feb8ac68 "kolla_start" 6 weeks ago Up 3 weeks nova_novncproxy ce83a1689d7a ff68c44578c6 "kolla_start" 6 weeks ago Up 3 weeks nova_consoleauth 5f86ab44406a b73d02cd3549 "kolla_start" 6 weeks ago Up 3 weeks nova_conductor e032e7470744 a753a392e019 "kolla_start" 6 weeks ago Up 3 weeks nova_scheduler 32adaa773e43 c6a04101cdc2 "kolla_start" 6 weeks ago Up 3 weeks nova_api 828b050d9ef9 f477e69285b6 "kolla_start" 6 weeks ago Up 3 weeks placement_api 18cf42dc8e55 c9af2493c9e1 "kolla_start" 6 weeks ago Up 3 weeks nova_libvirt 45c35dc7faf0 b83bbe0247a8 "kolla_start" 6 weeks ago Up 3 weeks nova_ssh SERVER porsche CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 2d9507963c87 kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_17 9769b466096d kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_16 6791b2bb9c02 kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron ffc300935f90 kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox 855ab15b0cbd kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd a263b6fa8ac3 085ed25f53ec "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent b756d88de4b7 7aff079c0f1e "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd 1dfdedd9fbb7 bfe9738a875a "kolla_start" 9 days ago Up 9 days openvswitch_db 362b4f7dc5f6 eea114977ec7 "kolla_start" 9 days ago Up 9 days nova_libvirt 18136a1b4414 098795b22f46 "kolla_start" 9 days ago Up 9 days nova_ssh 73f05836bf59 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup 250df208df5c 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume 8ad19d1e6da6 418b2bea1e64 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_1 990fc14fa554 418b2bea1e64 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 009e0b9e317a 493e108e9514 "kolla_start" 4 weeks ago Up 9 days nova_compute SERVER saab CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES b2f5d2b8e417 kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_13 64535306319d kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron 1c7594f693e0 kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox 96b377de4ffa kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd 4ad987e372f5 3243d4926a66 "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent f12b6a1beaa0 101695c48d48 "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd 13defa55503d 9de6f3c41d83 "kolla_start" 9 days ago Up 9 days openvswitch_db dd84be822bd0 78ace7fc263d "kolla_start" 9 days ago Up 9 days nova_libvirt 6bc5c8b5b7e8 bc68d85b4669 "kolla_start" 9 days ago Up 9 days nova_ssh 43d75a221403 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup b009c3186a74 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume 16277267d571 ff11952a6a94 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 7269edbe47eb abbf3258d4ee "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER audi CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 930f4733b77f kolla/ubuntu-source-ceph-osd:queens "kolla_start" 8 days ago Up 8 days ceph_osd_22 4c9c4cd98dda kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron 86173044a85b kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox 46cbf7bfb16e kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd 62691e7cf399 3243d4926a66 "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent c9bdcca5cd7d 101695c48d48 "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd c53973ba8a4f 9de6f3c41d83 "kolla_start" 9 days ago Up 9 days openvswitch_db 0831964e140c 78ace7fc263d "kolla_start" 9 days ago Up 9 days nova_libvirt a98850a1ab82 bc68d85b4669 "kolla_start" 9 days ago Up 9 days nova_ssh e1f5b4c96846 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup 2e888d0d3a9e 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume 452d6465a5d6 ff11952a6a94 "kolla_start" 3 weeks ago Exited (0) 3 weeks ago bootstrap_osd_0 e4f15b62c6a7 abbf3258d4ee "kolla_start" 6 weeks ago Up 9 days nova_compute SERVER radical CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 608a82d6ad45 kolla/ubuntu-source-cron:queens "kolla_start" 8 days ago Up 8 days cron a775d01eabe8 kolla/ubuntu-source-kolla-toolbox:queens "kolla_start" 8 days ago Up 8 days kolla_toolbox 475b8900989b kolla/ubuntu-source-fluentd:queens "kolla_start" 8 days ago Up 8 days fluentd a48c59fd969e d3cc178ce8c0 "kolla_start" 9 days ago Up 9 days neutron_openvswitch_agent 7369e5368602 800896c34a2e "kolla_start" 9 days ago Up 9 days openvswitch_vswitchd 0ca90d6a09e8 551909d28c50 "kolla_start" 9 days ago Up 9 days openvswitch_db cc5228efb01b 6307a42d968d "kolla_start" 9 days ago Up 9 days nova_libvirt bc3a36aa480f 0e00da6547d0 "kolla_start" 9 days ago Up 9 days nova_ssh dd21cf3c1b15 89c2ec59b2b1 "kolla_start" 9 days ago Up 9 days cinder_backup 82b56aa60102 52ac8cf90164 "kolla_start" 9 days ago Up 9 days cinder_volume 0feafa193c05 64c10510ae70 "kolla_start" 6 weeks ago Up 9 days nova_compute The various ceph startups were due a serious network failure I suffered using iscsi based OSD's. None of the kolla instructions to recreate the storage worked. I had to resort to copious dd'ing to zero all the disk partitions before I could finally get it working afresh. Painful!! The bootstrap_cinder with exit 1 was my attempt to downgrade to move to queens rather than master but that failed due to the database schema issue described below. The last kolla-ansible upgrade (back to master) was successful with ansible.log from the fluentd container showing all lines like: 2018-06-07 09:18:46,265 p=767 u=ansible | localhost | SUCCESS => { "changed": false, "user": "haproxy" } 2018-06-07 09:19:14,436 p=796 u=ansible | localhost | SUCCESS => { "changed": true, "msg": "Variable change succeeded prev_value=OFF" } 2018-06-07 09:19:29,562 p=826 u=ansible | localhost | SUCCESS => { "changed": true, "msg": "Variable change succeeded prev_value=ON" } 2018-06-07 09:23:11,834 p=860 u=root | localhost | SUCCESS => { "changed": false, "disks": "[]" } 2018-06-07 09:23:59,247 p=892 u=ansible | localhost | SUCCESS => { "changed": true, "msg": "Variable change succeeded prev_value=OFF" } 2018-06-07 09:24:08,053 p=921 u=ansible | localhost | SUCCESS => { "changed": true, "msg": "Variable change succeeded prev_value=ON" } Not sure what else I can provide beyond the bootstrap_cinder logs which show more fully: ++ cat /run_command + CMD='apache2 -DFOREGROUND' + ARGS= + [[ ! -n '' ]] + . kolla_extend_start ++ [[ ! -d /var/log/kolla/cinder ]] +++ stat -c %a /var/log/kolla/cinder ++ [[ 755 != \7\5\5 ]] ++ . /usr/local/bin/kolla_cinder_extend_start +++ set -o errexit +++ [[ -n 0 ]] +++ cinder-manage db sync Error during database migration: "Database schema file with version 122 doesn't exist." I am happy with running a stable version rather than master. Using master was my trying to fix problems with ceph. I appear to be stuck not being able to upgrade or downgrade at present. Whilst my cloud is mainly working its just some secondary features like adding volumes that now no longer work. I am not in the position to destroy the system and restart without significant disruption. Thank you for your attention. Dave On 21:09, Thu 14 Jun 18, Eduardo Gonzalez wrote: > Hi, > > could you share a your globals file without secrets, a docker ps -a on all > compute hosts and images too. If you are able to get n upgrade log will be > helpful too. > > By the way, using master is not really recommended, many changes from other > projects and kolla may break the deployment. > > Regards > > On Thu, Jun 14, 2018, 9:00 PM Dave Williams > wrote: > > > I am using kolla-ansible 6.0.0 with openstack_release set to master in > > globals.yml in a production environment. > > > > I am trying to fix an oslo_messaging.rpc.client.RemoteError when > > undertaking server add volume > > (https://bugs.launchpad.net/nova/+bug/1773393) and appear to have > > tracked it down to an inconsistency of container versions. > > > > kollo-ansible upgrade runs to completion without error but leaves the > > running nova_compute containers (and possibly others) at an earlier > > version. > > > > The version running is > > CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES > > 944f620a445c 7dc2d4695962 "kolla_start" 4 weeks ago Up 4 hours nova_compute > > > > whereas docker images -a shows > > REPOSITORY TAG IMAGE ID CREATED SIZE > > kolla/ubuntu-source-nova-compute master f2df8187f14e 15 hours ago 1.29GB > > kolla/ubuntu-source-nova-compute 582561ac010f 39 hours ago 1.29GB > > kolla/ubuntu-source-nova-compute 7dc2d4695962 3 months ago 1.22GB > > > > This implies f2df8187f14e is the one I should be using. > > The image 582561ac010f was after I tried to switch to queens from master > > but without success due to a bootstrap_cinder problem: > > Error during database migration: > > "Database schema file with version 122 doesn't exist." > > I tried to investigate this but without any obvious resolution. > > > > All compute nodes show the same issue. > > > > As per the notes on > > https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html > > I have checked and virt_type is set to kvm in nova.conf and so I cannot > > see why the upgrade shouldnt have been successful. > > > > How do I get kolla to use the latest version pulled? > > > > Given I have running instances I am a little nervous of doing a deploy or > > reconfigure. > > > > Thanks for help. > > > > Dave > > > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > From radu.popescu at emag.ro Mon Jun 18 10:19:04 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Mon, 18 Jun 2018 10:19:04 +0000 Subject: [Openstack-operators] Neutron not adding iptables rules for metadata agent In-Reply-To: References: <5f716e6218597640ec4c5aaa9a80518639088680.camel@emag.ro> Message-ID: Hi, We're using Openstack Ocata, deployed using Openstack Ansible v15.1.7. Neutron server is v10.0.3. I can see enable_isolated_metadata and enable_metadata_network only used for isolated networks that don't have a router which is not our case. Also, I checked all namespaces on all our novas and only affected 6 out of 66 ..and only 1 namespace / nova. Seems like isolated case that doesn't happen very often. Can it be RabbitMQ? I'm not sure where to check. Thanks, Radu On Fri, 2018-06-15 at 17:11 +0200, Saverio Proto wrote: Hello Radu, yours look more or less like a bug report. This you check existing open bugs for neutron ? Also what version of openstack are you running ? how did you configure enable_isolated_metadata and enable_metadata_network options ? Saverio 2018-06-13 12:45 GMT+02:00 Radu Popescu | eMAG, Technology >: Hi all, So, I'm having the following issue. I'm creating a VM with floating IP. Everything is fine, namespace is there, postrouting and prerouting from the internal IP to the floating IP are there. The only rules missing are the rules to access metadata service: -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697 -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff (this is taken from another working namespace with iptables-save) Forgot to mention, VM is booting ok, I have both the default route and the one for the metadata service (cloud-init is running at boot time): [ 57.150766] cloud-init[892]: ci-info: +--------+------+--------------+---------------+-------+-------------------+ [ 57.150997] cloud-init[892]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address | [ 57.151219] cloud-init[892]: ci-info: +--------+------+--------------+---------------+-------+-------------------+ [ 57.151431] cloud-init[892]: ci-info: | lo: | True | 127.0.0.1 | 255.0.0.0 | . | . | [ 57.151627] cloud-init[892]: ci-info: | eth0: | True | 10.240.9.186 | 255.255.252.0 | . | fa:16:3e:43:d1:c2 | [ 57.151815] cloud-init[892]: ci-info: +--------+------+--------------+---------------+-------+-------------------+ [ 57.152018] cloud-init[892]: ci-info: +++++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++++ [ 57.152225] cloud-init[892]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ [ 57.152426] cloud-init[892]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags | [ 57.152621] cloud-init[892]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ [ 57.152813] cloud-init[892]: ci-info: | 0 | 0.0.0.0 | 10.240.8.1 | 0.0.0.0 | eth0 | UG | [ 57.153013] cloud-init[892]: ci-info: | 1 | 10.240.1.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U | [ 57.153202] cloud-init[892]: ci-info: | 2 | 10.240.8.0 | 0.0.0.0 | 255.255.252.0 | eth0 | U | [ 57.153397] cloud-init[892]: ci-info: | 3 | 169.254.169.254 | 10.240.8.1 | 255.255.255.255 | eth0 | UGH | [ 57.153579] cloud-init[892]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ The extra route is there because the tenant has 2 subnets. Before adding those 2 rules manually, I had this coming from cloud-init: [ 192.451801] cloud-init[892]: 2018-06-13 12:29:26,179 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] [ 193.456805] cloud-init[892]: 2018-06-13 12:29:27,184 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] [ 194.461592] cloud-init[892]: 2018-06-13 12:29:28,189 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] [ 195.466441] cloud-init[892]: 2018-06-13 12:29:29,194 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: request error [('Connection aborted.', error(113, 'No route to host'))] I can see no errors in neither nova or neutron services. In the mean time, I've searched all our nova servers for this kind of behavior and we have 1 random namespace missing those rules on 6 of our 66 novas. Any ideas would be greatly appreciated. Thanks, Radu _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Mon Jun 18 14:30:02 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 18 Jun 2018 09:30:02 -0500 Subject: [Openstack-operators] Reminder: UC Meeting Today 1800UTC Message-ID: Hey everyone, Please see https://wiki.openstack.org/wiki/Governance/Foundation/ UserCommittee for UC meeting info and add additional agenda items if needed. -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From strigazi at gmail.com Mon Jun 18 15:19:46 2018 From: strigazi at gmail.com (Spyros Trigazis) Date: Mon, 18 Jun 2018 17:19:46 +0200 Subject: [Openstack-operators] [openstack-operators][heat][oslo.db] Configure maximum number of db connections Message-ID: Hello list, I'm hitting quite easily this [1] exception with heat. The db server is configured to have 1000 max_connnections and 1000 max_user_connections and in the database section of heat conf I have these values set: max_pool_size = 22 max_overflow = 0 Full config attached. I ended up with this configuration based on this formula: num_heat_hosts=4 heat_api_workers=2 heat_api_cfn_workers=2 num_engine_workers=4 max_pool_size=22 max_overflow=0 num_heat_hosts * (max_pool_size + max_overflow) * (heat_api_workers + num_engine_workers + heat_api_cfn_workers) 704 What I have noticed is that the number of connections I expected with the above formula is not respected. Based on this formula each node (every node runs the heat-api, heat-api-cfn and heat-engine) should use up to 176 connections but they even reach 400 connections. Has anyone noticed a similar behavior? Cheers, Spyros heat-version: ocata [1] "User heat already has more than 'max_user_connections' active connections" -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: heat-conf Type: application/octet-stream Size: 1979 bytes Desc: not available URL: From james.page at canonical.com Mon Jun 18 15:45:31 2018 From: james.page at canonical.com (James Page) Date: Mon, 18 Jun 2018 16:45:31 +0100 Subject: [Openstack-operators] [sig][upgrade] Upgrade SIG IRC meeting 1600 UTC Tuesday Message-ID: Hi All Just a quick reminder that the Upgrade SIG IRC meeting will be held at 1600 UTC tomorrow (Tuesday) in #openstack-meeting-4. If you're interested in helping improve the OpenStack upgrade experience be sure to attend! See [0] for previous meeting minutes and our standing agenda. Regards James [0] https://etherpad.openstack.org/p/upgrades-sig-meeting -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaypipes at gmail.com Mon Jun 18 17:39:33 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Mon, 18 Jun 2018 13:39:33 -0400 Subject: [Openstack-operators] [openstack-operators][heat][oslo.db] Configure maximum number of db connections In-Reply-To: References: Message-ID: +openstack-dev since I believe this is an issue with the Heat source code. On 06/18/2018 11:19 AM, Spyros Trigazis wrote: > Hello list, > > I'm hitting quite easily this [1] exception with heat. The db server is > configured to have 1000 > max_connnections and 1000 max_user_connections and in the database > section of heat > conf I have these values set: > max_pool_size = 22 > max_overflow = 0 > Full config attached. > > I ended up with this configuration based on this formula: > num_heat_hosts=4 > heat_api_workers=2 > heat_api_cfn_workers=2 > num_engine_workers=4 > max_pool_size=22 > max_overflow=0 > num_heat_hosts * (max_pool_size + max_overflow) * (heat_api_workers + > num_engine_workers + heat_api_cfn_workers) > 704 > > What I have noticed is that the number of connections I expected with > the above formula is not respected. > Based on this formula each node (every node runs the heat-api, > heat-api-cfn and heat-engine) should > use up to 176 connections but they even reach 400 connections. > > Has anyone noticed a similar behavior? Looking through the Heat code, I see that there are many methods in the /heat/db/sqlalchemy/api.py module that use a SQLAlchemy session but never actually call session.close() [1] which means that the session will not be released back to the connection pool, which might be the reason why connections keep piling up. Not sure if there's any setting in Heat that will fix this problem. Disabling connection pooling will likely not help since connections are not properly being closed and returned to the connection pool to begin with. Best, -jay [1] Heat apparently doesn't use the oslo.db enginefacade transaction context managers either, which would help with this problem since the transaction context manager would take responsibility for calling session.flush()/close() appropriately. https://github.com/openstack/oslo.db/blob/43af1cf08372006aa46d836ec45482dd4b5b5349/oslo_db/sqlalchemy/enginefacade.py#L626 From strigazi at gmail.com Tue Jun 19 09:17:58 2018 From: strigazi at gmail.com (Spyros Trigazis) Date: Tue, 19 Jun 2018 11:17:58 +0200 Subject: [Openstack-operators] [openstack-dev] [openstack-operators][heat][oslo.db][magnum] Configure maximum number of db connections In-Reply-To: References: Message-ID: Hello lists, With heat's team help I figured it out. Thanks Jay for looking into it. The issue is coming from [1], where the max_overflow is set to executor_thread_pool_size if it is set to a lower value to address another issue. In my case, I had a lot of RAM and CPU so I could push for threads but I was "short" in db connections. The formula to calculate the number of connections can be like this: num_heat_hosts=4 heat_api_workers=2 heat_api_cfn_workers=2 num_engine_workers=4 executor_thread_pool_size = 22 max_pool_size=4 max_overflow=executor_thread_pool_size num_heat_hosts * (max_pool_size + max_overflow) * (heat_api_workers + num_engine_workers + heat_api_cfn_workers) 832 And a note for magnum deployments medium to large, see the options we have changed in heat conf and change according to your needs. The db configuration described here and changes we discovered in a previous scale test can help to have a stable magnum and heat service. For large stacks or projects with many stacks you need to change the following in these values or better, according to your needs. [Default] executor_thread_pool_size = 22 max_resources_per_stack = -1 max_stacks_per_tenant = 10000 action_retry_limit = 10 client_retry_limit = 10 engine_life_check_timeout = 600 max_template_size = 5242880 rpc_poll_timeout = 600 rpc_response_timeout = 600 num_engine_workers = 4 [database] max_pool_size = 4 max_overflow = 22 Cheers, Spyros [heat_api] workers = 2 [heat_api_cfn] workers = 2 Cheers, Spyros ps We will update the magnum docs as well [1] http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/service.py#n375 On Mon, 18 Jun 2018 at 19:39, Jay Pipes wrote: > +openstack-dev since I believe this is an issue with the Heat source code. > > On 06/18/2018 11:19 AM, Spyros Trigazis wrote: > > Hello list, > > > > I'm hitting quite easily this [1] exception with heat. The db server is > > configured to have 1000 > > max_connnections and 1000 max_user_connections and in the database > > section of heat > > conf I have these values set: > > max_pool_size = 22 > > max_overflow = 0 > > Full config attached. > > > > I ended up with this configuration based on this formula: > > num_heat_hosts=4 > > heat_api_workers=2 > > heat_api_cfn_workers=2 > > num_engine_workers=4 > > max_pool_size=22 > > max_overflow=0 > > num_heat_hosts * (max_pool_size + max_overflow) * (heat_api_workers + > > num_engine_workers + heat_api_cfn_workers) > > 704 > > > > What I have noticed is that the number of connections I expected with > > the above formula is not respected. > > Based on this formula each node (every node runs the heat-api, > > heat-api-cfn and heat-engine) should > > use up to 176 connections but they even reach 400 connections. > > > > Has anyone noticed a similar behavior? > > Looking through the Heat code, I see that there are many methods in the > /heat/db/sqlalchemy/api.py module that use a SQLAlchemy session but > never actually call session.close() [1] which means that the session > will not be released back to the connection pool, which might be the > reason why connections keep piling up. > > Not sure if there's any setting in Heat that will fix this problem. > Disabling connection pooling will likely not help since connections are > not properly being closed and returned to the connection pool to begin > with. > > Best, > -jay > > [1] Heat apparently doesn't use the oslo.db enginefacade transaction > context managers either, which would help with this problem since the > transaction context manager would take responsibility for calling > session.flush()/close() appropriately. > > > https://github.com/openstack/oslo.db/blob/43af1cf08372006aa46d836ec45482dd4b5b5349/oslo_db/sqlalchemy/enginefacade.py#L626 > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue Jun 19 16:43:20 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 19 Jun 2018 12:43:20 -0400 Subject: [Openstack-operators] Ops Meetups Team meeting 2018-6-19 Message-ID: Meeting minutes for today's OpenStack Ops Meetups team meeting on IRC Minutes: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-19-14.05.html Minutes (text): http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-19-14.05.txt Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-06-19-14.05.log.html Next meeting 10am EST 2018-6-26 -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From cristian.calin at orange.com Wed Jun 20 06:43:44 2018 From: cristian.calin at orange.com (cristian.calin at orange.com) Date: Wed, 20 Jun 2018 06:43:44 +0000 Subject: [Openstack-operators] [ceilometer][panko][pike] elasticsearch integration Message-ID: <8417_1529477025_5B29F7A1_8417_225_1_00dd3b5081db43ca9eac794c4536ec3a@orange.com> Hello, I'm trying to run ceilometer with panko publishers in pike release and when I run the ceilometer-agent-notification I get a trace complaining about NoSuchOptError, but without the actual parameter that is missing (see trace below). I have configured panko.conf with the following: [database] connection = es://user:password at elasticsearch.service.consul.:9200 [storage] es_ssl_enable = False es_index_name = events As far as I can tell from the debug log, the storage.es_ssl_enable and storage.es_index_name parameters are not loaded, they don't show up in the "cotyledon.oslo_config_glue" output so I assume the trace relates to these parameters. Has anybody else seen this error before? PS: sorry for CC'ing the dev list but I hope to reach the right audience ================ TRACE ==================== {"asctime": "2018-06-20 05:49:09.405","process": "59","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:10.436","process": "61","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:11.409","process": "63","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:18.467","process": "57","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:18.468","process": "57","levelname": "ERROR","name": "ceilometer.pipeline", "instance": {},"message":"Unable to load publisher panko://"}: RetryError: RetryError[] 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>>Traceback (most recent call last): 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/ceilometer/pipeline.py", line 419, in __init__ 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> self.publishers.append(publisher_manager.get(p)) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/ceilometer/pipeline.py", line 713, in get 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> 'ceilometer.%s.publisher' % self._purpose) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/ceilometer/publisher/__init__.py", line 36, in get_publisher 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> return loaded_driver.driver(parse_result) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/panko/publisher/database.py", line 35, in __init__ 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> self.conn = storage.get_connection_from_config(conf) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py", line 73, in get_connection_from_config 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> return _inner() 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/tenacity/__init__.py", line 171, in wrapped_f 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> return self.call(f, *args, **kw) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/tenacity/__init__.py", line 248, in call 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> start_time=start_time) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/tenacity/__init__.py", line 217, in iter 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> six.raise_from(RetryError(fut), fut.exception()) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/six.py", line 718, in raise_from 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> raise value 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>>RetryError: RetryError[] 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cristian.calin at orange.com Wed Jun 20 07:08:57 2018 From: cristian.calin at orange.com (cristian.calin at orange.com) Date: Wed, 20 Jun 2018 07:08:57 +0000 Subject: [Openstack-operators] [ceilometer][panko][pike] elasticsearch integration In-Reply-To: <8417_1529477025_5B29F7A1_8417_225_1_00dd3b5081db43ca9eac794c4536ec3a@orange.com> References: <8417_1529477025_5B29F7A1_8417_225_1_00dd3b5081db43ca9eac794c4536ec3a@orange.com> Message-ID: <22471_1529478538_5B29FD8A_22471_266_1_b3e6711edc72473bab8136aae2bfd6a4@orange.com> Some more details, I tried running with python3 and the error I got with it is a bit more detailed: {"asctime": "2018-06-20 07:06:11.537","process": "24","levelname": "ERROR","name": "ceilometer.pipeline", "instance": {},"message":"Unable to load publisher panko://"}: tenacity.RetryError: RetryError[] 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>>Traceback (most recent call last): 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/tenacity/__init__.py", line 251, in call 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> result = fn(*args, **kwargs) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/panko/storage/__init__.py", line 71, in _inner 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> return get_connection(url, conf) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/panko/storage/__init__.py", line 86, in get_connection 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> return mgr.driver(url, conf) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/panko/storage/impl_elasticsearch.py", line 74, in __init__ 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> use_ssl = conf.database.es_ssl_enabled 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/oslo_config/cfg.py", line 3363, in __getattr__ 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> return self._conf._get(name, self._group) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/oslo_config/cfg.py", line 2925, in _get 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> value = self._do_get(name, group, namespace) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/oslo_config/cfg.py", line 2942, in _do_get 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> info = self._get_opt_info(name, group) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/oslo_config/cfg.py", line 3099, in _get_opt_info 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> raise NoSuchOptError(opt_name, group) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>>oslo_config.cfg.NoSuchOptError: no such option es_ssl_enabled in group [database] 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>>The above exception was the direct cause of the following exception: 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>>Traceback (most recent call last): 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/ceilometer/pipeline.py", line 419, in __init__ 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> self.publishers.append(publisher_manager.get(p)) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/ceilometer/pipeline.py", line 713, in get 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> 'ceilometer.%s.publisher' % self._purpose) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/ceilometer/publisher/__init__.py", line 36, in get_publisher 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> return loaded_driver.driver(parse_result) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/panko/publisher/database.py", line 35, in __init__ 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> self.conn = storage.get_connection_from_config(conf) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/panko/storage/__init__.py", line 73, in get_connection_from_config 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> return _inner() 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/tenacity/__init__.py", line 171, in wrapped_f 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> return self.call(f, *args, **kw) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/tenacity/__init__.py", line 248, in call 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> start_time=start_time) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python3.5/site-packages/tenacity/__init__.py", line 217, in iter 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> six.raise_from(RetryError(fut), fut.exception()) 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> File "", line 2, in raise_from 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>>tenacity.RetryError: RetryError[] 2018-06-20 07:06:11.537 24 TRACE ceilometer.pipeline >>>>> I changed my panko.conf to: [database] connection = es://user:password at elasticsearch.service.consul.:9200 es_ssl_enable = False [storage] es_index_name = events But I get the same error which means that the es_* parameters are not properly merged from panko.conf when ceilometer-agent-notification starts up. From: cristian.calin at orange.com [mailto:cristian.calin at orange.com] Sent: Wednesday, June 20, 2018 9:44 AM To: openstack-operators at lists.openstack.org Cc: openstack-dev at lists.openstack.org Subject: [openstack-dev] [ceilometer][panko][pike] elasticsearch integration Hello, I'm trying to run ceilometer with panko publishers in pike release and when I run the ceilometer-agent-notification I get a trace complaining about NoSuchOptError, but without the actual parameter that is missing (see trace below). I have configured panko.conf with the following: [database] connection = es://user:password at elasticsearch.service.consul.:9200 [storage] es_ssl_enable = False es_index_name = events As far as I can tell from the debug log, the storage.es_ssl_enable and storage.es_index_name parameters are not loaded, they don't show up in the "cotyledon.oslo_config_glue" output so I assume the trace relates to these parameters. Has anybody else seen this error before? PS: sorry for CC'ing the dev list but I hope to reach the right audience ================ TRACE ==================== {"asctime": "2018-06-20 05:49:09.405","process": "59","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:10.436","process": "61","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:11.409","process": "63","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:18.467","process": "57","levelname": "DEBUG","name": "panko.storage", "instance": {},"message":"looking for 'es' driver in panko.storage"} {"funcName": "get_connection","source": {"p ath": "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py","lineno": "84"}} {"asctime": "2018-06-20 05:49:18.468","process": "57","levelname": "ERROR","name": "ceilometer.pipeline", "instance": {},"message":"Unable to load publisher panko://"}: RetryError: RetryError[] 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>>Traceback (most recent call last): 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/ceilometer/pipeline.py", line 419, in __init__ 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> self.publishers.append(publisher_manager.get(p)) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/ceilometer/pipeline.py", line 713, in get 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> 'ceilometer.%s.publisher' % self._purpose) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/ceilometer/publisher/__init__.py", line 36, in get_publisher 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> return loaded_driver.driver(parse_result) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/panko/publisher/database.py", line 35, in __init__ 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> self.conn = storage.get_connection_from_config(conf) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/panko/storage/__init__.py", line 73, in get_connection_from_config 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> return _inner() 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/tenacity/__init__.py", line 171, in wrapped_f 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> return self.call(f, *args, **kw) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/tenacity/__init__.py", line 248, in call 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> start_time=start_time) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/tenacity/__init__.py", line 217, in iter 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> six.raise_from(RetryError(fut), fut.exception()) 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> File "/opt/ceilometer/lib/python2.7/site-packages/six.py", line 718, in raise_from 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> raise value 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>>RetryError: RetryError[] 2018-06-20 05:49:18.468 57 TRACE ceilometer.pipeline >>>>> _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Wed Jun 20 18:24:38 2018 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 20 Jun 2018 11:24:38 -0700 Subject: [Openstack-operators] [PTG] Updates! Message-ID: Hello Everyone! Wanted to give you some updates on PTG4 planning. We have finalized the list of SIGs/ Groups/WGs/Teams that are attending. They are as follows: - Airship - API SIG - Barbican/Security SIG - Blazar - Chef OpenStack - Cinder - Cyborg - Designate - Documentation - Edge Computing Group - First Contact SIG - Glance - Heat - Horizon - Infrastructure - Interop WG - Ironic - Kata - Keystone - Kolla - LOCI - Manila - Masakari - Mistral - Monasca - Neutron - Nova - Octavia - OpenStack Ansible - OpenStack Charms - OpenStack Helm - OpenStackClient - Operator Meetup Puppet OpenStack - QA - Oslo - Public Cloud WG - Release Management - Requirements - Sahara - Scientific SIG - Self-Healing SIG - SIG- K8s - StarlingX - Swift - TC - TripleO - Upgrades SIG - Watcher - Zuul (pending confirmation) Thierry and I are working on placing them into a strawman schedule to reduce conflicts between related or overlapping groups. We should have more on what that will look like and a draft for you all to review in the next few weeks. We also wanted to remind you all of the Travel Support Program. We are again doing a two phase selection. The first deadline is approaching: July 1st. At this point we have less than a dozen applicants so if you need it or even think you need it, I urge you to apply here[1]. Also! Reminder that we have a finite number of rooms in the hotel block so please book early to make sure you get the discounted rate before they run out. You can book those rooms here[2] (pardon the ugly URL). Can't wait to see you all there! -Kendall Nelson (diablo_rojo) P.S. Gonna try to do a game night again since you all seemed to enjoy it so much last time :) [1] https://openstackfoundation.formstack.com/forms/travelsupportptg_denver_2018 [2] https://www.marriott.com/meeting-event-hotels/group-corporate-travel/groupCorp.mi?resLinkData=Project%20Teams%20Gathering%2C%20Openstack%5Edensa%60opnopna%7Copnopnb%60149.00%60USD%60false%604%609/5/18%609/18/18%608/20/18&app=resvlink&stop_mobi=yes -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.rydberg at citynetwork.eu Thu Jun 21 08:00:39 2018 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 21 Jun 2018 10:00:39 +0200 Subject: [Openstack-operators] [publiccloud-wg] Meeting this afternoon for Public Cloud WG Message-ID: <267af813-4983-c7a6-48e7-c30040615529@citynetwork.eu> Hi folks, Time for a new meeting for the Public Cloud WG. Agenda draft can be found at https://etherpad.openstack.org/p/publiccloud-wg, feel free to add items to that list. See you all at IRC 1400 UTC in #openstack-publiccloud Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From ignaziocassano at gmail.com Fri Jun 22 06:11:11 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 22 Jun 2018 08:11:11 +0200 Subject: [Openstack-operators] security groups loggin on ocata Message-ID: Dear All, I read neutron service_plugins supports log for security groups. I did not understand if the above feature is available only from queens release or if there is the opportunity to enable it also on ocata. Any case, could anyone suggest a way to log securoty groups ? Many Thanks and Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Fri Jun 22 20:41:40 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Fri, 22 Jun 2018 16:41:40 -0400 Subject: [Openstack-operators] Requesting input on PTG for operators Message-ID: The OpenStack Ops Meetups team would like to request your input on the upcoming PTG this September. Here's a very brief. completely anonymous poll (only 3 questions): https://www.surveymonkey.com/r/ZSLF9GB Please fill it out to help us and the foundation craft the first PTG to combine development teams and openstack operators! Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellstripes at gmail.com Mon Jun 25 18:22:27 2018 From: ellstripes at gmail.com (Ell Marquez) Date: Mon, 25 Jun 2018 13:22:27 -0500 Subject: [Openstack-operators] OpenStack Mentorship Program Relaunch Message-ID: Hello all, We are happy to announce the relaunch of the OpenStack Mentoring program, and we are kicking off with some changes to the program that we hope will better serve the community. Previously mentoring occurred through one on one partnering of mentor and mentee; this new program will focus on providing mentorship through goal-focused cohorts of mentors. This change will allow mentoring responsibilities to be shared among each group's mentors. The initial cohorts will be: - Get your first patch merged - First CFP submission / Give your first talk - Become COA certified / study for COA - Deploy your first Cloud If you are interested in joining as a mentor or mentee, please sign up at : Mentor Signup: https://openstackfoundation.formstack.com/forms/mentoring_co horts_mentors Mentee Signup: https://openstackfoundation.formstack.com/forms/mentoring_co horts_mentees freenode irc room: #openstack-mentoring Cheers, Ell Marquez and Jill Rouleau -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Mon Jun 25 19:10:46 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 25 Jun 2018 14:10:46 -0500 Subject: [Openstack-operators] Get Your Survey On! Message-ID: Hi everyone, Running, operating, supporting an OpenStack cloud? Participate in the User Survey to share more about your technology implementations and provide feedback for the community. The deadline to complete the survey and be part of the next report is *Friday, August 3 at 23:59 UTC.* - Login and complete the OpenStack User Survey here: http://www.openstack.org/user-survey - Help with the survey analysis by jointing the OpenStack User Survey Working Group: https://openstackfoundation.formstack.com/forms/user_ survey_working_group - And promote the User Survey!: https://twitter.com/Op enStack/status/993589356312088577 -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Jun 25 21:17:59 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 25 Jun 2018 16:17:59 -0500 Subject: [Openstack-operators] [nova] Need feedback on spec for handling down cells in the API In-Reply-To: References: Message-ID: <85b298dd-57d8-7ae1-8b35-121c78aacd3c@gmail.com> On 6/7/2018 9:02 AM, Matt Riedemann wrote: > We have a nova spec [1] which is at the point that it needs some API > user (and operator) feedback on what nova API should be doing when > listing servers and there are down cells (unable to reach the cell DB or > it times out). > > tl;dr: the spec proposes to return "shell" instances which have the > server uuid and created_at fields set, and maybe some other fields we > can set, but otherwise a bunch of fields in the server response would be > set to UNKNOWN sentinel values. This would be unversioned, and therefore > could wreak havoc on existing client side code that expects fields like > 'config_drive' and 'updated' to be of a certain format. > > There are alternatives listed in the spec so please read this over and > provide feedback since this is a pretty major UX change. > > Oh, and no pressure, but today is the spec freeze deadline for Rocky. > > [1] https://review.openstack.org/#/c/557369/ The options laid out right now are: 1. Without a new microversion, include 'shell' servers in the response when listing over down cells. These would have UNKNOWN values for the fields in the server object. gibi and I didn't like this because existing client code wouldn't know how to deal with these UNKNOWN shell instances - and not all of the server fields are simple strings, we have booleans, integers, dicts and lists, so what would those values be? 2. In a new microversion, return a new top-level parameter when listing servers which would include minimal details about servers that are in down cells (minimal like just the uuid). This was an alternative gibi and I had discussed because we didn't like the client-side impacts w/o a microversion or the full 'shell' servers in option 1. From an IRC conversation last week with mordred [1], dansmith and mordred don't care for the new top-level parameter since clients would have to merge that in to the full list of available servers. Plus, in the future, if we ever have some kind of caching mechanism in the API from which we can pull instance information if it's in a down cell, then the new top-level parameter becomes kind of pointless. 3. In a new microversion, include servers from down cells in the same top-level servers response parameter but for those in down cells, we'll just include minimal information (status=UNKNOWN and the uuid). Clients would opt-in to the new microversion when they know how to deal with what an instance in UNKNOWN status means. In the future, we could use a caching mechanism to fill in these details for instances in down cells. #3 is kind of a compromise on options 1 and 2, and I'm OK with it (barring any hairy details). In all cases, we won't include 'shell' servers in the response if the user is filtering (or paging?) because we can't be honest about the results and just have to treat the filters as if they don't apply to the instances in the down cell. If you have a server in a down cell, you can't delete it or really do anything with it because we literally can't pull the instance out of the cell database while the cell is down. You'd get a 500 or 503 in that case. Regardless of microversion, we plan on omitting instances from down cells when listing which is a backportable reliability bug fix [2] so we don't 500 the API when listing across 70 cells and 1 is down. [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-06-20.log.html#t2018-06-20T16:52:27 [2] https://review.openstack.org/#/c/575734/ -- Thanks, Matt From sean.mcginnis at gmx.com Tue Jun 26 16:42:10 2018 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 26 Jun 2018 11:42:10 -0500 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: References: Message-ID: <20180626164210.GA1445@sm-workstation> Reviving this thread with a fresh start. See below for the original. To recap, the ops community is willing to take over some of the operator documentation that is no longer available due to the loss of documentation team resources. From discussions, there needs to be some official governance over this operator owned repo (or repos) so it is recommended that a sig be formed. The repos can be created in the meantime, but consideration needs to be taken about naming as by default, the repo name is what is reflected in the documentation publishing location. SIG Formation ------------- There were a couple suggestions on naming and focus for this sig, but I would like to make a slightly different proposal. I would actually like to see a sig-operator group formed. We have repos for operator tools and other useful things and we have a mix of operators, vendors, and others that work together on things like the ops meetup. I think it would make sense to make this into an official SIG that could have a broader scope than just documentation. Docs Repos ---------- Doug made a good suggestion that we may want these things published under something like docs.openstack.org/operations-guide. So based on this, I think for now at least we should create an opestack/operations-guide repo that will end up being owned by this SIG. I would expect most documentation generated or owned by this group would just be located somewhere under that repo, but if the need arises we can add additional repos. There are other ops repos out there right now. I would expect the ownership of those to move under this sig as well, but that is a seperate and less pressing concern at this point. Bug Tracking ------------ There should be some way to track tasks and needs for this documentation and any other repos that are moved under this sig. Since it is the currently planned direction for all OpenStack projects (or at least there is a vocal desire for it to be) I think a Storyboard project should be created for this SIG's activities. Plan ---- So to recap above, I would propose the following actions be taken: 1. Create sig-operators as a group to manage operator efforts at least related to what needs to be done in repos. 2. Create an openstack/operations-guide repo to be the new home of the operations documentation. 3. Create a new StoryBoard project to help track work in these repos x. Document all this. 9. Profit! I'm willing to work through the steps to get these things set up. Please give feedback if this proposed plan makes sense or if there is anything different that would be preferred. Thanks, Sean On Wed, May 23, 2018 at 06:38:32PM -0700, Chris Morgan wrote: > Hello Everyone, > > In the Ops Community documentation working session today in Vancouver, we > made some really good progress (etherpad here: > https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of the > good stuff is yet written down). > > In short, we're going to course correct on maintaining the Operators Guide, > the HA Guide and Architecture Guide, not edit-in-place via the wiki and > instead try still maintaining them as code, but with a different, new set > of owners, possibly in a new Ops-focused repo. There was a strong consensus > that a) code workflow >> wiki workflow and that b) openstack core docs > tools are just fine. > > There is a lot still to be decided on how where and when, but we do have an > offer of a rewrite of the HA Guide, as long as the changes will be allowed > to actually land, so we expect to actually start showing some progress. > > At the end of the session, people wanted to know how to follow along as > various people work out how to do this... and so for now that place is this > very email thread. The idea is if the code for those documents goes to live > in a different repo, or if new contributors turn up, or if a new version we > will announce/discuss it here until such time as we have a better home for > this initiative. > > Cheers > > Chris > > -- > Chris Morgan > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From mihalis68 at gmail.com Tue Jun 26 16:47:57 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 26 Jun 2018 12:47:57 -0400 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180626164210.GA1445@sm-workstation> References: <20180626164210.GA1445@sm-workstation> Message-ID: This sounds great. As I understand it Sean can set up a skeleton for us to work on ops docs (and maybe other things later) with a minimum of initiation energy. Count me in. Chris On Tue, Jun 26, 2018 at 12:42 PM Sean McGinnis wrote: > Reviving this thread with a fresh start. See below for the original. > > To recap, the ops community is willing to take over some of the operator > documentation that is no longer available due to the loss of documentation > team > resources. From discussions, there needs to be some official governance > over > this operator owned repo (or repos) so it is recommended that a sig be > formed. > The repos can be created in the meantime, but consideration needs to be > taken > about naming as by default, the repo name is what is reflected in the > documentation publishing location. > > SIG Formation > ------------- > There were a couple suggestions on naming and focus for this sig, but I > would > like to make a slightly different proposal. I would actually like to see a > sig-operator group formed. We have repos for operator tools and other > useful > things and we have a mix of operators, vendors, and others that work > together > on things like the ops meetup. I think it would make sense to make this > into an > official SIG that could have a broader scope than just documentation. > > Docs Repos > ---------- > Doug made a good suggestion that we may want these things published under > something like docs.openstack.org/operations-guide. So based on this, I > think > for now at least we should create an opestack/operations-guide repo that > will > end up being owned by this SIG. I would expect most documentation > generated or > owned by this group would just be located somewhere under that repo, but > if the > need arises we can add additional repos. > > There are other ops repos out there right now. I would expect the > ownership of > those to move under this sig as well, but that is a seperate and less > pressing > concern at this point. > > Bug Tracking > ------------ > There should be some way to track tasks and needs for this documentation > and > any other repos that are moved under this sig. Since it is the currently > planned direction for all OpenStack projects (or at least there is a vocal > desire for it to be) I think a Storyboard project should be created for > this > SIG's activities. > > Plan > ---- > So to recap above, I would propose the following actions be taken: > > 1. Create sig-operators as a group to manage operator efforts at least > related > to what needs to be done in repos. > 2. Create an openstack/operations-guide repo to be the new home of the > operations documentation. > 3. Create a new StoryBoard project to help track work in these repos > x. Document all this. > 9. Profit! > > I'm willing to work through the steps to get these things set up. Please > give > feedback if this proposed plan makes sense or if there is anything > different > that would be preferred. > > Thanks, > Sean > > On Wed, May 23, 2018 at 06:38:32PM -0700, Chris Morgan wrote: > > Hello Everyone, > > > > In the Ops Community documentation working session today in Vancouver, we > > made some really good progress (etherpad here: > > https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of > the > > good stuff is yet written down). > > > > In short, we're going to course correct on maintaining the Operators > Guide, > > the HA Guide and Architecture Guide, not edit-in-place via the wiki and > > instead try still maintaining them as code, but with a different, new set > > of owners, possibly in a new Ops-focused repo. There was a strong > consensus > > that a) code workflow >> wiki workflow and that b) openstack core docs > > tools are just fine. > > > > There is a lot still to be decided on how where and when, but we do have > an > > offer of a rewrite of the HA Guide, as long as the changes will be > allowed > > to actually land, so we expect to actually start showing some progress. > > > > At the end of the session, people wanted to know how to follow along as > > various people work out how to do this... and so for now that place is > this > > very email thread. The idea is if the code for those documents goes to > live > > in a different repo, or if new contributors turn up, or if a new version > we > > will announce/discuss it here until such time as we have a better home > for > > this initiative. > > > > Cheers > > > > Chris > > > > -- > > Chris Morgan > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Tue Jun 26 17:36:52 2018 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 26 Jun 2018 12:36:52 -0500 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180626164210.GA1445@sm-workstation> References: <20180626164210.GA1445@sm-workstation> Message-ID: <20180626173651.GA3929@sm-workstation> > > Plan > ---- > So to recap above, I would propose the following actions be taken: > > 1. Create sig-operators as a group to manage operator efforts at least related > to what needs to be done in repos. > 2. Create an openstack/operations-guide repo to be the new home of the > operations documentation. One correction to this - that repo already exists. It has been retired, so I think the action here would just be to "un-retire" the repo and get things updated to start publishing again. > 3. Create a new StoryBoard project to help track work in these repos > x. Document all this. > 9. Profit! > > I'm willing to work through the steps to get these things set up. Please give > feedback if this proposed plan makes sense or if there is anything different > that would be preferred. From iain.macdonnell at oracle.com Tue Jun 26 19:59:05 2018 From: iain.macdonnell at oracle.com (iain MacDonnell) Date: Tue, 26 Jun 2018 12:59:05 -0700 Subject: [Openstack-operators] neutron-server memcached connections Message-ID: <9598665a-8748-9fa8-147d-e618db3f7b94@oracle.com> In diagnosing a situation where a Pike deployment was intermittently slower (in general), I discovered that it was (sometimes) exceeding memcached's maximum connection limit, which is set to 4096. Looking closer, ~2750 of the connections are from 8 neutron-server process. neutron-server is configured with 8 API workers, and those 8 processes have a combined total of ~2750 connections to memcached: # lsof -i TCP:11211 | awk '/^neutron-s/ {print $2}' | sort | uniq -c 245 2611 306 2612 228 2613 406 2614 407 2615 385 2616 369 2617 398 2618 # There doesn't seem to be much turnover - comparing samples of the connections (incl. source port) 15 mins apart, two were dropped, and one new one added. In neutron.conf, keystone_authtoken.memcached_servers is configured, but nothing else pertaining to caching, so keystone_authtoken.memcache_pool_maxsize should default to 10. Am I misunderstanding something, or shouldn't I see a maximum of 10 connections from each of the neutron-server API workers, with this configuration? Any known issues, or pointers to what I'm missing? TIA, ~iain From pkovar at redhat.com Tue Jun 26 21:10:29 2018 From: pkovar at redhat.com (Petr Kovar) Date: Tue, 26 Jun 2018 23:10:29 +0200 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180626173651.GA3929@sm-workstation> References: <20180626164210.GA1445@sm-workstation> <20180626173651.GA3929@sm-workstation> Message-ID: <20180626231029.ee392fdeb8e964e85df870e3@redhat.com> On Tue, 26 Jun 2018 12:36:52 -0500 Sean McGinnis wrote: > > > > Plan > > ---- > > So to recap above, I would propose the following actions be taken: > > > > 1. Create sig-operators as a group to manage operator efforts at least related > > to what needs to be done in repos. > > 2. Create an openstack/operations-guide repo to be the new home of the > > operations documentation. > > One correction to this - that repo already exists. It has been retired, so I > think the action here would just be to "un-retire" the repo and get things > updated to start publishing again. That's great, looks like most of the skeleton from https://github.com/openstack/operations-guide/tree/c628640944c9de139b4bc9dee80885060d4b6f83 can just be reused. For step 2, let's start with moving https://github.com/openstack/openstack-manuals/tree/a1f1748478125ccd68d90a98ccc06c7ec359d3a0/doc/ops-guide from openstack-manuals, then. Other guides like ha or architecture can live in their own repos, if the new SIG wants to own them, or can be merged into the operations guide later on. > > 3. Create a new StoryBoard project to help track work in these repos > > x. Document all this. > > 9. Profit! > > > > I'm willing to work through the steps to get these things set up. Please give > > feedback if this proposed plan makes sense or if there is anything different > > that would be preferred. Thanks for the recap and for your help, Sean. If you need help from the docs team side, please let me know / CC me on your patches. Cheers, pk From amy at demarco.com Wed Jun 27 00:40:33 2018 From: amy at demarco.com (Amy Marrich) Date: Tue, 26 Jun 2018 19:40:33 -0500 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180626164210.GA1445@sm-workstation> References: <20180626164210.GA1445@sm-workstation> Message-ID: Sean put together some really great things here and I do think the SiG might be the way to go as far as ownership for the repos and the plan looks pretty complete. I've offered to do the Git and Gerrit Lunch and Learn at the OPS mmetup if needed to help get folks set up and going. Amy (spotz) On Tue, Jun 26, 2018 at 11:42 AM, Sean McGinnis wrote: > Reviving this thread with a fresh start. See below for the original. > > To recap, the ops community is willing to take over some of the operator > documentation that is no longer available due to the loss of documentation > team > resources. From discussions, there needs to be some official governance > over > this operator owned repo (or repos) so it is recommended that a sig be > formed. > The repos can be created in the meantime, but consideration needs to be > taken > about naming as by default, the repo name is what is reflected in the > documentation publishing location. > > SIG Formation > ------------- > There were a couple suggestions on naming and focus for this sig, but I > would > like to make a slightly different proposal. I would actually like to see a > sig-operator group formed. We have repos for operator tools and other > useful > things and we have a mix of operators, vendors, and others that work > together > on things like the ops meetup. I think it would make sense to make this > into an > official SIG that could have a broader scope than just documentation. > > Docs Repos > ---------- > Doug made a good suggestion that we may want these things published under > something like docs.openstack.org/operations-guide. So based on this, I > think > for now at least we should create an opestack/operations-guide repo that > will > end up being owned by this SIG. I would expect most documentation > generated or > owned by this group would just be located somewhere under that repo, but > if the > need arises we can add additional repos. > > There are other ops repos out there right now. I would expect the > ownership of > those to move under this sig as well, but that is a seperate and less > pressing > concern at this point. > > Bug Tracking > ------------ > There should be some way to track tasks and needs for this documentation > and > any other repos that are moved under this sig. Since it is the currently > planned direction for all OpenStack projects (or at least there is a vocal > desire for it to be) I think a Storyboard project should be created for > this > SIG's activities. > > Plan > ---- > So to recap above, I would propose the following actions be taken: > > 1. Create sig-operators as a group to manage operator efforts at least > related > to what needs to be done in repos. > 2. Create an openstack/operations-guide repo to be the new home of the > operations documentation. > 3. Create a new StoryBoard project to help track work in these repos > x. Document all this. > 9. Profit! > > I'm willing to work through the steps to get these things set up. Please > give > feedback if this proposed plan makes sense or if there is anything > different > that would be preferred. > > Thanks, > Sean > > On Wed, May 23, 2018 at 06:38:32PM -0700, Chris Morgan wrote: > > Hello Everyone, > > > > In the Ops Community documentation working session today in Vancouver, we > > made some really good progress (etherpad here: > > https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of > the > > good stuff is yet written down). > > > > In short, we're going to course correct on maintaining the Operators > Guide, > > the HA Guide and Architecture Guide, not edit-in-place via the wiki and > > instead try still maintaining them as code, but with a different, new set > > of owners, possibly in a new Ops-focused repo. There was a strong > consensus > > that a) code workflow >> wiki workflow and that b) openstack core docs > > tools are just fine. > > > > There is a lot still to be decided on how where and when, but we do have > an > > offer of a rewrite of the HA Guide, as long as the changes will be > allowed > > to actually land, so we expect to actually start showing some progress. > > > > At the end of the session, people wanted to know how to follow along as > > various people work out how to do this... and so for now that place is > this > > very email thread. The idea is if the code for those documents goes to > live > > in a different repo, or if new contributors turn up, or if a new version > we > > will announce/discuss it here until such time as we have a better home > for > > this initiative. > > > > Cheers > > > > Chris > > > > -- > > Chris Morgan > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Jun 27 01:19:17 2018 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 27 Jun 2018 10:19:17 +0900 Subject: [Openstack-operators] [openstack-dev] [qa][tempest-plugins][release][tc][ptl]: Coordinated Release Model proposal for Tempest & Tempest Plugins In-Reply-To: <1643ed12ccd.c62ee6db17998.6818813333945980470@ghanshyammann.com> References: <1643b637954.12a76afca1193.5658117153151589198@ghanshyammann.com> <1643b8715e6.fca543903252.1902631162047144959@ghanshyammann.com> <0dbace6e-e3be-1c43-44bc-06f2be7bcdb0@openstack.org> <1530017472-sup-6339@lrrr.local> <20180626135209.GA15436@zeong> <1530021936-sup-5714@lrrr.local> <1643ed12ccd.c62ee6db17998.6818813333945980470@ghanshyammann.com> Message-ID: <1643ed2c5d2.f4af11ae18001.3627876780003826684@ghanshyammann.com> ++ operator ML ---- On Wed, 27 Jun 2018 10:17:33 +0900 Ghanshyam Mann wrote ---- > > > > ---- On Tue, 26 Jun 2018 23:12:30 +0900 Doug Hellmann wrote ---- > > Excerpts from Matthew Treinish's message of 2018-06-26 09:52:09 -0400: > > > On Tue, Jun 26, 2018 at 08:53:21AM -0400, Doug Hellmann wrote: > > > > Excerpts from Andrea Frittoli's message of 2018-06-26 13:35:11 +0100: > > > > > On Tue, 26 Jun 2018, 1:08 pm Thierry Carrez, wrote: > > > > > > > > > > > Dmitry Tantsur wrote: > > > > > > > [...] > > > > > > > My suggestion: tempest has to be compatible with all supported releases > > > > > > > (of both services and plugins) OR be branched. > > > > > > > [...] > > > > > > I tend to agree with Dmitry... We have a model for things that need > > > > > > release alignment, and that's the cycle-bound series. The reason tempest > > > > > > is branchless was because there was no compatibility issue. If the split > > > > > > of tempest plugins introduces a potential incompatibility, then I would > > > > > > prefer aligning tempest to the existing model rather than introduce a > > > > > > parallel tempest-specific cycle just so that tempest can stay > > > > > > release-independent... > > > > > > > > > > > > I seem to remember there were drawbacks in branching tempest, though... > > > > > > Can someone with functioning memory brain cells summarize them again ? > > > > > > > > > > > > > > > > > > > > > Branchless Tempest enforces api stability across branches. > > > > > > > > I'm sorry, but I'm having a hard time taking this statement seriously > > > > when the current source of tension is that the Tempest API itself > > > > is breaking for its plugins. > > > > > > > > Maybe rather than talking about how to release compatible things > > > > together, we should go back and talk about why Tempest's API is changing > > > > in a way that can't be made backwards-compatible. Can you give some more > > > > detail about that? > > > > > > > > > > Well it's not, if it did that would violate all the stability guarantees > > > provided by Tempest's library and plugin interface. I've not ever heard of > > > these kind of backwards incompatibilities in those interfaces and we go to > > > all effort to make sure we don't break them. Where did the idea that > > > backwards incompatible changes where being introduced come from? > > > > In his original post, gmann said, "There might be some changes in > > Tempest which might not work with older version of Tempest Plugins." > > I was surprised to hear that, but I'm not sure how else to interpret > > that statement. > > I did not mean to say that Tempest will introduce the changes in backward incompatible way which can break plugins. That cannot happen as all plugins and tempest are branchless and they are being tested with master Tempest so if we change anything backward incompatible then it break the plugins gate. Even we have to remove any deprecated interfaces from Tempest, we fix all plugins first like - https://review.openstack.org/#/q/topic:remove-support-of-cinder-v1-api+(status:open+OR+status:merged) > > What I mean to say here is that adding new or removing deprecated interface in Tempest might not work with all released version or unreleased Plugins. That point is from point of view of using Tempest and Plugins in production cloud testing not gate(where we keep the compatibility). Production Cloud user use Tempest cycle based version. Pike based Cloud will be tested by Tempest 17.0.0 not latest version (though latest version might work). > > This thread is not just for gate testing point of view (which seems to be always interpreted), this is more for user using Tempest and Plugins for their cloud testing. I am looping operator mail list also which i forgot in initial post. > > We do not have any tag/release from plugins to know what version of plugin can work with what version of tempest. For Example If There is new interface introduced by Tempest 19.0.0 and pluginX start using it. Now it can create issues for pluginX in both release model 1. plugins with no release (I will call this PluginNR), 2. plugins with independent release (I will call it PluginIR). > > Users (Not Gate) will face below issues: > - User cannot use PluginNR with Tempest <19.0.0 (where that new interface was not present). And there is no PluginNR release/tag as this is unreleased and not branched software. > - User cannot find a PluginIR particular tag/release which can work with tempest <19.0.0 (where that new interface was not present). Only way for user to make it work is to manually find out the PluginIR tag/commit before PluginIR started consuming the new interface. > > Let me make it more clear via diagram: > PluginNR PluginIR > > Tempest 19.0.0 > Add NewInterface Use NewInterface Use NewInterface > > > Tempest 18.0.0 > NewInterface not present No version of PluginNR Unknown version of PluginIR > > > GATE (No Issue as latest things always being tested live ): OK OK > > User issues: X (does not work) Hard to find compatible version > > > We need a particular tag from Plugins for OpenStack release, EOL of OpenStack release like Tempest does so that user can test their old release Cloud in easy way. > > -gmann > > > > > > As for this whole thread I don't understand any of the points being brought up > > > in the original post or any of the follow ons, things seem to have been confused > > > from the start. The ask from users at the summit was simple. When a new OpenStack > > > release is pushed we push a tempest release to mark that (the next one will be > > > 19.0.0 to mark Rocky). Users were complaining that many plugins don't have a > > > corresponding version to mark support for a new release. So when trying to run > > > against a rocky cloud you get tempest 19.0.0 and then a bunch of plugins for > > > various services at different sha1s which have to be manually looked up based > > > on dates. All users wanted at the summit was a tag for plugins like tempest > > > does with the first number in: > > > > > > https://docs.openstack.org/tempest/latest/overview.html#release-versioning > > > > > > which didn't seem like a bad idea to me. I'm not sure the best mechanism to > > > accomplish this, because I agree with much of what plugin maintainers were > > > saying on the thread about wanting to control their own releases. But the > > > desire to make sure users have a tag they can pull for the addition or > > > removal of a supported release makes sense as something a plugin should do. > > > > We don't coordinate versions across projects anywhere else, for a > > bunch of reasons including the complexity of coordinating the details > > and the confusion it causes when the first version of something is > > 19.0.0. Instead, we list the compatible versions of everything > > together on a series-specific page on releases.o.o. That seems to > > be enough to help anyone wanting to know which versions of tools > > work together. The data is also available in YAML files, so it's easy > > enough to consume by automation. > > > > Would that work for tempest and it's plugins, too? > > > > Is the problem that the versions are not the same, or that some of the > > plugins are not being tagged at all? > > > > Doug > > > > __________________________________________________________________________ > > OpenStack Development Mailing List (not for usage questions) > > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > From gmann at ghanshyammann.com Wed Jun 27 01:31:42 2018 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 27 Jun 2018 10:31:42 +0900 Subject: [Openstack-operators] [openstack-dev] [qa][tempest-plugins][release][tc][ptl]: Coordinated Release Model proposal for Tempest & Tempest Plugins In-Reply-To: <1643ed2c5d2.f4af11ae18001.3627876780003826684@ghanshyammann.com> References: <1643b637954.12a76afca1193.5658117153151589198@ghanshyammann.com> <1643b8715e6.fca543903252.1902631162047144959@ghanshyammann.com> <0dbace6e-e3be-1c43-44bc-06f2be7bcdb0@openstack.org> <1530017472-sup-6339@lrrr.local> <20180626135209.GA15436@zeong> <1530021936-sup-5714@lrrr.local> <1643ed12ccd.c62ee6db17998.6818813333945980470@ghanshyammann.com> <1643ed2c5d2.f4af11ae18001.3627876780003826684@ghanshyammann.com> Message-ID: <1643ede22bb.c88c0cdb18029.7871042374175052950@ghanshyammann.com> ---- On Wed, 27 Jun 2018 10:19:17 +0900 Ghanshyam Mann wrote ---- > ++ operator ML > > ---- On Wed, 27 Jun 2018 10:17:33 +0900 Ghanshyam Mann wrote ---- > > > > > > > > ---- On Tue, 26 Jun 2018 23:12:30 +0900 Doug Hellmann wrote ---- > > > Excerpts from Matthew Treinish's message of 2018-06-26 09:52:09 -0400: > > > > On Tue, Jun 26, 2018 at 08:53:21AM -0400, Doug Hellmann wrote: > > > > > Excerpts from Andrea Frittoli's message of 2018-06-26 13:35:11 +0100: > > > > > > On Tue, 26 Jun 2018, 1:08 pm Thierry Carrez, wrote: > > > > > > > > > > > > > Dmitry Tantsur wrote: > > > > > > > > [...] > > > > > > > > My suggestion: tempest has to be compatible with all supported releases > > > > > > > > (of both services and plugins) OR be branched. > > > > > > > > [...] > > > > > > > I tend to agree with Dmitry... We have a model for things that need > > > > > > > release alignment, and that's the cycle-bound series. The reason tempest > > > > > > > is branchless was because there was no compatibility issue. If the split > > > > > > > of tempest plugins introduces a potential incompatibility, then I would > > > > > > > prefer aligning tempest to the existing model rather than introduce a > > > > > > > parallel tempest-specific cycle just so that tempest can stay > > > > > > > release-independent... > > > > > > > > > > > > > > I seem to remember there were drawbacks in branching tempest, though... > > > > > > > Can someone with functioning memory brain cells summarize them again ? > > > > > > > > > > > > > > > > > > > > > > > > > Branchless Tempest enforces api stability across branches. > > > > > > > > > > I'm sorry, but I'm having a hard time taking this statement seriously > > > > > when the current source of tension is that the Tempest API itself > > > > > is breaking for its plugins. > > > > > > > > > > Maybe rather than talking about how to release compatible things > > > > > together, we should go back and talk about why Tempest's API is changing > > > > > in a way that can't be made backwards-compatible. Can you give some more > > > > > detail about that? > > > > > > > > > > > > > Well it's not, if it did that would violate all the stability guarantees > > > > provided by Tempest's library and plugin interface. I've not ever heard of > > > > these kind of backwards incompatibilities in those interfaces and we go to > > > > all effort to make sure we don't break them. Where did the idea that > > > > backwards incompatible changes where being introduced come from? > > > > > > In his original post, gmann said, "There might be some changes in > > > Tempest which might not work with older version of Tempest Plugins." > > > I was surprised to hear that, but I'm not sure how else to interpret > > > that statement. > > > > I did not mean to say that Tempest will introduce the changes in backward incompatible way which can break plugins. That cannot happen as all plugins and tempest are branchless and they are being tested with master Tempest so if we change anything backward incompatible then it break the plugins gate. Even we have to remove any deprecated interfaces from Tempest, we fix all plugins first like - https://review.openstack.org/#/q/topic:remove-support-of-cinder-v1-api+(status:open+OR+status:merged) > > > > What I mean to say here is that adding new or removing deprecated interface in Tempest might not work with all released version or unreleased Plugins. That point is from point of view of using Tempest and Plugins in production cloud testing not gate(where we keep the compatibility). Production Cloud user use Tempest cycle based version. Pike based Cloud will be tested by Tempest 17.0.0 not latest version (though latest version might work). > > > > This thread is not just for gate testing point of view (which seems to be always interpreted), this is more for user using Tempest and Plugins for their cloud testing. I am looping operator mail list also which i forgot in initial post. > > > > We do not have any tag/release from plugins to know what version of plugin can work with what version of tempest. For Example If There is new interface introduced by Tempest 19.0.0 and pluginX start using it. Now it can create issues for pluginX in both release model 1. plugins with no release (I will call this PluginNR), 2. plugins with independent release (I will call it PluginIR). > > > > Users (Not Gate) will face below issues: > > - User cannot use PluginNR with Tempest <19.0.0 (where that new interface was not present). And there is no PluginNR release/tag as this is unreleased and not branched software. > > - User cannot find a PluginIR particular tag/release which can work with tempest <19.0.0 (where that new interface was not present). Only way for user to make it work is to manually find out the PluginIR tag/commit before PluginIR started consuming the new interface. > > > > Let me make it more clear via diagram: > > PluginNR PluginIR > > > > Tempest 19.0.0 > > Add NewInterface Use NewInterface Use NewInterface > > > > > > Tempest 18.0.0 > > NewInterface not present No version of PluginNR Unknown version of PluginIR > > > > > > GATE (No Issue as latest things always being tested live ): OK OK > > > > User issues: X (does not work) Hard to find compatible version > > > > Adding it here as formatting issue to read it- http://paste.openstack.org/show/724347/ > > We need a particular tag from Plugins for OpenStack release, EOL of OpenStack release like Tempest does so that user can test their old release Cloud in easy way. > > > > -gmann > > > > > > > > > As for this whole thread I don't understand any of the points being brought up > > > > in the original post or any of the follow ons, things seem to have been confused > > > > from the start. The ask from users at the summit was simple. When a new OpenStack > > > > release is pushed we push a tempest release to mark that (the next one will be > > > > 19.0.0 to mark Rocky). Users were complaining that many plugins don't have a > > > > corresponding version to mark support for a new release. So when trying to run > > > > against a rocky cloud you get tempest 19.0.0 and then a bunch of plugins for > > > > various services at different sha1s which have to be manually looked up based > > > > on dates. All users wanted at the summit was a tag for plugins like tempest > > > > does with the first number in: > > > > > > > > https://docs.openstack.org/tempest/latest/overview.html#release-versioning > > > > > > > > which didn't seem like a bad idea to me. I'm not sure the best mechanism to > > > > accomplish this, because I agree with much of what plugin maintainers were > > > > saying on the thread about wanting to control their own releases. But the > > > > desire to make sure users have a tag they can pull for the addition or > > > > removal of a supported release makes sense as something a plugin should do. > > > > > > We don't coordinate versions across projects anywhere else, for a > > > bunch of reasons including the complexity of coordinating the details > > > and the confusion it causes when the first version of something is > > > 19.0.0. Instead, we list the compatible versions of everything > > > together on a series-specific page on releases.o.o. That seems to > > > be enough to help anyone wanting to know which versions of tools > > > work together. The data is also available in YAML files, so it's easy > > > enough to consume by automation. > > > > > > Would that work for tempest and it's plugins, too? > > > > > > Is the problem that the versions are not the same, or that some of the > > > plugins are not being tagged at all? > > > > > > Doug > > > > > > __________________________________________________________________________ > > > OpenStack Development Mailing List (not for usage questions) > > > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > > > > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From thierry at openstack.org Thu Jun 28 08:47:31 2018 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 28 Jun 2018 10:47:31 +0200 Subject: [Openstack-operators] [all] [ptg] PTG high-level schedule Message-ID: Hi everyone, In the attached picture you will find the proposed schedule for the various tracks at the Denver PTG in September. We did our best to avoid the key conflicts that the track leads (PTLs, SIG leads...) mentioned in their PTG survey responses, although there was no perfect solution that would avoid all conflicts. If there is a critical conflict that was missed, please let us know, but otherwise we are not planning to change this proposal. You'll notice that: - The Ops meetup team is still evaluating what days would be best for the Ops meetup that will be co-located with the PTG. We'll communicate about it as soon as we have the information. - Keystone track is split in two: one day on Monday for cross-project discussions around identity management, and two days on Thursday/Friday for team discussions. - The "Ask me anything" project helproom on Monday/Tuesday is for horizontal support teams (infrastructure, release management, stable maint, requirements...) to provide support for other teams, SIGs and workgroups and answer their questions. Goal champions should also be available there to help with Stein goal completion questions. - Like in Dublin, a number of tracks do not get pre-allocated time, and will be scheduled on the spot in available rooms at the time that makes the most sense for the participants. - Every track will be able to book extra time and space in available extra rooms at the event. To find more information about the event, register or book a room at the event hotel, visit: https://www.openstack.org/ptg Note that the first round of applications for travel support to the event is closing at the end of this week ! Apply if you need financial help attending the event: https://openstackfoundation.formstack.com/forms/travelsupportptg_denver_2018 See you there ! -- Thierry Carrez (ttx) -------------- next part -------------- A non-text attachment was scrubbed... Name: ptg4.png Type: image/png Size: 80930 bytes Desc: not available URL: From mnaser at vexxhost.com Thu Jun 28 16:56:22 2018 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 28 Jun 2018 12:56:22 -0400 Subject: [Openstack-operators] [openstack-ansible] dropping selinux support Message-ID: Hi everyone: This email is to ask if there is anyone out there opposed to removing SELinux bits from OpenStack ansible, it's blocking some of the gates and the maintainers for them are no longer working on the project unfortunately. I'd like to propose removing any SELinux stuff from OSA based on the following: 1) We don't gate on it, we don't test it, we don't support it. If you're running OSA with SELinux enforcing, please let us know how :-) 2) It extends beyond the scope of the deployment project and there are no active maintainers with the resources to deal with them 3) With the work currently in place to let OpenStack Ansible install distro packages, we can rely on upstream `openstack-selinux` package to deliver deployments that run with SELinux on. Is there anyone opposed to removing it? If so, please let us know. :-) Thanks! Mohammed -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From mnaser at vexxhost.com Thu Jun 28 17:00:03 2018 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 28 Jun 2018 13:00:03 -0400 Subject: [Openstack-operators] [openstack-ansible] dropping selinux support In-Reply-To: References: Message-ID: Also, this is the change that drops it, so feel free to vote with your opinion there too: https://review.openstack.org/578887 Drop SELinux support from os_swift On Thu, Jun 28, 2018 at 12:56 PM, Mohammed Naser wrote: > Hi everyone: > > This email is to ask if there is anyone out there opposed to removing > SELinux bits from OpenStack ansible, it's blocking some of the gates > and the maintainers for them are no longer working on the project > unfortunately. > > I'd like to propose removing any SELinux stuff from OSA based on the following: > > 1) We don't gate on it, we don't test it, we don't support it. If > you're running OSA with SELinux enforcing, please let us know how :-) > 2) It extends beyond the scope of the deployment project and there are > no active maintainers with the resources to deal with them > 3) With the work currently in place to let OpenStack Ansible install > distro packages, we can rely on upstream `openstack-selinux` package > to deliver deployments that run with SELinux on. > > Is there anyone opposed to removing it? If so, please let us know. :-) > > Thanks! > Mohammed > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From mnaser at vexxhost.com Thu Jun 28 21:08:19 2018 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 28 Jun 2018 17:08:19 -0400 Subject: [Openstack-operators] [openstack-dev] [openstack-ansible] dropping selinux support In-Reply-To: <20180628210334.GA17798@localhost.localdomain> References: <20180628210334.GA17798@localhost.localdomain> Message-ID: Hi Paul: On Thu, Jun 28, 2018 at 5:03 PM, Paul Belanger wrote: > On Thu, Jun 28, 2018 at 12:56:22PM -0400, Mohammed Naser wrote: >> Hi everyone: >> >> This email is to ask if there is anyone out there opposed to removing >> SELinux bits from OpenStack ansible, it's blocking some of the gates >> and the maintainers for them are no longer working on the project >> unfortunately. >> >> I'd like to propose removing any SELinux stuff from OSA based on the following: >> >> 1) We don't gate on it, we don't test it, we don't support it. If >> you're running OSA with SELinux enforcing, please let us know how :-) >> 2) It extends beyond the scope of the deployment project and there are >> no active maintainers with the resources to deal with them >> 3) With the work currently in place to let OpenStack Ansible install >> distro packages, we can rely on upstream `openstack-selinux` package >> to deliver deployments that run with SELinux on. >> >> Is there anyone opposed to removing it? If so, please let us know. :-) >> > While I don't use OSA, I would be surprised to learn that selinux wouldn't be > supported. I also understand it requires time and care to maintain. Have you > tried reaching out to people in #RDO, IIRC all those packages should support > selinux. Indeed, the support from RDO for SELinux works very well. In this case however, OpenStack ansible deploys from source and therefore places binaries in different places than the default expected locations for the upstream `openstack-selinux`. As we work towards adding 'distro' support (which to clarify, it means install from RPMs or DEBs rather than from source), we'll be able to pull in that package and automagically get SELinux support that's supported by an upstream that tracks it. > As for gating, maybe default to selinux passive for it to report errors, but not > fail. And if anybody is interested in support it, they can do so and enable > enforcing again when everything is fixed. That's reasonable. However, right now we have bugs around the distribution of SELinux modules and how they are compiled inside the the containers, which means that we're not having problems with the rules as much as uploading the rules and getting them compiled inside the server. I hope I cleared up a bit more of our side of things, I'm actually looking forward for us being able to support upstream distro packages. > - Paul > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From sean.mcginnis at gmx.com Thu Jun 28 22:19:55 2018 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 28 Jun 2018 17:19:55 -0500 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180626173651.GA3929@sm-workstation> References: <20180626164210.GA1445@sm-workstation> <20180626173651.GA3929@sm-workstation> Message-ID: <20180628221955.GA6855@sm-workstation> > > Plan > > ---- > > So to recap above, I would propose the following actions be taken: > > > > 1. Create sig-operators as a group to manage operator efforts at least related > > to what needs to be done in repos. > > 2. Create an openstack/operations-guide repo to be the new home of the > > operations documentation. > > One correction to this - that repo already exists. It has been retired, so I > think the action here would just be to "un-retire" the repo and get things > updated to start publishing again. > > > 3. Create a new StoryBoard project to help track work in these repos Update on progress for this: Step 1 ------ For step 1, the reason for creating a SIG is there is a policy that anything publishing to docs.openstack.org is owned by some sort of official team. I had proposed an Operator SIG (https://review.openstack.org/578408) but Thierry rightly points out that the scope of the way I proposed it is perhaps a little too broad. I wanted to leave room for ownership of other non-documentation efforts, but perhaps that is not the best plan. I can change that, but I think the better approach right now is just to see if the UC is willing to be the owners of this repo since they are already the owners for a few others: http://git.openstack.org/cgit/openstack/governance/tree/reference/user-committee-repos.yaml#n14 I will propose that soon. Step 2 ------ I have gone through the infra steps to unretire the openstack/operations-guide repo and set up docs build jobs to run on proposed patches and a publishing job to publish merged patches to docs.openstack.org. That should give us https://docs.openstack.org/operations-guide/ once we merge an update. I've also restored the content by pulling the latest out of the openstack-manuals repo just prior to when it was removed. Sorry, but I was not able to preserve the git history for all of it, but I do not the source of the content in the commit message: https://review.openstack.org/578946 We've updated the "core" group that has rights to merge patches for that repo and Melvin has sent out an email to see if any of the existing members still want to be involved. Hopefully we can regrow that list over time. Step 3 ------ I do still need to look into the StoryBoard project creation. This is lower priority than the other tasks, but I will try to get to this step soon. Thanks, Sean From zioproto at gmail.com Fri Jun 29 12:55:36 2018 From: zioproto at gmail.com (Saverio Proto) Date: Fri, 29 Jun 2018 14:55:36 +0200 Subject: [Openstack-operators] Neutron not adding iptables rules for metadata agent In-Reply-To: References: <5f716e6218597640ec4c5aaa9a80518639088680.camel@emag.ro> Message-ID: Hello, I would suggest to open a bug on launchpad to track this issue. thank you Saverio 2018-06-18 12:19 GMT+02:00 Radu Popescu | eMAG, Technology : > Hi, > > We're using Openstack Ocata, deployed using Openstack Ansible v15.1.7. > Neutron server is v10.0.3. > I can see enable_isolated_metadata and enable_metadata_network only used for > isolated networks that don't have a router which is not our case. > Also, I checked all namespaces on all our novas and only affected 6 out of > 66 ..and only 1 namespace / nova. Seems like isolated case that doesn't > happen very often. > > Can it be RabbitMQ? I'm not sure where to check. > > Thanks, > Radu > > On Fri, 2018-06-15 at 17:11 +0200, Saverio Proto wrote: > > Hello Radu, > > > yours look more or less like a bug report. This you check existing > > open bugs for neutron ? Also what version of openstack are you running > > ? > > > how did you configure enable_isolated_metadata and > > enable_metadata_network options ? > > > Saverio > > > 2018-06-13 12:45 GMT+02:00 Radu Popescu | eMAG, Technology > > : > > Hi all, > > > So, I'm having the following issue. I'm creating a VM with floating IP. > > Everything is fine, namespace is there, postrouting and prerouting from the > > internal IP to the floating IP are there. The only rules missing are the > > rules to access metadata service: > > > -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp > > --dport 80 -j REDIRECT --to-ports 9697 > > -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp > > --dport 80 -j MARK --set-xmark 0x1/0xffff > > > (this is taken from another working namespace with iptables-save) > > > Forgot to mention, VM is booting ok, I have both the default route and the > > one for the metadata service (cloud-init is running at boot time): > > [ 57.150766] cloud-init[892]: ci-info: > > +--------+------+--------------+---------------+-------+-------------------+ > > [ 57.150997] cloud-init[892]: ci-info: | Device | Up | Address | > > Mask | Scope | Hw-Address | > > [ 57.151219] cloud-init[892]: ci-info: > > +--------+------+--------------+---------------+-------+-------------------+ > > [ 57.151431] cloud-init[892]: ci-info: | lo: | True | 127.0.0.1 | > > 255.0.0.0 | . | . | > > [ 57.151627] cloud-init[892]: ci-info: | eth0: | True | 10.240.9.186 | > > 255.255.252.0 | . | fa:16:3e:43:d1:c2 | > > [ 57.151815] cloud-init[892]: ci-info: > > +--------+------+--------------+---------------+-------+-------------------+ > > [ 57.152018] cloud-init[892]: ci-info: > > +++++++++++++++++++++++++++++++Route IPv4 > > info++++++++++++++++++++++++++++++++ > > [ 57.152225] cloud-init[892]: ci-info: > > +-------+-----------------+------------+-----------------+-----------+-------+ > > [ 57.152426] cloud-init[892]: ci-info: | Route | Destination | > > Gateway | Genmask | Interface | Flags | > > [ 57.152621] cloud-init[892]: ci-info: > > +-------+-----------------+------------+-----------------+-----------+-------+ > > [ 57.152813] cloud-init[892]: ci-info: | 0 | 0.0.0.0 | > > 10.240.8.1 | 0.0.0.0 | eth0 | UG | > > [ 57.153013] cloud-init[892]: ci-info: | 1 | 10.240.1.0 | > > 0.0.0.0 | 255.255.255.0 | eth0 | U | > > [ 57.153202] cloud-init[892]: ci-info: | 2 | 10.240.8.0 | > > 0.0.0.0 | 255.255.252.0 | eth0 | U | > > [ 57.153397] cloud-init[892]: ci-info: | 3 | 169.254.169.254 | > > 10.240.8.1 | 255.255.255.255 | eth0 | UGH | > > [ 57.153579] cloud-init[892]: ci-info: > > +-------+-----------------+------------+-----------------+-----------+-------+ > > > The extra route is there because the tenant has 2 subnets. > > > Before adding those 2 rules manually, I had this coming from cloud-init: > > > [ 192.451801] cloud-init[892]: 2018-06-13 12:29:26,179 - > > url_helper.py[WARNING]: Calling > > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: > > request error [('Connection aborted.', error(113, 'No route to host'))] > > [ 193.456805] cloud-init[892]: 2018-06-13 12:29:27,184 - > > url_helper.py[WARNING]: Calling > > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: > > request error [('Connection aborted.', error(113, 'No route to host'))] > > [ 194.461592] cloud-init[892]: 2018-06-13 12:29:28,189 - > > url_helper.py[WARNING]: Calling > > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: > > request error [('Connection aborted.', error(113, 'No route to host'))] > > [ 195.466441] cloud-init[892]: 2018-06-13 12:29:29,194 - > > url_helper.py[WARNING]: Calling > > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: > > request error [('Connection aborted.', error(113, 'No route to host'))] > > > I can see no errors in neither nova or neutron services. > > In the mean time, I've searched all our nova servers for this kind of > > behavior and we have 1 random namespace missing those rules on 6 of our 66 > > novas. > > > Any ideas would be greatly appreciated. > > > Thanks, > > Radu > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > From radu.popescu at emag.ro Fri Jun 29 13:14:12 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Fri, 29 Jun 2018 13:14:12 +0000 Subject: [Openstack-operators] Neutron not adding iptables rules for metadata agent In-Reply-To: References: <5f716e6218597640ec4c5aaa9a80518639088680.camel@emag.ro> Message-ID: Well, right now, I’ve managed to manually add those rules. For now, I will assume it was from the RabbitMQ upgrade process I’ve done few weeks ago. If the issue reappears, I’ll make sure I’ll add a bug report. Thanks, Radu > On Jun 29, 2018, at 3:55 PM, Saverio Proto wrote: > > Hello, > > I would suggest to open a bug on launchpad to track this issue. > > thank you > > Saverio > > 2018-06-18 12:19 GMT+02:00 Radu Popescu | eMAG, Technology > : >> Hi, >> >> We're using Openstack Ocata, deployed using Openstack Ansible v15.1.7. >> Neutron server is v10.0.3. >> I can see enable_isolated_metadata and enable_metadata_network only used for >> isolated networks that don't have a router which is not our case. >> Also, I checked all namespaces on all our novas and only affected 6 out of >> 66 ..and only 1 namespace / nova. Seems like isolated case that doesn't >> happen very often. >> >> Can it be RabbitMQ? I'm not sure where to check. >> >> Thanks, >> Radu >> >> On Fri, 2018-06-15 at 17:11 +0200, Saverio Proto wrote: >> >> Hello Radu, >> >> >> yours look more or less like a bug report. This you check existing >> >> open bugs for neutron ? Also what version of openstack are you running >> >> ? >> >> >> how did you configure enable_isolated_metadata and >> >> enable_metadata_network options ? >> >> >> Saverio >> >> >> 2018-06-13 12:45 GMT+02:00 Radu Popescu | eMAG, Technology >> >> : >> >> Hi all, >> >> >> So, I'm having the following issue. I'm creating a VM with floating IP. >> >> Everything is fine, namespace is there, postrouting and prerouting from the >> >> internal IP to the floating IP are there. The only rules missing are the >> >> rules to access metadata service: >> >> >> -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp >> >> --dport 80 -j REDIRECT --to-ports 9697 >> >> -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp >> >> --dport 80 -j MARK --set-xmark 0x1/0xffff >> >> >> (this is taken from another working namespace with iptables-save) >> >> >> Forgot to mention, VM is booting ok, I have both the default route and the >> >> one for the metadata service (cloud-init is running at boot time): >> >> [ 57.150766] cloud-init[892]: ci-info: >> >> +--------+------+--------------+---------------+-------+-------------------+ >> >> [ 57.150997] cloud-init[892]: ci-info: | Device | Up | Address | >> >> Mask | Scope | Hw-Address | >> >> [ 57.151219] cloud-init[892]: ci-info: >> >> +--------+------+--------------+---------------+-------+-------------------+ >> >> [ 57.151431] cloud-init[892]: ci-info: | lo: | True | 127.0.0.1 | >> >> 255.0.0.0 | . | . | >> >> [ 57.151627] cloud-init[892]: ci-info: | eth0: | True | 10.240.9.186 | >> >> 255.255.252.0 | . | fa:16:3e:43:d1:c2 | >> >> [ 57.151815] cloud-init[892]: ci-info: >> >> +--------+------+--------------+---------------+-------+-------------------+ >> >> [ 57.152018] cloud-init[892]: ci-info: >> >> +++++++++++++++++++++++++++++++Route IPv4 >> >> info++++++++++++++++++++++++++++++++ >> >> [ 57.152225] cloud-init[892]: ci-info: >> >> +-------+-----------------+------------+-----------------+-----------+-------+ >> >> [ 57.152426] cloud-init[892]: ci-info: | Route | Destination | >> >> Gateway | Genmask | Interface | Flags | >> >> [ 57.152621] cloud-init[892]: ci-info: >> >> +-------+-----------------+------------+-----------------+-----------+-------+ >> >> [ 57.152813] cloud-init[892]: ci-info: | 0 | 0.0.0.0 | >> >> 10.240.8.1 | 0.0.0.0 | eth0 | UG | >> >> [ 57.153013] cloud-init[892]: ci-info: | 1 | 10.240.1.0 | >> >> 0.0.0.0 | 255.255.255.0 | eth0 | U | >> >> [ 57.153202] cloud-init[892]: ci-info: | 2 | 10.240.8.0 | >> >> 0.0.0.0 | 255.255.252.0 | eth0 | U | >> >> [ 57.153397] cloud-init[892]: ci-info: | 3 | 169.254.169.254 | >> >> 10.240.8.1 | 255.255.255.255 | eth0 | UGH | >> >> [ 57.153579] cloud-init[892]: ci-info: >> >> +-------+-----------------+------------+-----------------+-----------+-------+ >> >> >> The extra route is there because the tenant has 2 subnets. >> >> >> Before adding those 2 rules manually, I had this coming from cloud-init: >> >> >> [ 192.451801] cloud-init[892]: 2018-06-13 12:29:26,179 - >> >> url_helper.py[WARNING]: Calling >> >> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: >> >> request error [('Connection aborted.', error(113, 'No route to host'))] >> >> [ 193.456805] cloud-init[892]: 2018-06-13 12:29:27,184 - >> >> url_helper.py[WARNING]: Calling >> >> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: >> >> request error [('Connection aborted.', error(113, 'No route to host'))] >> >> [ 194.461592] cloud-init[892]: 2018-06-13 12:29:28,189 - >> >> url_helper.py[WARNING]: Calling >> >> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: >> >> request error [('Connection aborted.', error(113, 'No route to host'))] >> >> [ 195.466441] cloud-init[892]: 2018-06-13 12:29:29,194 - >> >> url_helper.py[WARNING]: Calling >> >> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: >> >> request error [('Connection aborted.', error(113, 'No route to host'))] >> >> >> I can see no errors in neither nova or neutron services. >> >> In the mean time, I've searched all our nova servers for this kind of >> >> behavior and we have 1 random namespace missing those rules on 6 of our 66 >> >> novas. >> >> >> Any ideas would be greatly appreciated. >> >> >> Thanks, >> >> Radu >> >> >> _______________________________________________ >> >> OpenStack-operators mailing list >> >> OpenStack-operators at lists.openstack.org >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> >> From mrhillsman at gmail.com Fri Jun 29 17:59:00 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Fri, 29 Jun 2018 10:59:00 -0700 Subject: [Openstack-operators] Reminder: User Committee Meeting - Monday July 2nd @1400UTC Message-ID: <5b366ec68bafdc64b1000004@polymail.io> Hi everyone, Please be sure to join us - if not getting ready for firecrackers - on Monday July 2nd @1400UTC in #openstack-uc for weekly User Committee meeting. Also you can freely add to the meeting agenda here -  ( https://share.polymail.io/v1/z/b/NWIzNjZlYzY4YmFm/a4uka23nadCqVfJyXGVUd8seO-2TVguxjo5CXjMRfv6BfwuOHgsFdwaD5gqm42rq7S0EQJ2MIGcgH9AtUVfEieePkQzsFoAt1OaUgaIp0NtZpZK4dWfyHXTS3KuBASt50Uw1EdlADr41wcc2nQVQpFf9trzWdTHt9_ZjAc0PQrBJvTlG2nXDmvunA1m2N-H8jMIRsejqbpleDwqc7eXzV-xJPvCinnzRWGeMohmiMraUGS3wlftXrtqhmmXCWh0aW0Xrr-GB2aoJBOwSodyJl5DisHXFxMlnk_z6OYrHfl2rU_ByIO4rhUL9zYxT ) Governance/Foundation/UserCommittee - OpenStack ( https://share.polymail.io/v1/z/b/NWIzNjZlYzY4YmFm/a4uka23nadCqVfJyXGVUd8seO-2TVguxjo5CXjMRfv6BfwuOHgsFdwaD5gqm42rq7S0EQJ2MIGcgH9AtUVfEieePkQzsFoAt1OaUgaIp0NtZpZK4dWfyHXTS3KuBASt50Uw1EdlADr41wcc2nQVQpFf9trzWdTHt9_ZjAc0PQrBJvTlG2nXDmvunA1m2N-H8jMIRsejqbpleDwqc7eXzV-xJPvCinnzRWGeMohmiMraUGS3wlftXrtqhmmXCWh0aW0Xrr-GB2aoJBOwSodyJl5DisHXFxMlnk_z6OYrHfl2rU_ByIO4rhUL9zYxT ) ( https://share.polymail.io/v1/z/b/NWIzNjZlYzY4YmFm/a4uka23nadCqVfJyXGVUd8seO-2TVguxjo5CXjMRfv6BfwuOHgsFdwaD5gqm42rq7S0EQJ2MIGcgH9AtUVfEieePkQzsFoAt1OaUgaIp0NtZpZK4dWfyHXTS3KuBASt50Uw1EdlADr41wcc2nQVQpFf9trzWdTHt9_ZjAc0PQrBJvTlG2nXDmvunA1m2N-H8jMIRsejqbpleDwqc7eXzV-xJPvCinnzRWGeMohmiMraUGS3wlftXrtqhmmXCWh0aW0Xrr-GB2aoJBOwSodyJl5DisHXFxMlnk_z6OYrHfl2rU_ByIO4rhUL9zYxT ) WIKI.OPENSTACK.ORG ( https://share.polymail.io/v1/z/b/NWIzNjZlYzY4YmFm/a4uka23nadCqVfJyXGVUd8seO-2TVguxjo5CXjMRfv6BfwuOHgsFdwaD5gqm42rq7S0EQJ2MIGcgH9AtUVfEieePkQzsFoAt1OaUgaIp0NtZpZK4dWfyHXTS3KuBASt50Uw1EdlADr41wcc2nQVQpFf9trzWdTHt9_ZjAc0PQrBJvTlG2nXDmvunA1m2N-H8jMIRsejqbpleDwqc7eXzV-xJPvCinnzRWGeMohmiMraUGS3wlftXrtqhmmXCWh0aW0Xrr-GB2aoJBOwSodyJl5DisHXFxMlnk_z6OYrHfl2rU_ByIO4rhUL9zYxT ) --  Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: