From flux.adam at gmail.com Fri Nov 1 02:29:44 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Thu, 31 Oct 2019 19:29:44 -0700 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> Message-ID: Yeah, I've halted work for now until the summit when I can talk to other folks (at the related meeting that is scheduled) -- and I believe I agree that merging this functionality into the SDK is the best path forward. I will probably be able to assist somewhat with that as well. I wish I knew about the discussions that happened last year, I could already have been working on that... T_T --Adam On Wed, Oct 30, 2019 at 7:43 AM Artem Goncharov wrote: > Hi Adam, > > Since I need this now as well I will start working on implementation how > it was agreed (in SDK and in OSC) during last summit by mid of November. > There is no need for discussing this further, it just need to be > implemented. Sad that we got no progress in half a year. > > Regards, > Artem (gtema). > > On 30. Oct 2019, at 14:26, Adam Harwell wrote: > > That's too bad that you won't be at the summit, but I think there may > still be some discussion planned about this topic. > > Yeah, I understand completely about priorities and such internally. Same > for me... It just happens that this IS priority work for us right now. :) > > > On Tue, Oct 29, 2019, 07:48 Adrian Turjak wrote: > >> My apologies I missed this email. >> >> Sadly I won't be at the summit this time around. There may be some public >> cloud focused discussions, and some of those often have this topic come up. >> Also if Monty from the SDK team is around, I'd suggest finding him and >> having a chat. >> >> I'll help if I can but we are swamped with internal work and I can't >> dedicate much time to do upstream work that isn't urgent. :( >> On 17/10/19 8:48 am, Adam Harwell wrote: >> >> That's interesting -- we have already started working to add features and >> improve ospurge, and it seems like a plenty useful tool for our needs, but >> I think I agree that it would be nice to have that functionality built into >> the sdk. I might be able to help with both, since one is immediately useful >> and we (like everyone) have deadlines to meet, and the other makes sense to >> me as a possible future direction that could be more widely supported. >> >> Will you or someone else be hosting and discussion about this at the >> Shanghai summit? I'll be there and would be happy to join and discuss. >> >> --Adam >> >> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >> wrote: >> >>> I tried to get a community goal to do project deletion per project, but >>> we ended up deciding that a community goal wasn't ideal unless we did >>> build a bulk delete API in each service: >>> https://review.opendev.org/#/c/639010/ >>> https://etherpad.openstack.org/p/community-goal-project-deletion >>> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >>> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >>> >>> What we decided on, but didn't get a chance to work on, was building >>> into the OpenstackSDK OS-purge like functionality, as well as reporting >>> functionality (of all project resources to be deleted). That way we >>> could have per project per resource deletion logic, and all of that >>> defined in the SDK. 
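[To make the quoted proposal a bit more concrete: below is a minimal openstacksdk sketch of the per-project resource "report" idea described above. It is only an illustration -- the 'purge' cloud name and the plain printed report are assumptions, not existing SDK features.]

    import openstack

    # Assumes a clouds.yaml entry named 'purge' that is scoped to the
    # project being inspected; the name is only an example.
    conn = openstack.connect(cloud='purge')

    report = {
        'servers': [s.name for s in conn.compute.servers()],
        'volumes': [v.name or v.id for v in conn.block_storage.volumes()],
        'networks': [n.name for n in conn.network.networks()],
        'images': [i.name for i in conn.image.images()],
    }

    for kind, names in sorted(report.items()):
        print('%s (%d): %s' % (kind, len(names), ', '.join(names) or '-'))

[A real implementation would need to paginate, cover many more resource types per service (ports, floating IPs, snapshots, and so on), and only then move on to any deletion pass.]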
>>> >>> I was up for doing some of the work, but ended up swamped with internal >>> work and just didn't drive or push for the deletion work upstream. >>> >>> If you want to do something useful, don't pursue OS-Purge, help us add >>> that official functionality to the SDK, and then we can push for bulk >>> deletion APIs in each project to make resource deletion more pleasant. >>> >>> I'd be happy to help with the work, and Monty on the SDK team will most >>> likely be happy to as well. :) >>> >>> Cheers, >>> Adrian >>> >>> On 1/10/19 11:48 am, Adam Harwell wrote: >>> > I haven't seen much activity on this project in a while, and it's been >>> > moved to opendev/x since the opendev migration... Who is the current >>> > owner of this project? Is there anyone who actually is maintaining it, >>> > or would mind if others wanted to adopt the project to move it forward? >>> > >>> > Thanks, >>> > --Adam Harwell >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Fri Nov 1 02:49:37 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Thu, 31 Oct 2019 19:49:37 -0700 Subject: [dev][ops][ptg][keystone] Join the keystone onboarding session! Message-ID: <7e0350e7-d249-4f5c-8a54-50c883bfb350@www.fastmail.com> Hello Stackers, If you're a developer, technical writer, operator, or user and interested in getting involved in the keystone project, stop by the keystone onboarding session in Shanghai next week! We will be at the Kilo table in the Blue Room on Wednesday from 9 to 10:30. The format will be open ended, so come with all your questions about how you can participate on the keystone team. Can't make it to the session? Take a look at our contributing guide[1] and feel free to get in touch with me directly. Colleen Murphy / cmurphy (keystone PTL) [1] https://docs.openstack.org/keystone/latest/contributor/how-can-i-help.html From zhang.lei.fly+os-discuss at gmail.com Fri Nov 1 03:42:16 2019 From: zhang.lei.fly+os-discuss at gmail.com (Jeffrey Zhang) Date: Fri, 1 Nov 2019 11:42:16 +0800 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> References: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> Message-ID: Zoom is usable in China right now. On Thu, Oct 31, 2019 at 4:58 PM Marcin Juszkiewicz < marcin.juszkiewicz at linaro.org> wrote: > W dniu 30.10.2019 o 23:23, Kendall Nelson pisze: > > > If people were going to be in Shanghai for the Summit (or live in > > China) they wouldn't be able to participate because of the firewall. > > Can you (or someone else present in Poland) provide an alternative > > solution to Google meet so that everyone interested could join? > > Tell us which of them work for you: > > - Bluejeans > - Zoom > > > As I have access to both platforms at work. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yamamoto at midokura.com Fri Nov 1 05:24:51 2019 From: yamamoto at midokura.com (Takashi Yamamoto) Date: Fri, 1 Nov 2019 14:24:51 +0900 Subject: [neutron][ptg] Team dinner In-Reply-To: <20191030211537.trgnve7df27g3jh4@skaplons-mac> References: <20191030211537.trgnve7df27g3jh4@skaplons-mac> Message-ID: hi, On Thu, Oct 31, 2019 at 6:19 AM Slawek Kaplonski wrote: > > Hi neutrinos, > > Thanks to LIU Yulong who helped me a lot to choose and book some restaurant, we > have now booked restaurant: > > Expo source B2, No.168, Shangnan Road, Pudong New Area, Shanghai, TEL: +86 21 > 58882117 > 书院人家(世博源店) 上海市浦东新区上南路168号世博源B2 > The Dianping page: http://www.dianping.com/shop/20877292 > > Dinner is scheduled to Tuesday, 5th Nov at 6pm. > > Restaurant is close to the Expo center. It's about 15 minutes walk according to > the Google maps: https://tinyurl.com/y2rc83ej google maps is quite inaccurate in China. https://j.map.baidu.com/e8/6h https://router.map.qq.com/short?l=8af77e82c3b0deff1adeffb171fedf19 > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From cdent+os at anticdent.org Fri Nov 1 10:34:50 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 1 Nov 2019 10:34:50 +0000 (GMT) Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: On Thu, 31 Oct 2019, Matt Riedemann wrote: > After that we call the scheduler to find a host and that only takes about 1 > second. That POST /allocations call to placement shouldn't be taking around 3 > minutes so something crazy is going on there. Yeah, this is indeed weird and wrong. There are two obvious ways, in the placement code, that _set_allocations (which underlies all the allocation changing requests) could be slowed down: * If the resource provider's generation has changed during the request, because something else is also writing allocations at the same time, there will be up to 10 retries, server-side. * For each of those tries, there's (apparently) a chance of a db deadlock, because the method is guarded by a retry on deadlock. We added that in a long long time ago. However, if the generation retry was happening, it would show up in the logs as "Retrying allocations write on resource provider". And if we were getting deadlock we ought to see "Performing DB retry for function". There could be other non-obvious things, but more data required... (more) > Oct 31 16:52:24.721346 ubuntu-bionic-inap-mtl01-0012620879 > devstack at placement-api.service[8591]: DEBUG placement.requestlog > [req-275af2df-bd4e-4e64-b46e-6582e8de5148 > req-295f7350-2f0b-4f85-8cdf-d76801637221 None None] Starting request: > 198.72.124.104 "POST /placement/allocations" {{(pid=8593) __call__ > /opt/stack/placement/placement/requestlog.py:61}} We start the requestlog at the very beginning of the request so there are a few steps between here and the actual data interaction, so another possible angle of investigation here is that keystonemiddleware was being slow to validate a token for some reason. > I know Chris Dent has done a lot of profiling on placement recently but I'm > not sure if much has been done around profiling the POST /allocations call to > move allocations from one consumer to another. You're right that little was done to profile allocations, mostly because initial experiments showed that it was fast and get /a_c was slow. In the fullness of time I probably would have moved on to allocations but reality intervened. 
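[For readers unfamiliar with that retry behaviour, here is a stripped-down sketch of the optimistic-concurrency pattern it follows. This is an illustration only, not the actual placement code; the injected callables and exception names are invented.]

    RETRIES = 10

    class GenerationConflict(Exception):
        """A provider generation moved between the read and the write."""

    class RetriesExhausted(Exception):
        pass

    def write_allocations(consumer_uuid, requested, read_generations, persist):
        # read_generations() and persist() stand in for the real DB work.
        for _ in range(RETRIES):
            snapshot = read_generations(requested)
            try:
                # The write succeeds only if no provider generation has
                # changed since the snapshot was taken; otherwise a
                # conflict is raised and we loop.
                persist(consumer_uuid, requested, snapshot)
                return
            except GenerationConflict:
                continue  # another writer bumped a generation; re-read, retry
        raise RetriesExhausted(consumer_uuid)

[The point is that contention on this path would surface as repeated conflict/retry log lines rather than as one silent three-minute call, which is why the absence of those messages suggests the delay is elsewhere.]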
What's probably most important is evaluating allocation writes to the same set of several resource providers under high concurrency. However it doesn't seem like that is what is happening in this case (otherwise I'd expect to see more evidence in the logs). Instead it's "just being slow" which could be any of: * something in the auth process (which if other services are also being slow could be a unifying thing) * the database server having a sad (do we keep a slow query log?) * the vm/vms has/have a noisy neighbor Do we know anything about cpu and io stats during the slow period? We've (sorry, let me rephrase, I've) known for the entire 5.5 years I've been involved in OpenStack that the CI resources are way over-subscribed and controller nodes in the wild are typically way under-specified. Yes, our code should be robust in the face of that, but... Of course, I could be totally wrong, there could be something flat out wrong in the placement code, but if that were the case I'd expect (like Matt did) to see it more often. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From mriedemos at gmail.com Fri Nov 1 13:30:59 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 1 Nov 2019 08:30:59 -0500 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: On 11/1/2019 5:34 AM, Chris Dent wrote: > Instead it's "just being slow" which could be any of: > > * something in the auth process (which if other services are also >   being slow could be a unifying thing) > * the database server having a sad (do we keep a slow query log?) Not by default but it can be enabled in devstack, like [1] but the resulting log file is so big I can't open it in my browser. So hitting this kind of issue with that enabled is going to be like finding a needle in a haystack I think. > * the vm/vms has/have a noisy neighbor This is my guess as to the culprit. Like Slawek said elsewhere, he's seen this in other APIs that are otherwise really fast. > > Do we know anything about cpu and io stats during the slow period? We have peakmemtracker [2] which shows a jump during the slow period noticed: Oct 31 16:51:43.132081 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: [iscsid (pid:18546)]=34352KB; [dmeventd (pid:18039)]=119732KB; [ovs-vswitchd (pid:3626)]=691960KB Oct 31 16:51:43.133012 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: ]]] Oct 31 16:55:23.220099 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: [[[ Oct 31 16:55:23.221255 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: Thu Oct 31 16:55:23 UTC 2019 I don't know what that means though. We also have dstat [3] (which I find hard as hell to read - do people throw that into a nice graphical tool to massage that data for inspection?) which shows a jump as well: Oct 31 16:53:31.515005 ubuntu-bionic-inap-mtl01-0012620879 dstat.sh[25748]: 31-10 16:53:31| 19 7 12 61 0|5806M 509M 350M 1194M| 612B 0 | 0 0 | 0 0 |1510 2087 |17.7 9.92 5.82|3.0 13 2.0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 25797 99k 451B 0%|mysqld 464M| 20M 8172M| 37 513 0 4 12 Oct 31 16:55:08.604634 ubuntu-bionic-inap-mtl01-0012620879 dstat.sh[25748]: 31-10 16:53:32| 20 7 12 61 0|5806M 509M 350M 1194M|1052B 0 | 0 0 | 0 0 |1495 2076 |17.7 9.92 5.82|4.0 13 0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 25797 99k 446B0.1%|mysqld 464M| 20M 8172M| 37 513 0 4 12 Looks like around the time of slowness mysqld is the top consuming process which is probably not surprising. 
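[For anyone who would rather filter than graph: a throwaway pass over the dstat --output CSV can narrow down the slow window. The length of the preamble and the position of the "idl" column depend on the dstat flags in use, so both are assumptions to adjust rather than facts about the gate output.]

    import csv
    import sys

    IDLE_THRESHOLD = 10.0  # flag samples where CPU idle drops below this

    def low_idle_rows(path):
        with open(path) as f:
            rows = list(csv.reader(f))
        # Find the header row that names the per-column fields (e.g. "idl").
        header_idx = next(i for i, r in enumerate(rows) if 'idl' in r)
        idl_col = rows[header_idx].index('idl')
        for row in rows[header_idx + 1:]:
            try:
                if float(row[idl_col]) < IDLE_THRESHOLD:
                    yield row
            except (IndexError, ValueError):
                continue  # skip short or non-numeric rows

    if __name__ == '__main__':
        for row in low_idle_rows(sys.argv[1]):
            print(','.join(row))

[Run it against the job's dstat CSV and line the flagged timestamps up with the mysql and placement logs; that correlation is the interesting part.]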
> > We've (sorry, let me rephrase, I've) known for the entire 5.5 years > I've been involved in OpenStack that the CI resources are way > over-subscribed and controller nodes in the wild are typically way > under-specified.  Yes, our code should be robust in the face of > that, but... Looking at logstash this mostly hits on OVH and INAP nodes. Question to infra: do we know if those are more oversubscribed than some others like RAX or VEXXHOST nodes? [1] https://review.opendev.org/#/c/691995/ [2] https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-peakmem_tracker.txt.gz [3] https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-dstat.txt.gz -- Thanks, Matt From mriedemos at gmail.com Fri Nov 1 13:32:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 1 Nov 2019 08:32:52 -0500 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: On 11/1/2019 8:30 AM, Matt Riedemann wrote: > Looks like around the time of slowness mysqld is the top consuming > process which is probably not surprising. Oh I also see this in the mysql error log [1] around the time of the slow period, right after things pick back up: 2019-10-31T16:55:08.603773Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 161281ms. The settings might not be optimal. (flushed=201 and evicted=0, during the time.) So obviously something is going very wrong with mysql (or the node in general) during that time. [1] https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/mysql/error_log.txt.gz -- Thanks, Matt From cboylan at sapwetik.org Fri Nov 1 14:55:08 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 01 Nov 2019 07:55:08 -0700 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> On Fri, Nov 1, 2019, at 6:30 AM, Matt Riedemann wrote: > On 11/1/2019 5:34 AM, Chris Dent wrote: > > Instead it's "just being slow" which could be any of: > > > > * something in the auth process (which if other services are also > >   being slow could be a unifying thing) > > * the database server having a sad (do we keep a slow query log?) > > Not by default but it can be enabled in devstack, like [1] but the > resulting log file is so big I can't open it in my browser. So hitting > this kind of issue with that enabled is going to be like finding a > needle in a haystack I think. > > > * the vm/vms has/have a noisy neighbor > > This is my guess as to the culprit. Like Slawek said elsewhere, he's > seen this in other APIs that are otherwise really fast. > > > > > Do we know anything about cpu and io stats during the slow period? 
> > We have peakmemtracker [2] which shows a jump during the slow period > noticed: > > Oct 31 16:51:43.132081 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: [iscsid (pid:18546)]=34352KB; [dmeventd > (pid:18039)]=119732KB; [ovs-vswitchd (pid:3626)]=691960KB > Oct 31 16:51:43.133012 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: ]]] > Oct 31 16:55:23.220099 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: [[[ > Oct 31 16:55:23.221255 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: Thu Oct 31 16:55:23 UTC 2019 > > I don't know what that means though. > > We also have dstat [3] (which I find hard as hell to read - do people > throw that into a nice graphical tool to massage that data for > inspection?) which shows a jump as well: I put the dstat csv files in https://lamada.eu/dstat-graph/ and that works reasonably well. > > Oct 31 16:53:31.515005 ubuntu-bionic-inap-mtl01-0012620879 > dstat.sh[25748]: 31-10 16:53:31| 19 7 12 61 0|5806M 509M 350M > 1194M| 612B 0 | 0 0 | 0 0 |1510 2087 |17.7 9.92 5.82|3.0 > 13 2.0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 > 25797 99k 451B 0%|mysqld 464M| 20M 8172M| 37 513 > 0 4 12 > Oct 31 16:55:08.604634 ubuntu-bionic-inap-mtl01-0012620879 > dstat.sh[25748]: 31-10 16:53:32| 20 7 12 61 0|5806M 509M 350M > 1194M|1052B 0 | 0 0 | 0 0 |1495 2076 |17.7 9.92 5.82|4.0 > 13 0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 > 25797 99k 446B0.1%|mysqld 464M| 20M 8172M| 37 513 > 0 4 12 > > Looks like around the time of slowness mysqld is the top consuming > process which is probably not surprising. > > > > > We've (sorry, let me rephrase, I've) known for the entire 5.5 years > > I've been involved in OpenStack that the CI resources are way > > over-subscribed and controller nodes in the wild are typically way > > under-specified.  Yes, our code should be robust in the face of > > that, but... > > Looking at logstash this mostly hits on OVH and INAP nodes. Question to > infra: do we know if those are more oversubscribed than some others like > RAX or VEXXHOST nodes? I believe both OVH and INAP give us dedicated hypervisors. This means that we will end up being our own noisy neighbors there. I don't know what level we oversubscribe at but amorin (OVH) and benj_ (INAP) can probably provide more info. INAP was also recently turned back on. It had been offline for redeployment and that was completed and added back to the pool. Possible that more than just the openstack version has changed? OVH controls the disk IOPs that we get pretty aggressively as well. Possible it is an IO thing? > > [1] https://review.opendev.org/#/c/691995/ > [2] > https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-peakmem_tracker.txt.gz > [3] > https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-dstat.txt.gz From mriedemos at gmail.com Fri Nov 1 15:22:45 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 1 Nov 2019 10:22:45 -0500 Subject: State of the Gate (placement?) In-Reply-To: <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/1/2019 9:55 AM, Clark Boylan wrote: > OVH controls the disk IOPs that we get pretty aggressively as well. Possible it is an IO thing? 
Yeah, so looking at the dstat output in that graph (thanks for pointing out that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, that's probably not good. -- Thanks, Matt From rfolco at redhat.com Fri Nov 1 16:45:53 2019 From: rfolco at redhat.com (Rafael Folco) Date: Fri, 1 Nov 2019 13:45:53 -0300 Subject: [tripleo] TripleO CI Summary: Sprint 38 Message-ID: Greetings, The TripleO CI team has just completed Sprint 38 / Unified Sprint 17 (Oct 10 thru Oct 30). The following is a summary of completed work during this sprint cycle: - Tested the temporary manifest implementation in the new promoter for not breaking promotion workflow. Issues to be bootstrapped in the next sprint. - Implemented CI jobs in zuul to build and run tests against ceph-ansible and podman pull requests in github. A PoC has been created to validate the usage of zuul-distro-jobs project for dealing with RPM builds. - Ceph-ansible: initial standalone job added to test pull requests - Podman: integration with RDO software factory is done - Closed-out Train release branching in both upstream and periodic realms as part of the mid-cycle technical debt. - Addressed required changes for building a CentOS8 node for upcoming distro release support across TripleO CI jobs. - All RDO third party and upstream multinode (master/train) jobs are now moved to os_tempest ansible role provided by Openstack-ansible team. - (Pushed to next sprint): Improve tests for verifying a full promotion workflow running on the staging environment. The planned work for the next sprint [1] are: - Evaluate and implement CI jobs in Zuul that deal with RPM build artifacts for ceph-ansible and podman 3rd party testing. - Design and create a PoC for individual component testing in the promotion pipeline. This effort will add an additional verification layer to check OpenStack components (compute, networking, storage, etc) with stable builds, and ease root cause determination when it breaks the code. - Continue to improve and fix the new promotion code by deploying and bootstrapping an isolated promoter server. The Ruck and Rover for this sprint are Sagi Shnaidman (sshnaidm) and Ronelle Landy (rlandy). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes are being tracked in etherpad [2]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-18 [2] https://etherpad.openstack.org/p/ruckroversprint18 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Fri Nov 1 20:35:46 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Fri, 1 Nov 2019 16:35:46 -0400 Subject: [ops] preliminary ops meetup proposal, jan 2020, London Message-ID: Please see https://twitter.com/osopsmeetup/status/1190365967106412544?s=20 We find this method of getting news out about operators meetups gets good engagement, so please also follow if of interest to you. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Fri Nov 1 21:36:38 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 1 Nov 2019 17:36:38 -0400 Subject: [cinder] changing the weekly meeting time In-Reply-To: References: Message-ID: <72310d46-d878-de89-c04f-2926c8e0f016@gmail.com> On 10/30/19 10:45 PM, Fang, Liang A wrote: > Hi Brian, > > I agree with Rajat. 
A meeting held in the middle of the night will prevent people from
attending, unless they have topics they must discuss with the team.
>
> OpenStack is widely used in Asia, but there are not many cores from Asia
> (in countries where the meeting falls in the middle of the night). The
> meeting time is one of the reasons 😊

I agree, the current meeting time is not Asia-friendly. If we move the
meeting 2 hours earlier (14:00 UTC), that's 6:00 a.m. on USA Pacific Coast,
which is on the verge of not being Silicon Valley-friendly. So it would be
good to hear from our Silicon Valley time zone contributors (we have at
least 3 who attend fairly regularly).

> Nova has two meetings to be friendly to two sets of time zones, but I
> don't like dividing the meeting in two.
>
I agree, I'd prefer not to do this unless there's no other way to work out
a suitable time. Here's a chart that shows you what we're up against:
https://www.timeanddate.com/worldclock/meetingtime.html?iso=20191120&p1=224&p2=179&p3=141&p4=44&p5=237&p6=248

We discussed this issue briefly at this week's Cinder meeting, and since
we're hoping to pick up some new contributors at the Summit/PTG next week,
we agreed to take a poll after the PTG so that new contributors can
represent their respective time zones. In the meantime, it would be good to
keep this ML discussion going -- maybe someone will have a creative idea
for a solution that won't adversely impact any contributors!

> Regards
>
> Liang
>
> *From:* Rajat Dhasmana
> *Sent:* Thursday, October 31, 2019 12:54 AM
> *To:* Brian Rosmaita
> *Cc:* openstack-discuss at lists.openstack.org
> *Subject:* Re: [cinder] changing the weekly meeting time
>
> Hi Brian,
>
> It's great that the change in weekly meeting time is considered, here
> are my opinions on the same from the perspective of Asian countries
> (having active upstream developers)
>
> Current meeting time (16:00 - 17:00 UTC)
>
> INDIA : is 9:30 - 10:30 PM IST (UTC+05:30) is a little late but manageable.
>
> CHINA : is 12:00 - 01:00 AM CST (UTC+08:00) is almost impossible to attend.
>
> JAPAN : is 01:00 - 02:00 AM JST (UTC+09:00) similar to the case as China.
>
> IMO shifting the meeting time 2 hours earlier (UTC 14:00) might bring
> more participation and would ease out timings for some (including me)
> but these are just my thoughts.
>
> Thanks and Regards
> Rajat Dhasmana
>
> On Thu, Oct 24, 2019 at 3:05 AM Brian Rosmaita
> wrote:
>
> (Just to be completely clear -- we're only gathering information at this
> point.  The Cinder weekly meeting is still Wednesdays at 16:00 UTC.)
>
> As we discussed at today's meeting [0], a request has been made to hold
> the weekly meeting earlier so that it would be friendlier for people in
> Asia time zones.
>
> Based on the people in attendance today, it seems that a move to 14:00
> UTC is not out of the question.
>
> Thus, the point of this email is to solicit comments on whether we
> should change the meeting time to 14:00 UTC.  As you consider the impact
> on yourself, if you are in a TZ that observes Daylight Savings Time,
> keep in mind that most TZs go back to standard time over the next
> few weeks.
>
> (I was going to insert an opinion here, but I will wait and respond in
> this thread like everyone else.)
> > cheers, > brian > > > [0] > http://eavesdrop.openstack.org/meetings/cinder/2019/cinder.2019-10-23-16.00.log.html#l-166 > From rosmaita.fossdev at gmail.com Fri Nov 1 21:58:23 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 1 Nov 2019 17:58:23 -0400 Subject: [cinder][ptg] team dinner in Shanghai Message-ID: <3ee6b488-3e54-64fc-039b-2ca0060f467b@gmail.com> This event is open to anyone who will be in Shanghai on Thursday 7 November and who has a constructive interest in Cinder. I know we have a few contributors in Shanghai who won't be at the PTG; hopefully, they'll be able to join us so we can meet face-to-face. We're planning for a 5:00 pm dinner; it's kind of early, but the restaurant is a close walk from the Expo Center, so I figured it would be easier to go directly there instead of dispersing to our hotels and regrouping later. Plus, an early dinner will enable people to attend the (unofficial) Game Night [0]. Details and a signup sheet (we need a head count) are at the top of the Cinder PTG etherpad: https://etherpad.openstack.org/p/shanghai-ptg-cinder I haven't been able to confirm the time and location yet, but I will keep the above etherpad updated with that info. cheers, brian [0] https://etherpad.openstack.org/p/pvg-game-night From rico.lin.guanyu at gmail.com Sat Nov 2 06:38:42 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Sat, 2 Nov 2019 14:38:42 +0800 Subject: [auto-scaling] SIG PTG schedule. please join us :) In-Reply-To: References: Message-ID: Actually the schedule is changed so 6th it is! Rico Lin 於 2019年10月30日 週三,下午3:26寫道: > Sorry about this mistake, it should be 11/5 instead of 6th > > On Wed, Oct 30, 2019 at 3:05 PM Rico Lin > wrote: > >> Hi all >> >> PTG is right next week, so if you're interested in Auto-scaling features, >> please join us. >> Our session will be hosted on 11/6 for Half-day from 9:00 am to 12:30 pm >> in room 431. >> You can suggest PTG sessions in our etherpad: >> https://etherpad.openstack.org/p/PVG-auto-scaling-sig >> Feel free to join our IRC channel as well: #openstack-auto-scaling >> >> Also, you can check out our new documents for autoscaling: >> https://docs.openstack.org/auto-scaling-sig >> And provide any feedback/ bug report for any Auto-scaling issue related >> to OpenStack in >> https://storyboard.openstack.org/#!/project/openstack/auto-scaling-sig >> >> Here's my Wechat ID: RicoLinCloudGeek >> See you all in Shanghai! >> >> -- >> May The Force of OpenStack Be With You, >> >> *Rico Lin*irc: ricolin >> >> > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From veeraready at yahoo.co.in Sat Nov 2 07:14:29 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Sat, 2 Nov 2019 07:14:29 +0000 (UTC) Subject: [openstack-dev][kuryr] Error in creating LB In-Reply-To: References: <1979762272.3801941.1572509785865.ref@mail.yahoo.com> <1979762272.3801941.1572509785865@mail.yahoo.com> <2ea50863deddb3bd158ccab869536c3c4f9693d5.camel@redhat.com> Message-ID: <619091810.118353.1572678869777@mail.yahoo.com> Hi Michal,Thanks for your help , deployment was successful. Problem is with libvirt .Libvirt is already installed in host machine with 1 vm up and running , i destroyed vm and uninstall libvirt , problem solved, this issue was identified from nova logs.  Thanks, Veera. 
On Friday, 1 November, 2019, 03:19:38 am IST, Michael Johnson wrote: Yeah, this likely means something is wrong with your nova setup. Either it is too slow to boot a VM or there is some other error. Try looking for the "amphora" instances in nova (openstack server list) then do a show on them (openstack server show ). There is an error field from nova that may contain the error. Michael On Thu, Oct 31, 2019 at 1:46 AM Michał Dulko wrote: > > On Thu, 2019-10-31 at 08:16 +0000, VeeraReddy wrote: > > Hi, > > > > I am trying to install openstack & Kubernetes using devstack > > > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > >  Error Log            : http://paste.openstack.org/show/785670/ > >  Local.conf          : Paste #785671 | LodgeIt! > > This error happens when Octavia is unable to create a load balancer in > 5 minutes. Seems like your LB is still PENDING_CREATE, so this seems to > be just unusually slow Octavia. This might happen if e.g. your host has > no nested virtualization enabled. > > Try increasing KURYR_WAIT_TIMEOUT in local.conf. In the gate we use up > to 20 minutes (value of 1200). > > > Regards, > > Veera. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.rydberg at citynetwork.eu Sat Nov 2 08:19:14 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Sat, 2 Nov 2019 16:19:14 +0800 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> Message-ID: <2ca832bb-4b71-b775-160a-e1868dcb21d2@citynetwork.eu> Hi, A Forum session is planned for this topic, Monday 11:40. Suites perfect to continue the discussions there as well. https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24407/project-resource-cleanup-followup BR, Tobias Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED On 2019-10-30 15:43, Artem Goncharov wrote: > Hi Adam, > > Since I need this now as well I will start working on implementation > how it was agreed (in SDK and in OSC) during last summit by mid of > November. There is no need for discussing this further, it just need > to be implemented. Sad that we got no progress in half a year. > > Regards, > Artem (gtema). > >> On 30. Oct 2019, at 14:26, Adam Harwell > > wrote: >> >> That's too bad that you won't be at the summit, but I think there may >> still be some discussion planned about this topic. >> >> Yeah, I understand completely about priorities and such internally. >> Same for me... It just happens that this IS priority work for us >> right now. :) >> >> >> On Tue, Oct 29, 2019, 07:48 Adrian Turjak > > wrote: >> >> My apologies I missed this email. >> >> Sadly I won't be at the summit this time around. There may be >> some public cloud focused discussions, and some of those often >> have this topic come up. Also if Monty from the SDK team is >> around, I'd suggest finding him and having a chat. >> >> I'll help if I can but we are swamped with internal work and I >> can't dedicate much time to do upstream work that isn't urgent. 
:( >> >> On 17/10/19 8:48 am, Adam Harwell wrote: >>> That's interesting -- we have already started working to add >>> features and improve ospurge, and it seems like a plenty useful >>> tool for our needs, but I think I agree that it would be nice to >>> have that functionality built into the sdk. I might be able to >>> help with both, since one is immediately useful and we (like >>> everyone) have deadlines to meet, and the other makes sense to >>> me as a possible future direction that could be more widely >>> supported. >>> >>> Will you or someone else be hosting and discussion about this at >>> the Shanghai summit? I'll be there and would be happy to join >>> and discuss. >>> >>>     --Adam >>> >>> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >>> > wrote: >>> >>> I tried to get a community goal to do project deletion per >>> project, but >>> we ended up deciding that a community goal wasn't ideal >>> unless we did >>> build a bulk delete API in each service: >>> https://review.opendev.org/#/c/639010/ >>> https://etherpad.openstack.org/p/community-goal-project-deletion >>> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >>> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >>> >>> What we decided on, but didn't get a chance to work on, was >>> building >>> into the OpenstackSDK OS-purge like functionality, as well >>> as reporting >>> functionality (of all project resources to be deleted). That >>> way we >>> could have per project per resource deletion logic, and all >>> of that >>> defined in the SDK. >>> >>> I was up for doing some of the work, but ended up swamped >>> with internal >>> work and just didn't drive or push for the deletion work >>> upstream. >>> >>> If you want to do something useful, don't pursue OS-Purge, >>> help us add >>> that official functionality to the SDK, and then we can push >>> for bulk >>> deletion APIs in each project to make resource deletion more >>> pleasant. >>> >>> I'd be happy to help with the work, and Monty on the SDK >>> team will most >>> likely be happy to as well. :) >>> >>> Cheers, >>> Adrian >>> >>> On 1/10/19 11:48 am, Adam Harwell wrote: >>> > I haven't seen much activity on this project in a while, >>> and it's been >>> > moved to opendev/x since the opendev migration... Who is >>> the current >>> > owner of this project? Is there anyone who actually is >>> maintaining it, >>> > or would mind if others wanted to adopt the project to >>> move it forward? >>> > >>> > Thanks, >>> >    --Adam Harwell >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4017 bytes Desc: S/MIME Cryptographic Signature URL: From skaplons at redhat.com Sat Nov 2 08:59:58 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sat, 2 Nov 2019 09:59:58 +0100 Subject: [neutron][ptg] Team dinner In-Reply-To: References: <20191030211537.trgnve7df27g3jh4@skaplons-mac> Message-ID: <20191102085958.oyahgqi2yufml3vg@skaplons-mac> Hi, On Fri, Nov 01, 2019 at 02:24:51PM +0900, Takashi Yamamoto wrote: > hi, > > On Thu, Oct 31, 2019 at 6:19 AM Slawek Kaplonski wrote: > > > > Hi neutrinos, > > > > Thanks to LIU Yulong who helped me a lot to choose and book some restaurant, we > > have now booked restaurant: > > > > Expo source B2, No.168, Shangnan Road, Pudong New Area, Shanghai, TEL: +86 21 > > 58882117 > > 书院人家(世博源店) 上海市浦东新区上南路168号世博源B2 > > The Dianping page: http://www.dianping.com/shop/20877292 > > > > Dinner is scheduled to Tuesday, 5th Nov at 6pm. > > > > Restaurant is close to the Expo center. It's about 15 minutes walk according to > > the Google maps: https://tinyurl.com/y2rc83ej > > google maps is quite inaccurate in China. > https://j.map.baidu.com/e8/6h > https://router.map.qq.com/short?l=8af77e82c3b0deff1adeffb171fedf19 Thx for this links. I puted it in etherpad also. > > > > > -- > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Sat Nov 2 09:02:02 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sat, 2 Nov 2019 10:02:02 +0100 Subject: [ptg][neutron] Onboarding during the PTG In-Reply-To: <20191031223451.yljtzrp2zulwpzoc@skaplons-mac> References: <20191031223451.yljtzrp2zulwpzoc@skaplons-mac> Message-ID: <20191102090202.54z3a4wgvmyj53c6@skaplons-mac> Hi, I forgot to add that we will be in *Room 431* according to [1] [1] https://www.openstack.org/ptg/ On Thu, Oct 31, 2019 at 11:34:51PM +0100, Slawek Kaplonski wrote: > Hi all new (and existing) Neutrinos, > > During the PTG in Shanghai we are planning to organize onboarding session. > It will take place on Wednesday in morning sessions. > It is planned to be started at 9:00 am and finished just before the lunch > on 12:30. See on [1] for details. > All people who wants to learn about Neutron and contribution to it are welcome > on this session. > Also all existing team members are welcome to be there to show to new > contributors and help them with onboarding process :) > > See You all in Shanghai! > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > -- > Slawek Kaplonski > Senior software engineer > Red Hat -- Slawek Kaplonski Senior software engineer Red Hat From umesh.mishra at click2cloud.net Sat Nov 2 15:18:09 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Sat, 2 Nov 2019 15:18:09 +0000 Subject: Installation DOC for Three node Message-ID: Dear Sir, This is inform you that, We want to build the openstack in our data center so request you please help to provide me doc so that we can build asap . Please help. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emccormick at cirrusseven.com Sat Nov 2 16:27:10 2019 From: emccormick at cirrusseven.com (Erik McCormick) Date: Sat, 2 Nov 2019 12:27:10 -0400 Subject: [ops] Shanghai Sessions Message-ID: Hello all, For anyone attending the Shanghai Summit, I wanted to point out a couple of interesting Ops things.

First, we are redoing the ever popular Ops War Stories at the Forum on Monday. These are a series of lightning talks by anyone with a good story to tell. No presentations needed. Sign up here: https://etherpad.openstack.org/p/shanghai-ptg-ops-war-stories

Second, we have space in the PTG area Thursday afternoon (1:30pm - 4:30pm, Room 430) to do a mini Ops Meetup. Please come join us! Share ideas for topics here so we can hit the ground running. https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup

Cheers, Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Sun Nov 3 00:03:44 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Sat, 2 Nov 2019 20:03:44 -0400 Subject: [cinder] core list update Message-ID: <039cc381-d757-57e4-9014-82327b31a3d8@gmail.com> In preparation for the PTG, it's time to revise the list of Cinder core contributors. The following people have made great contributions to Cinder, but unfortunately Cinder is no longer their current focus, and they've indicated that they no longer have sufficient time to act as core reviewers for the project:
- John Griffith
- TommyLike Hu
- Xing Yang
- Yikun Jiang

On behalf of the entire Cinder project team, I thank John, TommyLike, Yikun, and Xing for their past service to Cinder, and hope that they may find more time to spend on Cinder in the future.

While I'm thanking people, I should also thank the current members of the cinder-core group for all their work during the Train cycle (and thank them in advance for all the work I look forward to them doing during Ussuri!):
- Eric Harney
- Gorka Eguileor
- Ivan Kolodyazhny
- Jay Bryant
- Rajat Dhasmana
- Sean McGinnis
- Walter A. Boring IV

This will leave some openings for new core contributors during the Ussuri cycle. If you're interested in getting some advice about how to position yourself to become a Cinder core, please seek out me or any of the active cores listed above during the PTG. For people who won't be at the PTG, you can always look for us in #openstack-cinder. Let's have a productive PTG!
cheers, brian From Arkady.Kanevsky at dell.com Sun Nov 3 01:42:46 2019 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Sun, 3 Nov 2019 01:42:46 +0000 Subject: [cinder] core list update In-Reply-To: <039cc381-d757-57e4-9014-82327b31a3d8@gmail.com> References: <039cc381-d757-57e4-9014-82327b31a3d8@gmail.com> Message-ID: <2c5ff878cfbe40939cfe36cccf5f0a8e@AUSX13MPS304.AMER.DELL.COM> Thanks to all past and current core contributors. -----Original Message----- From: Brian Rosmaita Sent: Saturday, November 2, 2019 7:04 PM To: openstack-discuss at lists.openstack.org Subject: [cinder] core list update [EXTERNAL EMAIL] In preparation for the PTG, it's time to revise the list of Cinder core contributors. The following people have made great contributions to Cinder, but unfortunately Cinder is no longer their current focus, and they've indicated that they no longer have sufficient time to act as core reviewers for the project: - John Griffith - TommyLike Hu - Xing Yang - Yikun Jiang On behalf of the entire Cinder project team, I thank John, TommyLike, Yikun, and Xing for their past service to Cinder, and hope that they may find more time to spend on Cinder in the future. While I'm thanking people, I should also thank the current members of the cinder-core group for all their work during the Train cycle (and thank them in advance for all the work I look forward to them doing during Ussuri!): - Eric Harney - Gorka Eguileor - Ivan Kolodyazhny - Jay Bryant - Rajat Dhasmana - Sean McGinnis - Walter A. Boring IV There will leave some openings for new core contributors during the Ussuri cycle. If you're interested in getting some advice about how to position yourself to become a Cinder core, please seek out me or any of the active cores listed above during the PTG. For people who won't be at the PTG, you can always look for us in #openstack-cinder. Let's have a productive PTG! cheers, brian From katonalala at gmail.com Mon Nov 4 00:26:48 2019 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 4 Nov 2019 01:26:48 +0100 Subject: [neutron] Bug deputy report for week of November 01 Message-ID: Hi, On the week from October 28 to November 3 I was the bug deputy of neutron, here's a short summary of the bugs arrived. Critical bugs - https://bugs.launchpad.net/neutron/+bug/1850288: scenario test test_multicast_between_vms_on_same_network fails - workaround to skip unstable test is merged, more investigation is needed. 
- https://bugs.launchpad.net/neutron/+bug/1850626 neutron-dynamic-routing: TypeError: bind() takes 4 positional arguments but 5 were given - assigned, workaround to skip falling test is there ( https://review.opendev.org/692372) need to check how to fix the issue with https://review.opendev.org/288271 High bugs - https://bugs.launchpad.net/neutron/+bug/1850639 FloatingIP list bad performance - Assigned, in progress - https://bugs.launchpad.net/neutron/+bug/1850800 UDP port forwarding test is failing often High - The workaround to make the test unstable is there, more investigation is needed Medium bugs - https://bugs.launchpad.net/neutron/+bug/1850558 "AttributeError: 'str' object has no attribute 'content_type' in functional tests - assigned, in progress - https://bugs.launchpad.net/neutron/+bug/1850557 DHCP connectivity after migration/resize not working - More investigation is necessary - https://bugs.launchpad.net/neutron/+bug/1850779 [L3] snat-ns will be initialized twice for DVR+HA routers during agent restart medium assigned - assigned, in progress - https://bugs.launchpad.net/neutron/+bug/1850864 DHCP agent takes very long time to report when port is provisioned Medium - Assigned In progress Low bugs - https://bugs.launchpad.net/neutron/+bug/1849980: Do not inherit from built-in "dict" - assigned, in progress - https://bugs.launchpad.net/neutron/+bug/1850602: remove firewall_v1 exceptions in neutron-lib - assigned, in progress RFE - https://bugs.launchpad.net/neutron/+bug/1850818 [RFE][floatingip port_forwarding] Add description field - assigned More investigation is needed - https://bugs.launchpad.net/neutron/+bug/1850630 firewall rule update validating func is not robust enough,missing considering the stock data - https://bugs.launchpad.net/neutron/+bug/1850137: Hosts in a VPNaaS-VPNaas VPN lose their interconnect. Regards Lajos -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Mon Nov 4 01:46:58 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Mon, 4 Nov 2019 07:16:58 +0530 Subject: [glance] PTG Schedule Message-ID: Hi All, Welcome to China for the OpenInfra Summit and PTG. I have prepared a PTG schedule for Glance [1]. We will have a small meet with QA team on Wednesday 6th November, full day session on Thursday 7th November and half day on Friday 8th November. I have kept 1 hour for Open discussion on Thursday and Friday, so If someone wants to join us with their topics can do the same during this time. Friday 11:30 to 12:00, we have an interview scheduled to share updates on Glance. Have a productive Summit and PTG. [1] https://etherpad.openstack.org/p/Glance-Ussuri-PTG-planning Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmendiza at redhat.com Mon Nov 4 02:08:21 2019 From: dmendiza at redhat.com (=?UTF-8?Q?Douglas_Mendiz=c3=a1bal?=) Date: Sun, 3 Nov 2019 20:08:21 -0600 Subject: [barbican] PTG Schedule Message-ID: <0451df6b-23cc-2604-b28a-e1e9f6aac6f8@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hello Barbicaneers! I hope everyone made it to Shanghai safely. Our PTG session will be on Wednesday Nov 6 from 10:30am - 4:30p at the Kilo table. We've set up an etherpad to collect topics to talk about. Please feel free to add any topics you're interested in: https://etherpad.openstack.org/p/barbican-ussuri-ptg Additionally we've reserved a spot for a Team Photo on Thursday at 11am. Hope to see y'all soon! 
Cheers, - - Douglas Mendizabal -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEwcapj5oGTj2zd3XogB6WFOq/OrcFAl2/iBEACgkQgB6WFOq/ OrfFew/+IjhZe+qRCi/4EmaVEDf7QxJyZDIVUlLsPWHmF98wCdj+GsbzoWUuFfHM sJCpfpVAUjxrIFOEo5uF9WiZhU36G9pgoLd1Y8Kb0/QRIQEQQcKGnlYhCn+jQjbW J2tlDrkU0GBwEBzDt91gM5JCncviY8yT6nhlr/SSLqvZRQnPewerJNyJbYsVh6N2 moXQzfeRjg1SGqR0KVUcDVPe/pE+at8A5ARFCxDiJaOIUTP0qcfKtDXh714bevyi Sw2qgDZHbLHa1nEv3umuYGcrGpKz8Uuj5ju+7oGpPh4hX4pfPxbVDSzu8srfzTui ggvcxFrpZQvdff3Lec1eclxnB+c9Z1tBKYF7pPUVtN3NPfCATkVCSQACYORPZLdh GAnyxiiUXRwzIfOo0b6koa2pRi7ZWoz0DjVzpnl+D7qztUzyiguaj3KDnuTvlfQl iMQev1QHD6fAVvByHgDRj4dyUqUi2+V/DtNZ9w29AX7C+U/afSbNGvygc8yNCtHF vbkw68aPpj5zeB0OTjPQ6N5vsUc6bSXYGnECuGw24untnutvPKR+W9g9VQEUyN1h vhvn0IPHZ9QyBJ0ctpdfA6O9PNsjY/DQNyDeiNGljTIpBjepUmqMTXvycsn8VN/E yY0OL2QFGPhcsK7Q/yeUCzMm1sken2zMg8Bdxt10qbj4GsCMtyQ= =fdMR -----END PGP SIGNATURE----- From kota.tsuyuzaki.pc at hco.ntt.co.jp Fri Nov 1 10:18:52 2019 From: kota.tsuyuzaki.pc at hco.ntt.co.jp (Kota Tsuyuzaki) Date: Fri, 01 Nov 2019 19:18:52 +0900 Subject: [ptg][storlets] Shanghi PTG plan for Storlets Message-ID: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> Hello guys, The Shanghai PTG will happen soon! The Storlets core team prepared the etherpad for Shanghi PTG here, https://etherpad.openstack.org/p/storlets-ptg-shanghai Please feel free to add any topics you want to discuss there. Note that, Storlets reserved the room for 1.5 days but we'll use the latter half day actually because of the other schedules of core team members. If anyone want to catch me in other time, please let me know via E-mail, or something (I'm not sure IRC and any other tools would work in Shanghi PTG network, though) -------------------------------------------- 露崎 浩太 (Kota Tsuyuzaki) kota.tsuyuzaki.pc at hco.ntt.co.jp NTTソフトウェアイノベーションセンタ 分散処理基盤プロジェクト 0422-59-2837 --------------------------------------------- From zodiac.nv at gmail.com Sat Nov 2 22:53:55 2019 From: zodiac.nv at gmail.com (Nikita Belov) Date: Sun, 3 Nov 2019 01:53:55 +0300 Subject: [tempest] Train release and zero-disk error Message-ID: Hello! I try to use rally with Train release of Openstack. I have 842 success tests and 436 failures because of "Only volume-backed servers are allowed for flavors with zero disk". What can I do with this error? Report attached. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Mon Nov 4 02:18:54 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Mon, 4 Nov 2019 11:18:54 +0900 Subject: [OpenInfra Summit Shanghai] Some wrong information at Shanghai Summit Message-ID: Dear OSF, I don't attend the Shanghai summit but I notice some pictures of the presentation which Jonathan Bryce is delivering. The attached photo is about the OpenInfra Days around the world and the one on the left is *OpenInfra Days Vietnam*, not Korea. I know it's just a little thing but because it already upset some of our community members I would like to ask for correction. The Vietnam OpenInfra Community has been putting lots of effort to be recognize so please understand. Bests, Trinh -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: photo_2019-11-04_11-00-27.jpg Type: image/jpeg Size: 191246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: photo_2019-11-04_11-09-59.jpg Type: image/jpeg Size: 112337 bytes Desc: not available URL: From adriant at catalyst.net.nz Mon Nov 4 03:00:08 2019 From: adriant at catalyst.net.nz (Adrian Turjak) Date: Mon, 4 Nov 2019 16:00:08 +1300 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> Message-ID: <539f50a0-c67e-9b36-5608-beadb5b68d02@catalyst.net.nz> Also of potential interest is our own internal variant of project termination: https://gitlab.com/catalyst-cloud/python-opsclient/blob/master/opsclient/ops/v1/project.py Note, a recent thing we ran into was a lack of support for Swift Bulk deletion... which we are now turning on and fixing, because deleting a project with 2mil + objects one by one is... slow. On 31/10/19 2:26 am, Adam Harwell wrote: > That's too bad that you won't be at the summit, but I think there may > still be some discussion planned about this topic.  > > Yeah, I understand completely about priorities and such internally. > Same for me... It just happens that this IS priority work for us right > now. :) > > > On Tue, Oct 29, 2019, 07:48 Adrian Turjak > wrote: > > My apologies I missed this email. > > Sadly I won't be at the summit this time around. There may be some > public cloud focused discussions, and some of those often have > this topic come up. Also if Monty from the SDK team is around, I'd > suggest finding him and having a chat. > > I'll help if I can but we are swamped with internal work and I > can't dedicate much time to do upstream work that isn't urgent. :( > > On 17/10/19 8:48 am, Adam Harwell wrote: >> That's interesting -- we have already started working to add >> features and improve ospurge, and it seems like a plenty useful >> tool for our needs, but I think I agree that it would be nice to >> have that functionality built into the sdk. I might be able to >> help with both, since one is immediately useful and we (like >> everyone) have deadlines to meet, and the other makes sense to me >> as a possible future direction that could be more widely supported. >> >> Will you or someone else be hosting and discussion about this at >> the Shanghai summit? I'll be there and would be happy to join and >> discuss. >> >>     --Adam >> >> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >> > wrote: >> >> I tried to get a community goal to do project deletion per >> project, but >> we ended up deciding that a community goal wasn't ideal >> unless we did >> build a bulk delete API in each service: >> https://review.opendev.org/#/c/639010/ >> https://etherpad.openstack.org/p/community-goal-project-deletion >> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >> >> What we decided on, but didn't get a chance to work on, was >> building >> into the OpenstackSDK OS-purge like functionality, as well as >> reporting >> functionality (of all project resources to be deleted). That >> way we >> could have per project per resource deletion logic, and all >> of that >> defined in the SDK. >> >> I was up for doing some of the work, but ended up swamped >> with internal >> work and just didn't drive or push for the deletion work >> upstream. 
>> >> If you want to do something useful, don't pursue OS-Purge, >> help us add >> that official functionality to the SDK, and then we can push >> for bulk >> deletion APIs in each project to make resource deletion more >> pleasant. >> >> I'd be happy to help with the work, and Monty on the SDK team >> will most >> likely be happy to as well. :) >> >> Cheers, >> Adrian >> >> On 1/10/19 11:48 am, Adam Harwell wrote: >> > I haven't seen much activity on this project in a while, >> and it's been >> > moved to opendev/x since the opendev migration... Who is >> the current >> > owner of this project? Is there anyone who actually is >> maintaining it, >> > or would mind if others wanted to adopt the project to move >> it forward? >> > >> > Thanks, >> >    --Adam Harwell >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbryce at jbryce.com Mon Nov 4 03:00:18 2019 From: jbryce at jbryce.com (Jonathan Bryce) Date: Sun, 03 Nov 2019 21:00:18 -0600 Subject: [OpenInfra Summit Shanghai] Some wrong information at Shanghai Summit In-Reply-To: References: Message-ID: <16e345a4ee0.27a5.eb5fa01e01bf15c6e0d805bdb1ad935e@jbryce.com> Hi Trinh, I apologize for the error on the presentation. We have corrected it in version of the slides that will be distributed. We definitely appreciate the efforts of all of the community organizers, and I'm sorry for the mix up. Jonathan On November 3, 2019 20:55:34 Trinh Nguyen wrote: > Dear OSF, > > I don't attend the Shanghai summit but I notice some pictures of the > presentation which Jonathan Bryce is delivering. The attached photo is > about the OpenInfra Days around the world and the one on the left is > OpenInfra Days Vietnam, not Korea. I know it's just a little thing but > because it already upset some of our community members I would like to ask > for correction. The Vietnam OpenInfra Community has been putting lots of > effort to be recognize so please understand. > > Bests, > > Trinh > > -- > > Trinh Nguyen > www.edlab.xyz -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Mon Nov 4 03:02:49 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Mon, 4 Nov 2019 12:02:49 +0900 Subject: [OpenInfra Summit Shanghai] Some wrong information at Shanghai Summit In-Reply-To: <16e345a4ee0.27a5.eb5fa01e01bf15c6e0d805bdb1ad935e@jbryce.com> References: <16e345a4ee0.27a5.eb5fa01e01bf15c6e0d805bdb1ad935e@jbryce.com> Message-ID: Thank Jonathan for the quick response. I really appreciate that. Bests, On Mon, Nov 4, 2019 at 12:00 PM Jonathan Bryce wrote: > Hi Trinh, > > I apologize for the error on the presentation. We have corrected it in > version of the slides that will be distributed. > > We definitely appreciate the efforts of all of the community organizers, > and I'm sorry for the mix up. > > Jonathan > > > On November 3, 2019 20:55:34 Trinh Nguyen wrote: > >> Dear OSF, >> >> I don't attend the Shanghai summit but I notice some pictures of the >> presentation which Jonathan Bryce is delivering. The attached photo is >> about the OpenInfra Days around the world and the one on the left is *OpenInfra >> Days Vietnam*, not Korea. I know it's just a little thing but because it >> already upset some of our community members I would like to ask for >> correction. The Vietnam OpenInfra Community has been putting lots of effort >> to be recognize so please understand. 
>> >> Bests, >> >> Trinh >> >> -- >> *Trinh Nguyen* >> *www.edlab.xyz * >> >> > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From flux.adam at gmail.com Mon Nov 4 03:09:10 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Mon, 4 Nov 2019 11:09:10 +0800 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: <539f50a0-c67e-9b36-5608-beadb5b68d02@catalyst.net.nz> References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> <539f50a0-c67e-9b36-5608-beadb5b68d02@catalyst.net.nz> Message-ID: Interesting... Well, hopefully I will see some people in about 30 minutes about this. :) On Mon, Nov 4, 2019, 11:00 AM Adrian Turjak wrote: > Also of potential interest is our own internal variant of project > termination: > > https://gitlab.com/catalyst-cloud/python-opsclient/blob/master/opsclient/ops/v1/project.py > > Note, a recent thing we ran into was a lack of support for Swift Bulk > deletion... which we are now turning on and fixing, because deleting a > project with 2mil + objects one by one is... slow. > On 31/10/19 2:26 am, Adam Harwell wrote: > > That's too bad that you won't be at the summit, but I think there may > still be some discussion planned about this topic. > > Yeah, I understand completely about priorities and such internally. Same > for me... It just happens that this IS priority work for us right now. :) > > > On Tue, Oct 29, 2019, 07:48 Adrian Turjak wrote: > >> My apologies I missed this email. >> >> Sadly I won't be at the summit this time around. There may be some public >> cloud focused discussions, and some of those often have this topic come up. >> Also if Monty from the SDK team is around, I'd suggest finding him and >> having a chat. >> >> I'll help if I can but we are swamped with internal work and I can't >> dedicate much time to do upstream work that isn't urgent. :( >> On 17/10/19 8:48 am, Adam Harwell wrote: >> >> That's interesting -- we have already started working to add features and >> improve ospurge, and it seems like a plenty useful tool for our needs, but >> I think I agree that it would be nice to have that functionality built into >> the sdk. I might be able to help with both, since one is immediately useful >> and we (like everyone) have deadlines to meet, and the other makes sense to >> me as a possible future direction that could be more widely supported. >> >> Will you or someone else be hosting and discussion about this at the >> Shanghai summit? I'll be there and would be happy to join and discuss. >> >> --Adam >> >> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >> wrote: >> >>> I tried to get a community goal to do project deletion per project, but >>> we ended up deciding that a community goal wasn't ideal unless we did >>> build a bulk delete API in each service: >>> https://review.opendev.org/#/c/639010/ >>> https://etherpad.openstack.org/p/community-goal-project-deletion >>> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >>> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >>> >>> What we decided on, but didn't get a chance to work on, was building >>> into the OpenstackSDK OS-purge like functionality, as well as reporting >>> functionality (of all project resources to be deleted). That way we >>> could have per project per resource deletion logic, and all of that >>> defined in the SDK. 
>>> >>> I was up for doing some of the work, but ended up swamped with internal >>> work and just didn't drive or push for the deletion work upstream. >>> >>> If you want to do something useful, don't pursue OS-Purge, help us add >>> that official functionality to the SDK, and then we can push for bulk >>> deletion APIs in each project to make resource deletion more pleasant. >>> >>> I'd be happy to help with the work, and Monty on the SDK team will most >>> likely be happy to as well. :) >>> >>> Cheers, >>> Adrian >>> >>> On 1/10/19 11:48 am, Adam Harwell wrote: >>> > I haven't seen much activity on this project in a while, and it's been >>> > moved to opendev/x since the opendev migration... Who is the current >>> > owner of this project? Is there anyone who actually is maintaining it, >>> > or would mind if others wanted to adopt the project to move it forward? >>> > >>> > Thanks, >>> > --Adam Harwell >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Mon Nov 4 03:29:40 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Mon, 4 Nov 2019 12:29:40 +0900 Subject: [ptg][storlets] Shanghi PTG plan for Storlets In-Reply-To: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> References: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> Message-ID: Hi Kota, I would suggest you to update the etherpad list of Shanghai PTG. http://ptg.openstack.org/etherpads.html The instruction is found at https://opendev.org/openstack/ptgbot/src/branch/master/README.rst#etherpad Akihiro On Mon, Nov 4, 2019 at 11:50 AM Kota Tsuyuzaki wrote: > > Hello guys, > > The Shanghai PTG will happen soon! The Storlets core team prepared the etherpad for Shanghi PTG here, > https://etherpad.openstack.org/p/storlets-ptg-shanghai > Please feel free to add any topics you want to discuss there. Note that, Storlets reserved the room for 1.5 days but we'll use the > latter half day actually because of the other schedules of core team members. > > If anyone want to catch me in other time, please let me know via E-mail, or something (I'm not sure IRC and any other tools would > work in Shanghi PTG network, though) > > -------------------------------------------- > 露崎 浩太 (Kota Tsuyuzaki) > kota.tsuyuzaki.pc at hco.ntt.co.jp > NTTソフトウェアイノベーションセンタ > 分散処理基盤プロジェクト > 0422-59-2837 > --------------------------------------------- > > > > > From umesh.mishra at click2cloud.net Mon Nov 4 04:27:19 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Mon, 4 Nov 2019 04:27:19 +0000 Subject: [ptg][storlets] Shanghi PTG plan for Storlets In-Reply-To: References: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> Message-ID: Dear Sir, This is inform you that, We want to build the open stack in our data center so request you please help to provide me doc so that we can build asap Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net |  Mobile: +91 7738599311 -----Original Message----- From: Akihiro Motoki Sent: Monday, November 4, 2019 9:00 AM To: Kota Tsuyuzaki Cc: openstack-discuss Subject: Re: [ptg][storlets] Shanghi PTG plan for Storlets Hi Kota, I would suggest you to update the etherpad list of Shanghai PTG. http://ptg.openstack.org/etherpads.html The instruction is found at https://opendev.org/openstack/ptgbot/src/branch/master/README.rst#etherpad Akihiro On Mon, Nov 4, 2019 at 11:50 AM Kota Tsuyuzaki wrote: > > Hello guys, > > The Shanghai PTG will happen soon! 
The Storlets core team prepared the > etherpad for Shanghi PTG here, > https://etherpad.openstack.org/p/storlets-ptg-shanghai > Please feel free to add any topics you want to discuss there. Note > that, Storlets reserved the room for 1.5 days but we'll use the latter half day actually because of the other schedules of core team members. > > If anyone want to catch me in other time, please let me know via > E-mail, or something (I'm not sure IRC and any other tools would work > in Shanghi PTG network, though) > > -------------------------------------------- > 露崎 浩太 (Kota Tsuyuzaki) > kota.tsuyuzaki.pc at hco.ntt.co.jp > NTTソフトウェアイノベーションセンタ > 分散処理基盤プロジェクト > 0422-59-2837 > --------------------------------------------- > > > > > From veeraready at yahoo.co.in Mon Nov 4 08:20:46 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 08:20:46 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> Message-ID: <1134736099.729706.1572855646143@mail.yahoo.com> HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/png Size: 17533 bytes Desc: not available URL: From umesh.mishra at click2cloud.net Mon Nov 4 08:31:10 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Mon, 4 Nov 2019 08:31:10 +0000 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <1134736099.729706.1572855646143@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: Dear Sir, I want to build the open stack (train or stack version) in our premises so request you to please help me the correct document so that we can install the same. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 From: VeeraReddy Sent: Monday, November 4, 2019 1:51 PM To: openstack-dev at lists.openstack.org Subject: [openstack-dev][kuryr] Unable to create pod HI, Pod is not creating . [Inline image] Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below link https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 17533 bytes Desc: image001.png URL: From ltomasbo at redhat.com Mon Nov 4 08:34:31 2019 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Mon, 4 Nov 2019 09:34:31 +0100 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <1134736099.729706.1572855646143@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > HI, > Pod is not creating . 
> [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From veeraready at yahoo.co.in Mon Nov 4 08:50:38 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 08:50:38 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: <1392109807.755528.1572857438445@mail.yahoo.com> Hi Umesh, Use below devstack link to install openstackhttps://docs.openstack.org/devstack/latest/#download-devstack Instead of "git clone https://opendev.org/openstack/devstackuse git clone https://opendev.org/openstack/devstack -b stable/train Regards, Veera. On Monday, 4 November, 2019, 02:06:16 pm IST, Umesh Mishra wrote: #yiv3082777750 #yiv3082777750 -- _filtered #yiv3082777750 {font-family:Helvetica;panose-1:2 11 6 4 2 2 2 2 2 4;} _filtered #yiv3082777750 {panose-1:2 4 5 3 5 4 6 3 2 4;} _filtered #yiv3082777750 {font-family:Calibri;panose-1:2 15 5 2 2 2 4 3 2 4;}#yiv3082777750 #yiv3082777750 p.yiv3082777750MsoNormal, #yiv3082777750 li.yiv3082777750MsoNormal, #yiv3082777750 div.yiv3082777750MsoNormal {margin:0in;margin-bottom:.0001pt;font-size:11.0pt;font-family:sans-serif;}#yiv3082777750 a:link, #yiv3082777750 span.yiv3082777750MsoHyperlink {color:blue;text-decoration:underline;}#yiv3082777750 a:visited, #yiv3082777750 span.yiv3082777750MsoHyperlinkFollowed {color:purple;text-decoration:underline;}#yiv3082777750 p.yiv3082777750msonormal0, #yiv3082777750 li.yiv3082777750msonormal0, #yiv3082777750 div.yiv3082777750msonormal0 {margin-right:0in;margin-left:0in;font-size:11.0pt;font-family:sans-serif;}#yiv3082777750 span.yiv3082777750EmailStyle19 {font-family:sans-serif;color:windowtext;}#yiv3082777750 .yiv3082777750MsoChpDefault {font-size:10.0pt;} _filtered #yiv3082777750 {margin:1.0in 1.0in 1.0in 1.0in;}#yiv3082777750 div.yiv3082777750WordSection1 {}#yiv3082777750 Dear Sir,   I want to build the open stack (train or stack version) in our premises so request you to please help me the correct document so that we can install the same.       Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. |www.click2cloud.net Email: umesh.mishra at click2cloud.net |  Mobile: +91 7738599311   From: VeeraReddy Sent: Monday, November 4, 2019 1:51 PM To: openstack-dev at lists.openstack.org Subject: [openstack-dev][kuryr] Unable to create pod   HI, Pod is not creating . Status of pod is always "ContainerCreating"     I installed openstack & kubernetes using below link https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html     devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/   Regards, Veera. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 17533 bytes Desc: not available URL: From ekcs.openstack at gmail.com Mon Nov 4 09:06:21 2019 From: ekcs.openstack at gmail.com (Eric K) Date: Mon, 4 Nov 2019 09:06:21 +0000 Subject: [self-healing][autohealing][ptg] self-healing session 11/5@1:40PM Message-ID: Looking forward to seeing you tomorrow at the PTG! The self-healing SIG will meet to continue the work on making self-healing easy and available. We’re especially eager to hear/discuss user feedback, feature requests, and ideas for making self-healing easier and better. The session will take place on Tuesday 11/5 at 1:40 - 3:10 PM in room 431 [1]. Planned topics include predictive analytics, cross-project testing, user feedback, and unified bug reporting. Additional topics and comments welcome in the etherpad: https://etherpad.openstack.org/p/SHA-self-healing-SIG [1] Map and schedule https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/Uploads/PTG-Shanghai2019-Map-Schedule.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishabh.sikka at hpe.com Mon Nov 4 09:15:15 2019 From: rishabh.sikka at hpe.com (Sikka, Rishabh) Date: Mon, 4 Nov 2019 09:15:15 +0000 Subject: Openstack third pary CI Implementation(Zuul V3 , Nodepool, Queens) Message-ID: Dear Team , I am installing Zuul V3 ,Nodepool for our openstack third party CI ,Please let me know if any documentation is available for the same as I am struggling with the steps shared on zuul official page. Also please do let me know if #openstack-third-party-ci is correct IRC channel related to third party ci implementation, if it is correct please refer some of the name who had already implemented it. I had already tried posting my questions on same IRC channel but did not get the desired reply. Note -: If above PDL is not correct , please refer it to the correct PDL. Regards Rishabh Sikka -------------- next part -------------- An HTML attachment was scrubbed... URL: From veeraready at yahoo.co.in Mon Nov 4 09:45:12 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 09:45:12 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: <705559766.789758.1572860712151@mail.yahoo.com> HI ,I am getting errors in "kuryr-daemon"http://paste.openstack.org/show/785762/ And kuryr-cni daemon is terminating Regards, Veera. On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar wrote: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com     -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From mark at stackhpc.com Mon Nov 4 10:09:41 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 4 Nov 2019 10:09:41 +0000 Subject: [neutron][docs] networking-onos EOL? 
Message-ID: Hi, We (kolla) had a bug report [1] from someone trying to use the neutron onos_ml2 ML2 driver for the ONOS SDN controller. As far as I can tell [2], this project hasn't been released since 2015. However, the 'latest' documentation is still accessible [3], and does not mention that the project is dead. What can we do to help steer people away from projects like this? Cheers, Mark [1] https://bugs.launchpad.net/bugs/1850763 [2] https://pypi.org/project/networking-onos/#history [3] https://docs.openstack.org/networking-onos/latest/ From mdemaced at redhat.com Mon Nov 4 10:15:57 2019 From: mdemaced at redhat.com (Maysa De Macedo Souza) Date: Mon, 4 Nov 2019 11:15:57 +0100 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <705559766.789758.1572860712151@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> Message-ID: Hi VeeraReddy, Could you check if the API load balancer is ACTIVE? Best, Maysa. On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: > HI , > I am getting errors in "kuryr-daemon" > http://paste.openstack.org/show/785762/ > > And kuryr-cni daemon is terminating > > Regards, > Veera. > > > On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar < > ltomasbo at redhat.com> wrote: > > > Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) > > On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > > HI, > Pod is not creating . > [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > > > > -- > LUIS TOMÁS BOLÍVAR > Senior Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From veeraready at yahoo.co.in Mon Nov 4 11:09:00 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 11:09:00 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> Message-ID: <469441979.831114.1572865740439@mail.yahoo.com> Hi Maysa, stack at user-OptiPlex-7050:~$ openstack service show octavia+-------------+----------------------------------+| Field       | Value                            |+-------------+----------------------------------+| description | Octavia Load Balancing Service   || enabled     | True                             || id          | 7ddcb424fdad4281aad3652dbbb1ca42 || name        | octavia                          || type        | load-balancer                    |+-------------+----------------------------------+ Regards, Veera. On Monday, 4 November, 2019, 03:51:41 pm IST, Maysa De Macedo Souza wrote: Hi VeeraReddy, Could you check if the API load balancer is ACTIVE? Best,Maysa. On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: HI ,I am getting errors in "kuryr-daemon"http://paste.openstack.org/show/785762/ And kuryr-cni daemon is terminating Regards, Veera. 
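A note on the check being discussed in this thread: the service catalog entry only shows that Octavia is registered, not the state of the load balancer itself. Assuming 10.0.0.129 is the Kubernetes API VIP, as mentioned earlier in the thread, something like the following shows the fields Maysa and Luis are referring to (names and IDs below are placeholders):

    openstack loadbalancer list -c name -c vip_address -c provisioning_status -c operating_status
    openstack loadbalancer show <lb-name-or-id> -c provisioning_status -c operating_status
    openstack loadbalancer amphora list

If provisioning_status is ERROR or the amphora is gone, the API VIP will be unreachable from kuryr-cni even though the octavia service itself shows as enabled in the catalog.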
On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar wrote: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com     -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From ltomasbo at redhat.com Mon Nov 4 11:18:19 2019 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Mon, 4 Nov 2019 12:18:19 +0100 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <469441979.831114.1572865740439@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> <469441979.831114.1572865740439@mail.yahoo.com> Message-ID: Hi Veera, She referred to the LoadBalancer VM, you will need to do 'openstack server list --all' and/or 'openstack loadbalancer amphora list' On Mon, Nov 4, 2019 at 12:09 PM VeeraReddy wrote: > Hi Maysa, > > stack at user-OptiPlex-7050:~$ openstack service show octavia > +-------------+----------------------------------+ > | Field | Value | > +-------------+----------------------------------+ > | description | Octavia Load Balancing Service | > | enabled | True | > | id | 7ddcb424fdad4281aad3652dbbb1ca42 | > | name | octavia | > | type | load-balancer | > +-------------+----------------------------------+ > > > > > Regards, > Veera. > > > On Monday, 4 November, 2019, 03:51:41 pm IST, Maysa De Macedo Souza < > mdemaced at redhat.com> wrote: > > > Hi VeeraReddy, > > Could you check if the API load balancer is ACTIVE? > > Best, > Maysa. > > On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: > > HI , > I am getting errors in "kuryr-daemon" > http://paste.openstack.org/show/785762/ > > And kuryr-cni daemon is terminating > > Regards, > Veera. > > > On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar < > ltomasbo at redhat.com> wrote: > > > Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) > > On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > > HI, > Pod is not creating . > [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > > > > -- > LUIS TOMÁS BOLÍVAR > Senior Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From mdemaced at redhat.com Mon Nov 4 11:21:51 2019 From: mdemaced at redhat.com (Maysa De Macedo Souza) Date: Mon, 4 Nov 2019 12:21:51 +0100 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <469441979.831114.1572865740439@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> <469441979.831114.1572865740439@mail.yahoo.com> Message-ID: Hi VeeraReddy, You checked if Octavia is enabled on your system. In order to check if the API load balancer is active, you should check the field provisioning_status on the output of the following command: "openstack loadbalancer list | grep 10.0.0.129" According to your last logs I would imagine the API is down. If that is the case, you would need to manually recreate the lbaas or retrigger the installation. Best, Maysa. On Mon, Nov 4, 2019 at 12:09 PM VeeraReddy wrote: > Hi Maysa, > > stack at user-OptiPlex-7050:~$ openstack service show octavia > +-------------+----------------------------------+ > | Field | Value | > +-------------+----------------------------------+ > | description | Octavia Load Balancing Service | > | enabled | True | > | id | 7ddcb424fdad4281aad3652dbbb1ca42 | > | name | octavia | > | type | load-balancer | > +-------------+----------------------------------+ > > > > > Regards, > Veera. > > > On Monday, 4 November, 2019, 03:51:41 pm IST, Maysa De Macedo Souza < > mdemaced at redhat.com> wrote: > > > Hi VeeraReddy, > > Could you check if the API load balancer is ACTIVE? > > Best, > Maysa. > > On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: > > HI , > I am getting errors in "kuryr-daemon" > http://paste.openstack.org/show/785762/ > > And kuryr-cni daemon is terminating > > Regards, > Veera. > > > On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar < > ltomasbo at redhat.com> wrote: > > > Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) > > On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > > HI, > Pod is not creating . > [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > > > > -- > LUIS TOMÁS BOLÍVAR > Senior Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From cdent+os at anticdent.org Mon Nov 4 11:37:36 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 4 Nov 2019 11:37:36 +0000 (GMT) Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On Fri, 1 Nov 2019, Matt Riedemann wrote: > On 11/1/2019 9:55 AM, Clark Boylan wrote: >> OVH controls the disk IOPs that we get pretty aggressively as well. >> Possible it is an IO thing? 
> > Yeah, so looking at the dstat output in that graph (thanks for pointing out > that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, > that's probably not good. What happens in a case like this? Is there an official procedure for "hey, can you give is more IO?" or (if that's not an option) "can you give us less CPU?". Is that something that is automated, is is something that is monitored and alarming? "INAP ran out of IO X times in the last N hours, light the beacons!" -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From lennyb at mellanox.com Mon Nov 4 11:38:30 2019 From: lennyb at mellanox.com (Lenny Verkhovsky) Date: Mon, 4 Nov 2019 11:38:30 +0000 Subject: Openstack third pary CI Implementation(Zuul V3 , Nodepool, Queens) In-Reply-To: References: Message-ID: Hi, Yes, irc channel is correct, sorry I missed your question. There are few docs and examples of how to do it that are working[1]. We had few issues with this configuration, and since we are using physical servers with our Hardware We decided to migrate from zuul2 to Jenkins Gerrit Plugin[2] Best Regards Lenny. [1] https://docs.openstack.org/infra/manual/zuulv3.html https://zuul-ci.org/docs/zuul/admin/quick-start.html https://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3-3rd-party-ci.html https://docs.openstack.org/infra/system-config/third_party.html https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html [2] https://wiki.jenkins.io/display/JENKINS/Gerrit+Trigger From: Sikka, Rishabh Sent: Monday, November 4, 2019 11:15 AM To: openstack-dev at lists.openstack.org Cc: Rane, Vishal Subject: Openstack third pary CI Implementation(Zuul V3 , Nodepool, Queens) Dear Team , I am installing Zuul V3 ,Nodepool for our openstack third party CI ,Please let me know if any documentation is available for the same as I am struggling with the steps shared on zuul official page. Also please do let me know if #openstack-third-party-ci is correct IRC channel related to third party ci implementation, if it is correct please refer some of the name who had already implemented it. I had already tried posting my questions on same IRC channel but did not get the desired reply. Note -: If above PDL is not correct , please refer it to the correct PDL. Regards Rishabh Sikka -------------- next part -------------- An HTML attachment was scrubbed... URL: From doka.ua at gmx.com Mon Nov 4 11:47:55 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 4 Nov 2019 13:47:55 +0200 Subject: BGP dynamic routing Message-ID: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> Dear colleagues, "BGP dynamic routing" doc (https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html) says only about advertisement of routes: "BGP dynamic routing enables advertisement of self-service (private) network prefixes to physical network devices that support BGP such as routers, thus removing the conventional dependency on static routes." and nothing about receiving of routes from external peers. Whether it is ever possible using Neutron to have fully dynamic routing inside the project, both advertising/receiving (and updating VRs configuration) routes to/from remote peers? Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." 
-- Thomas Edison From mark at stackhpc.com Mon Nov 4 14:15:17 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 4 Nov 2019 14:15:17 +0000 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: References: Message-ID: On Wed, 30 Oct 2019 at 17:26, Radosław Piliszek wrote: > > Hello Everyone, > > As you may already know, Kolla core team is mostly not present on summit in Shanghai. > Instead we are organizing a PTG next week, 7-8th Nov (Thu-Fri), in Białystok, Poland. > Please let me know this week if you are interested in coming in person. > > We invite operators, contributors and contributors-to-be to join us for the virtual PTG online. > The time schedule will be advertised later. After polling participants, we have agreed to meet at 1400 - 1800 UTC on Thursday and Friday this week. Since not all participants can make the first hour, we will adjust the schedule accordingly. Marcin will follow with connection details for the Zoom video conference. Please continue to update the etherpad with potential topics for discussion. I will propose a rough agenda over the next few days. Mark > > Please fill yourself in on the whiteboard [1]. > New ideas are welcome. > > [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg > > Kind regards, > Radek aka yoctozepto > From swamireddy at gmail.com Mon Nov 4 14:34:13 2019 From: swamireddy at gmail.com (M Ranga Swami Reddy) Date: Mon, 4 Nov 2019 20:04:13 +0530 Subject: Cinder multi backend quota update In-Reply-To: References: Message-ID: Great. Its working. Thanks Swami On Wed, Oct 30, 2019 at 11:18 PM Mohammed Naser wrote: > I didn't try this but.. > > openstack quota set --volume-type ceph --volumes 20 project-id > > should do the trick. > > Bonne chance > > On Wed, Oct 30, 2019 at 1:38 PM M Ranga Swami Reddy > wrote: > > > > Hello, > > We use 2 types of volume, like volumes and volumes_ceph. > > I can update the quota for volumes quota using "cinder quota-update > --volumes=20 project-id" > > > > But for volumes_ceph, the above CLI failed with volumes_ceph un > recognised option.. > > Any suggestions here? > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. https://vexxhost.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Nov 4 14:57:44 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 4 Nov 2019 08:57:44 -0600 Subject: State of the Gate (placement?) In-Reply-To: <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/1/2019 9:55 AM, Clark Boylan wrote: > INAP was also recently turned back on. It had been offline for redeployment and that was completed and added back to the pool. Possible that more than just the openstack version has changed? > > OVH controls the disk IOPs that we get pretty aggressively as well. Possible it is an IO thing? Related to slow nodes, I noticed this failed recently, it's a synchronous RPC call from nova-api to nova-compute that timed out after 60 seconds [1]. Looking at MessagingTimeout errors in the nova-api logs shows it's mostly in INAP and OVH nodes [2] so there seems to be a pattern emerging with those being slow nodes causing issues. 
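For reference, the 60 second timeout above matches oslo.messaging's default rpc_response_timeout; deployments on known-slow infrastructure sometimes raise it, and nova also has a separate long_rpc_timeout that only certain known-long RPC calls use. Treat the values below as illustrative only, set here e.g. with crudini:

    # defaults are 60 and 1800 respectively
    crudini --set /etc/nova/nova.conf DEFAULT rpc_response_timeout 120
    crudini --set /etc/nova/nova.conf DEFAULT long_rpc_timeout 1800

Raising timeouts only papers over slow nodes rather than fixing them, which is the trade-off raised next.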
There are ways we could workaround this a bit on the nova side [3] but I'm not sure how much we want to make parts of nova super resilient to very slow nodes when real life operations would probably need to know about this kind of thing to scale up/out their control plane. [1] https://zuul.opendev.org/t/openstack/build/ef0196fe84804b44ac106d011c8c29ea/log/controller/logs/screen-n-api.txt.gz?severity=4 [2] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22MessagingTimeout%5C%22%20AND%20tags%3A%5C%22screen-n-api.txt%5C%22&from=7d [3] https://review.opendev.org/#/c/692550/ -- Thanks, Matt From donny at fortnebula.com Mon Nov 4 15:08:52 2019 From: donny at fortnebula.com (Donny Davis) Date: Mon, 4 Nov 2019 10:08:52 -0500 Subject: BGP dynamic routing In-Reply-To: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> Message-ID: The way I use it is to dynamically advertise my tenant networks to the edge. The edge router still handles routes in the rest of my infra. Works pretty well for me. Donny Davis c: 805 814 6800 On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka wrote: > Dear colleagues, > > "BGP dynamic routing" doc > ( > https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html > ) > says only about advertisement of routes: "BGP dynamic routing enables > advertisement of self-service (private) network prefixes to physical > network devices that support BGP such as routers, thus removing the > conventional dependency on static routes." and nothing about receiving > of routes from external peers. > > Whether it is ever possible using Neutron to have fully dynamic routing > inside the project, both advertising/receiving (and updating VRs > configuration) routes to/from remote peers? > > Thank you. > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doka.ua at gmx.com Mon Nov 4 16:28:11 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 4 Nov 2019 18:28:11 +0200 Subject: BGP dynamic routing In-Reply-To: References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> Message-ID: <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Hi Donny, the question if I have few peers to few PoPs, everyone with own set of prefixes and need to import these external prefixes INTO the tenant. On 04.11.2019 17:08, Donny Davis wrote: > The way I use it is to dynamically advertise my tenant networks to the > edge. The edge router still handles routes in the rest of my infra. > > Works pretty well for me. > > Donny Davis > c: 805 814 6800 > > On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka > wrote: > > Dear colleagues, > > "BGP dynamic routing" doc > (https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html) > says only about advertisement of routes: "BGP dynamic routing enables > advertisement of self-service (private) network prefixes to physical > network devices that support BGP such as routers, thus removing the > conventional dependency on static routes." and nothing about receiving > of routes from external peers. > > Whether it is ever possible using Neutron to have fully dynamic > routing > inside the project, both advertising/receiving (and updating VRs > configuration) routes to/from remote peers? > > Thank you. > > -- > Volodymyr Litovka >    "Vision without Execution is Hallucination." 
-- Thomas Edison > > -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From donny at fortnebula.com Mon Nov 4 16:31:36 2019 From: donny at fortnebula.com (Donny Davis) Date: Mon, 4 Nov 2019 11:31:36 -0500 Subject: BGP dynamic routing In-Reply-To: <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Message-ID: To be honest I only use it for the use case I listed before, so beyond that I am not going to be much help. However.. they are both speaking bgp I would imagine that it works the same way as any bgp instance. Give it a whirl and let us know how it works out. :) On Mon, Nov 4, 2019 at 11:28 AM Volodymyr Litovka wrote: > Hi Donny, > > the question if I have few peers to few PoPs, everyone with own set of > prefixes and need to import these external prefixes INTO the tenant. > > > On 04.11.2019 17:08, Donny Davis wrote: > > The way I use it is to dynamically advertise my tenant networks to the > edge. The edge router still handles routes in the rest of my infra. > > Works pretty well for me. > > Donny Davis > c: 805 814 6800 > > On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka wrote: > >> Dear colleagues, >> >> "BGP dynamic routing" doc >> ( >> https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html >> ) >> says only about advertisement of routes: "BGP dynamic routing enables >> advertisement of self-service (private) network prefixes to physical >> network devices that support BGP such as routers, thus removing the >> conventional dependency on static routes." and nothing about receiving >> of routes from external peers. >> >> Whether it is ever possible using Neutron to have fully dynamic routing >> inside the project, both advertising/receiving (and updating VRs >> configuration) routes to/from remote peers? >> >> Thank you. >> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." -- Thomas Edison >> >> >> > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcin.juszkiewicz at linaro.org Mon Nov 4 17:04:50 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Mon, 4 Nov 2019 18:04:50 +0100 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: References: Message-ID: <8a6d214d-970e-c923-570e-e031aa364305@linaro.org> On 04.11.2019 15:15, Mark Goddard wrote: > After polling participants, we have agreed to meet at 1400 - 1800 UTC > on Thursday and Friday this week. Since not all participants can make > the first hour, we will adjust the schedule accordingly. > > Marcin will follow with connection details for the Zoom video conference. As we agreed on Zoom I did a setup of meeting. https://zoom.us/j/157063687 will be available for 1400-1800 UTC on both Thursday and Friday. Sessions will be recorded by platform. From pierre at stackhpc.com Mon Nov 4 22:55:03 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 4 Nov 2019 23:55:03 +0100 Subject: [blazar] Shanghai Summit and PTG activities for Blazar; no IRC meetings Message-ID: Hello, Several of the Blazar core reviewers and contributors will be in Shanghai this week. 
Don't hesitate to talk to them if you are interested in resource reservation as a service. On Tuesday there will be a project update: https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24373/blazar-project-update-november-2019 And on Friday at the PTG, there will be a project onboarding session, as well as technical discussions. Since most of the team is in Shanghai, IRC meetings are cancelled this week. Best wishes, Pierre From mriedemos at gmail.com Tue Nov 5 00:56:00 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 4 Nov 2019 18:56:00 -0600 Subject: [infra][qa] multiline tracebacks not being indexed anymore Message-ID: We used to be able to query for things like this: message:"in reserve_block_device_name" AND message:"MessagingTimeout" AND tags:"screen-n-api.txt" to fingerprint a traceback in logstash like this [1] but that no longer works. The multiline logstash filter is at [2] but doesn't seem to be getting applied anymore. I asked about this in -infra today and fungi said: "(4:44:00 PM) fungi: mriedem: i suspect that coincided with switching away from osla, we may need some means of parsing tracebacks out of logs in the indexer" I don't know what that means (what's osla? is [2] no longer used?) but if someone could point me at some things to look at I could see if I can generate a fix. [1] https://zuul.opendev.org/t/openstack/build/ef0196fe84804b44ac106d011c8c29ea/log/controller/logs/screen-n-api.txt.gz?severity=4#31403 [2] https://opendev.org/openstack/logstash-filters/src/branch/master/filters/openstack-filters.conf -- Thanks, Matt From cboylan at sapwetik.org Tue Nov 5 00:58:22 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 05 Nov 2019 08:58:22 +0800 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On Mon, Nov 4, 2019, at 7:37 PM, Chris Dent wrote: > On Fri, 1 Nov 2019, Matt Riedemann wrote: > > > On 11/1/2019 9:55 AM, Clark Boylan wrote: > >> OVH controls the disk IOPs that we get pretty aggressively as well. > >> Possible it is an IO thing? > > > > Yeah, so looking at the dstat output in that graph (thanks for pointing out > > that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, > > that's probably not good. > > What happens in a case like this? Is there an official procedure for > "hey, can you give is more IO?" or (if that's not an option) "can > you give us less CPU?". Is that something that is automated, is is > something that is monitored and alarming? "INAP ran out of IO X > times in the last N hours, light the beacons!" Typically we try to work with the clouds to properly root cause the issue. Then from there we can figure out what the best fix may be. They are running our software after all and there is a good chance the problems are in openstack. I'm in shanghai at the moment but if others want to reach out feel free. benj_ and mgagne are at inap and amorin has been helpful at ovh. The test node logs include a hostid in them somewhere which an be used to identify hypervisors if necessary. 
Clark From cboylan at sapwetik.org Tue Nov 5 01:03:39 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 05 Nov 2019 09:03:39 +0800 Subject: [infra][qa] multiline tracebacks not being indexed anymore In-Reply-To: References: Message-ID: <9c854572-e486-4515-8a5f-c9ba5b8d0fa7@www.fastmail.com> On Tue, Nov 5, 2019, at 8:56 AM, Matt Riedemann wrote: > We used to be able to query for things like this: > > message:"in reserve_block_device_name" AND message:"MessagingTimeout" > AND tags:"screen-n-api.txt" > > to fingerprint a traceback in logstash like this [1] but that no longer > works. The multiline logstash filter is at [2] but doesn't seem to be > getting applied anymore. > > I asked about this in -infra today and fungi said: > > "(4:44:00 PM) fungi: mriedem: i suspect that coincided with switching > away from osla, we may need some means of parsing tracebacks out of logs > in the indexer" > > I don't know what that means (what's osla? is [2] no longer used?) but > if someone could point me at some things to look at I could see if I can > generate a fix. os-loganalyze, https://opendev.org/openstack/os-loganalyze, was in use on the old log server to do filtering of severity and related manipulation. One thing it would do is collapse lines that didn't have a timestamps or severity prefix. However I think that may have only been for the html rendering which logstash didn't use. I'm not sure this is the issue. As for debugging this you can grab a log file and send it through logstash locally and fiddle with the rules until you get what you want. I'd help but currently at the summit and not in a good spot to do so. > > [1] > https://zuul.opendev.org/t/openstack/build/ef0196fe84804b44ac106d011c8c29ea/log/controller/logs/screen-n-api.txt.gz?severity=4#31403 > [2] > https://opendev.org/openstack/logstash-filters/src/branch/master/filters/openstack-filters.conf > > -- > > Thanks, > > Matt > > From eandersson at blizzard.com Tue Nov 5 01:11:03 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Tue, 5 Nov 2019 01:11:03 +0000 Subject: [Senlin] Splitting senlin-engine into three services Message-ID: We are looking into splitting the senlin-engine into three components (senlin-conductor, senlin-engine and senlin-health-manager) and wanted to get some feedback. The main goal here is to make the components more resilient and to reduce the number of threads per worker. Each one of the components already had it's own thread pool and in theory each worker could end up with thousands of thread. In the current version (Train) the engine process hosts these services. https://github.com/openstack/senlin/blob/stable/train/senlin/engine/dispatcher.py#L31 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/health_manager.py#L865 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/service.py#L79 In my patch we move two our of these out of the engine and into it's own service namespace. Split engine service into three services https://review.opendev.org/#/c/688784/ Please feel free to comment on the patch set, or let reply to this email with general feedback or concerns. Best Regards, Erik Olof Gunnar Andersson -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Tue Nov 5 01:52:11 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 5 Nov 2019 02:52:11 +0100 Subject: Installation DOC for Three node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From rosmaita.fossdev at gmail.com Tue Nov 5 06:32:50 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 5 Nov 2019 14:32:50 +0800 Subject: [cinder] ussuri PTG schedule Message-ID: The Ussuri PTG schedule is live: https://etherpad.openstack.org/p/shanghai-ptg-cinder Please check the schedule and let me know right away if your session causes a conflict for you. Except for the few fixed-time topics, we will follow the cinder tradition of dynamic scheduling, giving each topic exactly as much time as it needs and adjusting as we go. cheers, brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Tue Nov 5 04:07:34 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Tue, 5 Nov 2019 09:37:34 +0530 Subject: [glance] Shanghai Project Update Message-ID: Hi All, We had a very good project update session in Shanghai OpenInfra summit, where we have covered what we have done in Train cycle and what our priorities are in upcoming Ussuri cycle. Attaching the project update PDF file here for your reference. Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Glance Project Update-Train.pdf Type: application/pdf Size: 95919 bytes Desc: not available URL: From veeraready at yahoo.co.in Tue Nov 5 07:01:43 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Tue, 5 Nov 2019 07:01:43 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> <469441979.831114.1572865740439@mail.yahoo.com> Message-ID: <1252984897.1189394.1572937303384@mail.yahoo.com> Hi Maysa,My API load balancer is Active kublet log with error : http://paste.openstack.org/show/785793/ Regards, Veera. On Monday, 4 November, 2019, 04:57:33 pm IST, Maysa De Macedo Souza wrote: Hi VeeraReddy, You checked if Octavia is enabled on your system. In order to check if the API load balancer is active, you should check the field provisioning_status on the output of the following command: "openstack loadbalancer list | grep 10.0.0.129" According to your last logs I would imagine the API is down. If that is the case, you would need to manually recreate the lbaas or retrigger the installation. Best,Maysa. On Mon, Nov 4, 2019 at 12:09 PM VeeraReddy wrote: Hi Maysa, stack at user-OptiPlex-7050:~$ openstack service show octavia+-------------+----------------------------------+| Field       | Value                            |+-------------+----------------------------------+| description | Octavia Load Balancing Service   || enabled     | True                             || id          | 7ddcb424fdad4281aad3652dbbb1ca42 || name        | octavia                          || type        | load-balancer                    |+-------------+----------------------------------+ Regards, Veera. On Monday, 4 November, 2019, 03:51:41 pm IST, Maysa De Macedo Souza wrote: Hi VeeraReddy, Could you check if the API load balancer is ACTIVE? Best,Maysa. On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: HI ,I am getting errors in "kuryr-daemon"http://paste.openstack.org/show/785762/ And kuryr-cni daemon is terminating Regards, Veera. 
On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar wrote: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com     -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572937258877blob.jpg Type: image/png Size: 14244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From colleen at gazlene.net Tue Nov 5 08:37:39 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 05 Nov 2019 16:37:39 +0800 Subject: [dev][ops][ptg][keystone] Join the keystone onboarding session! In-Reply-To: <7e0350e7-d249-4f5c-8a54-50c883bfb350@www.fastmail.com> References: <7e0350e7-d249-4f5c-8a54-50c883bfb350@www.fastmail.com> Message-ID: Don't forget to join me at the keystone onboarding session tomorrow morning (Wednesday Nov. 6) at the PTG! Colleen On Fri, Nov 1, 2019, at 10:49, Colleen Murphy wrote: > Hello Stackers, > > If you're a developer, technical writer, operator, or user and > interested in getting involved in the keystone project, stop by the > keystone onboarding session in Shanghai next week! We will be at the > Kilo table in the Blue Room on Wednesday from 9 to 10:30. The format > will be open ended, so come with all your questions about how you can > participate on the keystone team. > > Can't make it to the session? Take a look at our contributing guide[1] > and feel free to get in touch with me directly. > > Colleen Murphy / cmurphy (keystone PTL) > > [1] https://docs.openstack.org/keystone/latest/contributor/how-can-i-help.html > > From gmann at ghanshyammann.com Tue Nov 5 08:57:56 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 05 Nov 2019 16:57:56 +0800 Subject: [qa] QA Office hour canceled for this week Message-ID: <16e3ac8eaa6.e357366e69902.1578074033606401731@ghanshyammann.com> Hello Everyone, As we are in PTG, I will cancel the QA office hour for this week and will continue the same on 14th Nov week. -gmann From ralonsoh at redhat.com Tue Nov 5 09:51:27 2019 From: ralonsoh at redhat.com (Rodolfo Alonso) Date: Tue, 05 Nov 2019 09:51:27 +0000 Subject: [neutron][QoS] QoS meeting cancelled November 5 Message-ID: <3fdfc0b8bcfe70487d5d21e8b69f44804810e339.camel@redhat.com> Hello Neutrinos: Due to the summit this week, the Neutron QoS meeting will be cancelled. Next meeting will be November 19. Regards. From dharmendra.kushwaha at india.nec.com Tue Nov 5 11:57:31 2019 From: dharmendra.kushwaha at india.nec.com (Dharmendra Kushwaha) Date: Tue, 5 Nov 2019 11:57:31 +0000 Subject: [tacker] Ussuri PTG Message-ID: Tacker Folks, As our scheduled sessions will be finished by 12:30 pm, so lets meet after lunch. I will be available in PTG area full day. 
https://etherpad.openstack.org/p/Tacker-PTG-Ussuri Thanks & Regards Dharmendra Kushwaha ________________________________ The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NECTI or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NECTI or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately. From umesh.mishra at click2cloud.net Tue Nov 5 12:41:53 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Tue, 5 Nov 2019 12:41:53 +0000 Subject: Installation DOC for Three node In-Reply-To: References: Message-ID: Dear Sir, We are trying to create the Machin but we unable to create could you please help or share me your skype id or contact number so that we can solve our issue. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 From: Sean McGinnis Sent: Tuesday, November 5, 2019 7:22 AM To: Umesh Mishra Cc: openstack-discuss at lists.openstack.org Subject: Re: Installation DOC for Three node Hi Umesh, Each project maintains installation guides for installing using system packages. That documentation can be found here: https://docs.openstack.org/train/install/ What you are more likely to want is a deployment tool that takes care of the installation for you. There are several options available, depending on your needs (container-based, perferred config management system, etc.). Information about those can be found here: https://www.openstack.org/software/project-navigator/deployment-tools Sent: Saturday, November 02, 2019 at 9:18 AM From: "Umesh Mishra" > To: "openstack-discuss at lists.openstack.org" > Subject: Installation DOC for Three node Dear Sir, This is inform you that, We want to build the openstack in our data center so request you please help to provide me doc so that we can build asap . Please help. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Nov 5 14:41:37 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 5 Nov 2019 22:41:37 +0800 Subject: [neutron] PTG - remote access and reminders Message-ID: <20191105144137.qlr35jei4xcefred@skaplons-mac> Hi, Tomorrow (Wednesday) we are starting Neutron PTG session. Agenda is available at [1]. If You are not in Shanghai but would maybe try to participate remotely in the sessions, please reach out to me directly through email or IRC. I will try to provide some access and stream from the session. Also, please remember that on Wednesday morning we have onboarding session for new contributors. 
So if You are interested in contributing to Neutron, feel free to reach out to us in *room 431* - we are starting at 9:00 am :) [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning -- Slawek Kaplonski Senior software engineer Red Hat From jbeuque at cisco.com Tue Nov 5 14:45:47 2019 From: jbeuque at cisco.com (Jean Bernard Beuque (jbeuque)) Date: Tue, 5 Nov 2019 14:45:47 +0000 Subject: [neutron][tap-as-a-service] ERSPAN support Message-ID: Hello, I'd like to add ERSPAN support to the Tap-as-a-Service project. I've currently implemented a prototype that can be used with networking-vpp: https://opendev.org/x/networking-vpp The modified version of tap as a service is available here (The API has been extended to support ERSPAN): https://github.com/jbeuque/tap-as-a-service I don't know who maintains the Taas project. But if you think adding this functionality could be useful, please contact me. (Please take the modified version of Taas as a proposal to be discussed). Regards, Jean-Bernard Beuque -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Tue Nov 5 16:23:40 2019 From: openstack at fried.cc (Eric Fried) Date: Tue, 5 Nov 2019 10:23:40 -0600 Subject: [cinder][osc][docs] openstackclient docs for cinder v2 vs v3 Message-ID: <656f60e9-b5d6-5390-13c6-34347a9bc2e1@fried.cc> Howdy cinderinos. I've been on a mission to get all of the python-openstackclient command [1] and plugin [2] docs autogenerated rather than hardcoded [3] so you don't have to remember to update two places when you add/change a subcommand option. I'm almost done -- cinder is the last one -- but I want to confirm some odd observations before I dig in. - All of the v3 subcommands are implemented by code in the openstackclient.volume.v2 package. Where there's overlap, the command classes are identical from v2 to v3. However, it appears as though the v2 commands are a *superset* of the v3 commands. Specifically, the following appear in v2 but not v3 [4]: volume_backup_record_export volume_backup_record_import volume_backend_capability_show volume_backend_pool_list volume_host_failover Observations: * v3 has no other 'volume backup record' subcommands, but otherwise has the same 'volume backup' subcommands as v2. * v3 has no 'volume backend' subcommands. * v2 has both 'volume host failover' and 'volume host set', but v3 has only the latter. * It seems suspicious that the "missing" v3 commands comprise a contiguous block under the v2 entry point. So before I go creating a mess of v2-only and v2+v3 documents, I wanted to confirm that the above was actually intentional. - The existing hardcoded documents mention v1 and/or v2, but don't mention v3 at all (e.g. [5]). I want to confirm that it's okay for me to add mention of v3 where appropriate. Thanks, efried [1] https://docs.openstack.org/python-openstackclient/latest/cli/command-list.html [2] https://docs.openstack.org/python-openstackclient/latest/cli/plugin-commands/index.html [3] https://review.opendev.org/#/q/topic:generate-docs+(status:open+OR+status:merged) [4] https://opendev.org/openstack/python-openstackclient/src/tag/4.0.0/setup.cfg#L610-L616 [5] https://docs.openstack.org/python-openstackclient/train/cli/command-objects/volume.html From mriedemos at gmail.com Tue Nov 5 16:45:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 5 Nov 2019 10:45:52 -0600 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? 
Message-ID: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> I was helping someone recover from a stuck live migration today where the migration record was stuck in pre-migrating status and somehow the request never hit the compute or was lost. The guest was stopped on the guest and basically the live migration either never started or never completed properly (maybe rabbit dropped the request or the compute service was restarted, I don't know). I instructed them to update the database to set the migration record status to 'error' and hard reboot the instance to get it running again. Then they pointed out they were seeing this in the compute logs: "There are allocations remaining against the source host that might need to be removed" That's because the source node allocations are still tracked in placement by the migration record and the dest node allocations are tracked by the instance. Cleaning that up is non-trivial. I have a troubleshooting doc started for manually cleaning up that kind of stuff here [1] but ultimately just told them to delete the allocations in placement for both the migration and the instance and then run the heal_allocations command to recreate the allocations for the instance. Since this person's nova deployment was running Stein, they don't have the --dry-run [2] or --instance [3] options for the heal_allocations command. This isn't a huge problem but it does mean they could be healing allocations for instances they didn't expect. They could work around this by installing nova from train or master in a VM/container/virtual environment and running it against the stein setup, but that's maybe more work than they want to do. The question I'm posing is if people would like to see those options backported to stein and if so, would the stable team be OK with it? I'd say this falls into a gray area where these are things that are optional, not used by default, and are operational tooling so less risk to backport, but it's not zero risk. It's also worth noting that when I wrote those patches I did so with the intent that people could backport them at least internally. [1] https://review.opendev.org/#/c/691427/ [2] https://review.opendev.org/#/c/651932/ [3] https://review.opendev.org/#/c/651945/ -- Thanks, Matt From dms at danplanet.com Tue Nov 5 16:51:13 2019 From: dms at danplanet.com (Dan Smith) Date: Tue, 05 Nov 2019 08:51:13 -0800 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? In-Reply-To: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> (Matt Riedemann's message of "Tue, 5 Nov 2019 10:45:52 -0600") References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: > The question I'm posing is if people would like to see those options > backported to stein and if so, would the stable team be OK with it? > I'd say this falls into a gray area where these are things that are > optional, not used by default, and are operational tooling so less > risk to backport, but it's not zero risk. It's also worth noting that > when I wrote those patches I did so with the intent that people could > backport them at least internally. Backporting features to operator tooling that helps them recover from bugs or other failures without doing database surgery seems like a good thing. Hard to argue that the risk outweighs the benefit, IMHO. 
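For anyone stuck on the same thing today, the manual recovery Matt describes boils down to roughly the following (a sketch only: the UUIDs are placeholders, the osc-placement plugin is assumed to be installed, and --dry-run/--instance assume a Train-era nova-manage, which is exactly what this thread is about backporting):

  # drop the stale allocations held by the migration record and by the instance
  openstack resource provider allocation delete <migration_uuid>
  openstack resource provider allocation delete <instance_uuid>

  # preview what would be recreated, then heal just that one instance
  nova-manage placement heal_allocations --dry-run
  nova-manage placement heal_allocations --instance <instance_uuid>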
--Dan From mriedemos at gmail.com Tue Nov 5 17:23:32 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 5 Nov 2019 11:23:32 -0600 Subject: Installation DOC for Three node In-Reply-To: References: Message-ID: <70039a26-1b30-bb4a-0838-39082ae68085@gmail.com> On 11/5/2019 6:41 AM, Umesh Mishra wrote: > We are trying to create the Machin but we unable to create could you > please help or share me your skype id or contact number so that we can > solve our issue. This is not really an appropriate request for this mailing list. It's OK to ask for help and support for specific issues in this mailing list but there is a chance that if the issue you're reporting is too generic you might not get a reply, which is the case here. You're looking for docs on how to install openstack. Sean provided links to the project install guides and deployment tools to automate that if you don't want to do it manually. If you have specific issues going through the install guides manually or using one of the deployment tools, then post the specific issues and someone may be able to help. Requesting contact information from a community member to help you directly isn't appropriate. If setting everything up yourself is untenable, then I recommend getting in touch with a vendor: https://www.openstack.org/marketplace/ -- Thanks, Matt From Albert.Braden at synopsys.com Tue Nov 5 20:11:00 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 5 Nov 2019 20:11:00 +0000 Subject: CPU pinning blues In-Reply-To: References: Message-ID: I found the offending UUID in the nova_api and placement databases. Do I need to delete these entries from the DB or is there a safer way to get rid of the "phantom" VM? MariaDB [(none)]> select * from nova_api.instance_mappings where instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | instance_uuid | cell_id | project_id | queued_for_delete | | 2019-10-08 21:26:03 | NULL | 589 | 4856d505-c220-4873-b881-836b5b75f7bb | NULL | 474ae347d8ad426f8118e55eee47dcfd | 0 | MariaDB [(none)]> select * from nova_api.request_specs where instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | instance_uuid | spec | | 2019-10-08 21:26:03 | NULL | 589 | 4856d505-c220-4873-b881-836b5b75f7bb | {"nova_object.version": "1.11", "nova_object.changes": ["requested_destination", "instance_uuid", "retry", "num_instances", "pci_requests", "limits", "availability_zone", "force_nodes", "image", "instance_group", "force_hosts", "ignore_hosts", "numa_topology", "is_bfv", "user_id", "flavor", "project_id", "security_groups", "scheduler_hints"], "nova_object.name": "RequestSpec", "nova_object.data": {"requested_destination": null, "instance_uuid": "4856d505-c220-4873-b881-836b5b75f7bb", "retry": null, "num_instances": 1, "pci_requests": {"nova_object.version": "1.1", "nova_object.changes": ["requests"], "nova_object.name": "InstancePCIRequests", "nova_object.data": {"requests": []}, "nova_object.namespace": "nova"}, "limits": {"nova_object.version": "1.0", "nova_object.changes": ["vcpu", "memory_mb", "disk_gb", "numa_topology"], "nova_object.name": "SchedulerLimits", "nova_object.data": {"vcpu": null, "memory_mb": null, "disk_gb": null, "numa_topology": null}, "nova_object.namespace": "nova"}, "availability_zone": null, "force_nodes": null, "image": {"nova_object.version": "1.8", "nova_object.changes": ["status", "name", "container_format", "created_at", "disk_format", "updated_at", "id", "min_disk", "min_ram", "checksum", "owner", 
"properties", "size"], "nova_object.name": "ImageMeta", "nova_object.data": {"status": "active", "created_at": "2019-10-02T01:10:04Z", "name": "QSC-P-CentOS6.6-19P1-v4", "container_format": "bare", "min_ram": 0, "disk_format": "qcow2", "updated_at": "2019-10-02T01:10:44Z", "id": "200cb134-2716-4662-8183-33642078547f", "min_disk": 0, "checksum": "94d33caafd85b45519fca331ee7ea03e", "owner": "474ae347d8ad426f8118e55eee47dcfd", "properties": {"nova_object.version": "1.20", "nova_object.name": "ImageMetaProps", "nova_object.data": {}, "nova_object.namespace": "nova"}, "size": 4935843840}, "nova_object.namespace": "nova"}, "instance_group": null, "force_hosts": null, "ignore_hosts": null, "numa_topology": null, "is_bfv": false, "user_id": "2cb6757679d54a69803a5b6e317b3a93", "flavor": {"nova_object.version": "1.2", "nova_object.name": "Flavor", "nova_object.data": {"disabled": false, "root_gb": 35, "description": null, "flavorid": "e8b42da7-d352-441e-b494-77d6a6cd7366", "deleted": false, "created_at": "2019-09-23T21:19:50Z", "ephemeral_gb": 10, "updated_at": null, "memory_mb": 4096, "vcpus": 1, "extra_specs": {}, "swap": 3072, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": 0, "id": 2, "name": "s1.1cx4g"}, "nova_object.namespace": "nova"}, "project_id": "474ae347d8ad426f8118e55eee47dcfd", "security_groups": {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "SecurityGroupList", "nova_object.data": {"objects": [{"nova_object.version": "1.2", "nova_object.changes": ["name"], "nova_object.name": "SecurityGroup", "nova_object.data": {"name": "default"}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"}, "scheduler_hints": {}}, "nova_object.namespace": "nova"} | 1 row in set (0.001 sec) MariaDB [(none)]> SELECT * FROM placement.allocations WHERE consumer_id = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | resource_provider_id | consumer_id | resource_class_id | used | | 2019-10-08 22:03:33 | NULL | 3073 | 1024 | 4856d505-c220-4873-b881-836b5b75f7bb | 0 | 1 | | 2019-10-08 22:03:33 | NULL | 3074 | 1024 | 4856d505-c220-4873-b881-836b5b75f7bb | 1 | 4096 | | 2019-10-08 22:03:33 | NULL | 3075 | 1024 | 4856d505-c220-4873-b881-836b5b75f7bb | 2 | 48 | 3 rows in set (0.001 sec) MariaDB [(none)]> SELECT * FROM placement.consumers WHERE uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | uuid | project_id | user_id | generation | | 2019-10-08 22:03:33 | 2019-10-08 22:03:33 | 734 | 4856d505-c220-4873-b881-836b5b75f7bb | 1 | 1 | 1 | 1 row in set (0.000 sec) From: Albert Braden > Sent: Thursday, October 31, 2019 10:50 AM To: openstack-discuss at lists.openstack.org Subject: CPU pinning blues I'm following this document to setup CPU pinning on Rocky: https://www.redhat.com/en/blog/driving-fast-lane-cpu-pinning-and-numa-topology-awareness-openstack-compute I followed all of the steps except for modifying non-pinned flavors and I have one aggregate containing a single NUMA-capable host: root at us01odc-dev1-ctrl1:/var/log/nova# os aggregate list +----+-------+-------------------+ | ID | Name | Availability Zone | +----+-------+-------------------+ | 4 | perf3 | None | +----+-------+-------------------+ root at us01odc-dev1-ctrl1:/var/log/nova# os aggregate show 4 +-------------------+----------------------------+ | Field | Value | +-------------------+----------------------------+ | availability_zone | None | | created_at | 2019-10-30T23:05:41.000000 | | deleted | False | | deleted_at | 
None | | hosts | [u'us01odc-dev1-hv003'] | | id | 4 | | name | perf3 | | properties | pinned='true' | | updated_at | None | +-------------------+----------------------------+ I have a flavor with the NUMA properties: root at us01odc-dev1-ctrl1:/var/log/nova# os flavor show s1.perf3 +----------------------------+-------------------------------------------------------------------------+ | Field | Value | +----------------------------+-------------------------------------------------------------------------+ | OS-FLV-DISABLED:disabled | False | | OS-FLV-EXT-DATA:ephemeral | 0 | | access_project_ids | None | | disk | 35 | | id | be3d21c4-7e91-42a2-b832-47f42fdd3907 | | name | s1.perf3 | | os-flavor-access:is_public | True | | properties | aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated' | | ram | 30720 | | rxtx_factor | 1.0 | | swap | 7168 | | vcpus | 4 | +----------------------------+-------------------------------------------------------------------------+ I create a VM with that flavor: openstack server create --flavor s1.perf3 --image NOT-QSC-CentOS6.10-19P1-v4 --network it-network alberttest4 but it goes to error status, and I see this in the logs: *** *** Post with logs got moderated so they are here: https://paste.fedoraproject.org/paste/3bza6CJstXFPy8LatRJruA -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue Nov 5 22:44:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 5 Nov 2019 16:44:33 -0600 Subject: CPU pinning blues In-Reply-To: References: Message-ID: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> On 11/5/2019 2:11 PM, Albert Braden wrote: > I found the offending UUID in the nova_api and placement databases. Do I > need to delete these entries from the DB or is there a safer way to get > rid of the “phantom” VM? > > MariaDB [(none)]> select * from nova_api.instance_mappings where > instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; > > | created_at          | updated_at | id  | > instance_uuid                        | cell_id | > project_id                       | queued_for_delete | > > | 2019-10-08 21:26:03 | NULL       | 589 | > 4856d505-c220-4873-b881-836b5b75f7bb |    NULL | > 474ae347d8ad426f8118e55eee47dcfd |                 0 | > Interesting. So there is an instance mapping but it's not pointing at any cell. I'm assuming there is no entry for this instance in the nova_api.build_requests table either? A couple of related patches for that instance mapping thing: 1. I have a patch that adds a nova-manage command to cleanup busted instance mappings [1]. In this case you'd just --purge that broken instance mapping. 2. mnaser has reported similar weird issues where an instance mapping exists but doesn't point at a cell and the build request is gone and the instance isn't in cell0. For that we have a sanity check patch [2] which might be helpful to you if you hit this again. If either of those patches are helpful to you, please vote on the changes so we can draw some more eyes to the reviews. 
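If you want to double-check what is actually recorded against that phantom consumer before touching anything, you can dump it first (just a sketch, reusing the UUID from your query):

  openstack resource provider allocation show 4856d505-c220-4873-b881-836b5b75f7bb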
As for the allocations, you can remove those from placement using the osc-placement CLI plugin [3]: openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb [1] https://review.opendev.org/#/c/655908/ [2] https://review.opendev.org/#/c/683730/ [3] https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete -- Thanks, Matt From Albert.Braden at synopsys.com Tue Nov 5 22:51:25 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 5 Nov 2019 22:51:25 +0000 Subject: CPU pinning blues In-Reply-To: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> References: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> Message-ID: Thanks Matt! I saw your "any interest" email earlier and tried that procedure, and it fixed the problem. -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 5, 2019 2:45 PM To: openstack-discuss at lists.openstack.org Subject: Re: CPU pinning blues On 11/5/2019 2:11 PM, Albert Braden wrote: > I found the offending UUID in the nova_api and placement databases. Do I > need to delete these entries from the DB or is there a safer way to get > rid of the "phantom" VM? > > MariaDB [(none)]> select * from nova_api.instance_mappings where > instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; > > | created_at          | updated_at | id  | > instance_uuid                        | cell_id | > project_id                       | queued_for_delete | > > | 2019-10-08 21:26:03 | NULL       | 589 | > 4856d505-c220-4873-b881-836b5b75f7bb |    NULL | > 474ae347d8ad426f8118e55eee47dcfd |                 0 | > Interesting. So there is an instance mapping but it's not pointing at any cell. I'm assuming there is no entry for this instance in the nova_api.build_requests table either? A couple of related patches for that instance mapping thing: 1. I have a patch that adds a nova-manage command to cleanup busted instance mappings [1]. In this case you'd just --purge that broken instance mapping. 2. mnaser has reported similar weird issues where an instance mapping exists but doesn't point at a cell and the build request is gone and the instance isn't in cell0. For that we have a sanity check patch [2] which might be helpful to you if you hit this again. If either of those patches are helpful to you, please vote on the changes so we can draw some more eyes to the reviews. 
As for the allocations, you can remove those from placement using the osc-placement CLI plugin [3]: openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_655908_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=LP7-0mN2MJ5Qbv28Oodg41N8KpIOlKgcBy--M2vTgjw&e= [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_683730_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=tSCdhr2PxDvww4kksTXG6Z-vvX3WRhahzynEELjMwXw&e= [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_osc-2Dplacement_latest_cli_index.html-23resource-2Dprovider-2Dallocation-2Ddelete&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=0bzrScr45Jbu5_a1c6OHvfexVJXeasxzGoOllYGCwRQ&e= -- Thanks, Matt From dtroyer at gmail.com Tue Nov 5 22:58:29 2019 From: dtroyer at gmail.com (Dean Troyer) Date: Tue, 5 Nov 2019 16:58:29 -0600 Subject: [cinder][osc][docs] openstackclient docs for cinder v2 vs v3 In-Reply-To: <656f60e9-b5d6-5390-13c6-34347a9bc2e1@fried.cc> References: <656f60e9-b5d6-5390-13c6-34347a9bc2e1@fried.cc> Message-ID: On Tue, Nov 5, 2019 at 10:26 AM Eric Fried wrote: > - All of the v3 subcommands are implemented by code in the > openstackclient.volume.v2 package. Where there's overlap, the command > classes are identical from v2 to v3. However, it appears as though the > v2 commands are a *superset* of the v3 commands. Specifically, the > following appear in v2 but not v3 [4]: A number of commands were deprecated between v2 and v3, some were just renamed. However, that crux of this problem is that this pass-through was ever done in the first place. This is the only place in OSc that we did this rather than just copy the code between the API version modules. IMO that is what we need to finally do to fix this, complete the actual duplication of the v2 bits still being called by v3 in the v3 directories. > So before I go creating a mess of v2-only and v2+v3 documents, I wanted > to confirm that the above was actually intentional. > > - The existing hardcoded documents mention v1 and/or v2, but don't > mention v3 at all (e.g. [5]). I want to confirm that it's okay for me to > add mention of v3 where appropriate. Again, folks wanted to avoid doing the work to set up v3 properly, now the debt collector comes calling...I would hold off doing anything with the docs until the code and tests have been properly straightened out. dt -- Dean Troyer dtroyer at gmail.com From sfinucan at redhat.com Wed Nov 6 02:21:25 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Wed, 06 Nov 2019 10:21:25 +0800 Subject: [nova][ptg] Team dinner Message-ID: Hi all, Thanks to Alex Xu, we have organized a table for dinner this evening, Wed 6th November at 7pm. Xibo restaurant (Xinjiang food) 3/F 83 Changshu Rd, Jingan Qu, Shanghai, China (near "Changshu Road" or "Jing An Temple" subway stations) 中國上海市静安区常熟路83号 The address is on Google Maps - hopefully it's accurate :) Anyone working on nova is welcome, though I'd ask that you'd note your attendance on the PTG etherpad [1]. 
Looking forward to seeing everyone, Stephen (stephenfin) [1] https://etherpad.openstack.org/p/nova-shanghai-ptg From mdulko at redhat.com Wed Nov 6 02:23:00 2019 From: mdulko at redhat.com (Michal Dulko) Date: Wed, 6 Nov 2019 03:23:00 +0100 Subject: [kuryr] Kuryr team at the PTG Message-ID: Hi, I had not reserved Kuryr space on the PTG as we weren't expecting many Kuryr team members here, but turns out there's some representation. We'll meet on Thursday at 2 PM Shanghai time to discuss anything related to Kuryr. We want meet in the Blue Room (where the tables are) and will try to find some space to run the discussion. Today you can find me at the K8s SIG table. Feel free to join! Thanks, Michał From i at liuyulong.me Wed Nov 6 03:31:44 2019 From: i at liuyulong.me (=?utf-8?B?TElVIFl1bG9uZw==?=) Date: Wed, 6 Nov 2019 11:31:44 +0800 Subject: [Neutron] cancel the L3 meeting today In-Reply-To: References: Message-ID: Alright, we are all in Shanghai Today (30th Nov), so the L3 meeting will also be cancelled.     ------------------ Original ------------------ From:  "LIU Yulong" From flux.adam at gmail.com Wed Nov 6 04:00:34 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Wed, 6 Nov 2019 12:00:34 +0800 Subject: [barbican] PTG Schedule In-Reply-To: <0451df6b-23cc-2604-b28a-e1e9f6aac6f8@redhat.com> References: <0451df6b-23cc-2604-b28a-e1e9f6aac6f8@redhat.com> Message-ID: FYI, team photo is moved to Friday at 11:00am. :) On Mon, Nov 4, 2019, 10:12 AM Douglas Mendizábal wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA256 > > Hello Barbicaneers! > > I hope everyone made it to Shanghai safely. Our PTG session will be > on Wednesday Nov 6 from 10:30am - 4:30p at the Kilo table. > > We've set up an etherpad to collect topics to talk about. Please feel > free to add any topics you're interested in: > > https://etherpad.openstack.org/p/barbican-ussuri-ptg > > Additionally we've reserved a spot for a Team Photo on Thursday at > 11am. Hope to see y'all soon! > > Cheers, > - - Douglas Mendizabal > -----BEGIN PGP SIGNATURE----- > > iQIzBAEBCAAdFiEEwcapj5oGTj2zd3XogB6WFOq/OrcFAl2/iBEACgkQgB6WFOq/ > OrfFew/+IjhZe+qRCi/4EmaVEDf7QxJyZDIVUlLsPWHmF98wCdj+GsbzoWUuFfHM > sJCpfpVAUjxrIFOEo5uF9WiZhU36G9pgoLd1Y8Kb0/QRIQEQQcKGnlYhCn+jQjbW > J2tlDrkU0GBwEBzDt91gM5JCncviY8yT6nhlr/SSLqvZRQnPewerJNyJbYsVh6N2 > moXQzfeRjg1SGqR0KVUcDVPe/pE+at8A5ARFCxDiJaOIUTP0qcfKtDXh714bevyi > Sw2qgDZHbLHa1nEv3umuYGcrGpKz8Uuj5ju+7oGpPh4hX4pfPxbVDSzu8srfzTui > ggvcxFrpZQvdff3Lec1eclxnB+c9Z1tBKYF7pPUVtN3NPfCATkVCSQACYORPZLdh > GAnyxiiUXRwzIfOo0b6koa2pRi7ZWoz0DjVzpnl+D7qztUzyiguaj3KDnuTvlfQl > iMQev1QHD6fAVvByHgDRj4dyUqUi2+V/DtNZ9w29AX7C+U/afSbNGvygc8yNCtHF > vbkw68aPpj5zeB0OTjPQ6N5vsUc6bSXYGnECuGw24untnutvPKR+W9g9VQEUyN1h > vhvn0IPHZ9QyBJ0ctpdfA6O9PNsjY/DQNyDeiNGljTIpBjepUmqMTXvycsn8VN/E > yY0OL2QFGPhcsK7Q/yeUCzMm1sken2zMg8Bdxt10qbj4GsCMtyQ= > =fdMR > -----END PGP SIGNATURE----- > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Wed Nov 6 04:03:29 2019 From: soulxu at gmail.com (Alex Xu) Date: Wed, 6 Nov 2019 12:03:29 +0800 Subject: [nova][ptg] Team dinner In-Reply-To: References: Message-ID: You can tell waitress, it is order by Mr. Xu, and the last few phone number is 9564 Stephen Finucane 于2019年11月6日周三 上午10:25写道: > Hi all, > > Thanks to Alex Xu, we have organized a table for dinner this evening, > Wed 6th November at 7pm. 
> > Xibo restaurant (Xinjiang food) > 3/F 83 Changshu Rd, Jingan Qu, Shanghai, China (near "Changshu Road" > or "Jing An Temple" subway stations) > 中國上海市静安区常熟路83号 > > The address is on Google Maps - hopefully it's accurate :) > > Anyone working on nova is welcome, though I'd ask that you'd note your > attendance on the PTG etherpad [1]. > > Looking forward to seeing everyone, > Stephen (stephenfin) > > [1] https://etherpad.openstack.org/p/nova-shanghai-ptg > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lucioseki at gmail.com Wed Nov 6 05:15:43 2019 From: lucioseki at gmail.com (Lucio Seki) Date: Wed, 6 Nov 2019 13:15:43 +0800 Subject: [cinder] ussuri PTG schedule In-Reply-To: References: Message-ID: Hi rosmaita, I'm gonna leave for Manila project team photo 11h50-12h00, and it might conflict with the item I'm interested in (Mutable options). If it does conflict, is it possible to swap with some other item in the list? Lucio Seki (lseki) On Tue, Nov 5, 2019, 14:36 Brian Rosmaita wrote: > The Ussuri PTG schedule is live: > https://etherpad.openstack.org/p/shanghai-ptg-cinder > > Please check the schedule and let me know right away if your session > causes a conflict for you. Except for the few fixed-time topics, we will > follow the cinder tradition of dynamic scheduling, giving each topic > exactly as much time as it needs and adjusting as we go. > > cheers, > brian > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexey.perevalov at hotmail.com Wed Nov 6 07:09:27 2019 From: alexey.perevalov at hotmail.com (Perevalov Alexey) Date: Wed, 6 Nov 2019 07:09:27 +0000 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: References: Message-ID: Hi, we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? ________________________________ От: Michal Dulko Отправлено: 6 ноября 2019 г. 5:23 Кому: openstack-discuss Тема: [kuryr] Kuryr team at the PTG Hi, I had not reserved Kuryr space on the PTG as we weren't expecting many Kuryr team members here, but turns out there's some representation. We'll meet on Thursday at 2 PM Shanghai time to discuss anything related to Kuryr. We want meet in the Blue Room (where the tables are) and will try to find some space to run the discussion. Today you can find me at the K8s SIG table. Feel free to join! Thanks, Michał -------------- next part -------------- An HTML attachment was scrubbed... URL: From kendall at openstack.org Wed Nov 6 07:20:16 2019 From: kendall at openstack.org (Kendall Waters) Date: Wed, 6 Nov 2019 15:20:16 +0800 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: References: Message-ID: <37CF31D0-F40F-49BB-B8D7-CC7FA79C218A@openstack.org> Hi Michal, We do not have any extra space in the Blue Hall tomorrow, however, there are plenty of tables in the prefunction area that you are welcome to use for your meeting. Cheers, Kendall Kendall Waters OpenStack Marketing & Events kendall at openstack.org > On Nov 6, 2019, at 3:09 PM, Perevalov Alexey wrote: > > Hi, > we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? > > От: Michal Dulko > Отправлено: 6 ноября 2019 г. 5:23 > Кому: openstack-discuss > Тема: [kuryr] Kuryr team at the PTG > > Hi, > > I had not reserved Kuryr space on the PTG as we weren't expecting many > Kuryr team members here, but turns out there's some representation. > We'll meet on Thursday at 2 PM Shanghai time to discuss anything > related to Kuryr. 
We want meet in the Blue Room (where the tables are) > and will try to find some space to run the discussion. > > Today you can find me at the K8s SIG table. > > Feel free to join! > > Thanks, > Michał -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdulko at redhat.com Wed Nov 6 08:05:30 2019 From: mdulko at redhat.com (Michal Dulko) Date: Wed, 6 Nov 2019 09:05:30 +0100 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: <37CF31D0-F40F-49BB-B8D7-CC7FA79C218A@openstack.org> References: <37CF31D0-F40F-49BB-B8D7-CC7FA79C218A@openstack.org> Message-ID: Sure, thanks! On Wed, Nov 6, 2019 at 8:20 AM Kendall Waters wrote: > > Hi Michal, > > We do not have any extra space in the Blue Hall tomorrow, however, there are plenty of tables in the prefunction area that you are welcome to use for your meeting. > > Cheers, > Kendall > > Kendall Waters > OpenStack Marketing & Events > kendall at openstack.org > > > > On Nov 6, 2019, at 3:09 PM, Perevalov Alexey wrote: > > Hi, > we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? > > ________________________________ > От: Michal Dulko > Отправлено: 6 ноября 2019 г. 5:23 > Кому: openstack-discuss > Тема: [kuryr] Kuryr team at the PTG > > Hi, > > I had not reserved Kuryr space on the PTG as we weren't expecting many > Kuryr team members here, but turns out there's some representation. > We'll meet on Thursday at 2 PM Shanghai time to discuss anything > related to Kuryr. We want meet in the Blue Room (where the tables are) > and will try to find some space to run the discussion. > > Today you can find me at the K8s SIG table. > > Feel free to join! > > Thanks, > Michał > > From mdulko at redhat.com Wed Nov 6 08:08:14 2019 From: mdulko at redhat.com (Michal Dulko) Date: Wed, 6 Nov 2019 09:08:14 +0100 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: References: Message-ID: Hey! It'll be 7 AM for Maysa and Luis, so I guess too early, but if there's someone else interested in participating that has a better timezone fit, we can do it. Thanks, Michał On Wed, Nov 6, 2019 at 8:09 AM Perevalov Alexey wrote: > > Hi, > we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? > > ________________________________ > От: Michal Dulko > Отправлено: 6 ноября 2019 г. 5:23 > Кому: openstack-discuss > Тема: [kuryr] Kuryr team at the PTG > > Hi, > > I had not reserved Kuryr space on the PTG as we weren't expecting many > Kuryr team members here, but turns out there's some representation. > We'll meet on Thursday at 2 PM Shanghai time to discuss anything > related to Kuryr. We want meet in the Blue Room (where the tables are) > and will try to find some space to run the discussion. > > Today you can find me at the K8s SIG table. > > Feel free to join! > > Thanks, > Michał > > From missile0407 at gmail.com Wed Nov 6 12:28:50 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Wed, 6 Nov 2019 20:28:50 +0800 Subject: [kolla] Repository setup in non-internet environment. Message-ID: Hi, I'm thinking about the deployment in non-internet environment. As we know Kolla has already prepared docker registry and kolla-build to let user can create local registry for deployment. But there's still have two problem about non-internet deployment. 1. Docker-ce repository. 2. Pip repository. (Also having others perhaps.) Does Kolla planning to support non-internet deployment? I would like to do this if possible. Looking forward to hearing from you, Eddie. 
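P.S. To make the question concrete, the kind of setup I have in mind looks roughly like this (a sketch only: the mirror address is a placeholder, only docker_registry is an existing kolla-ansible option, and the repo and pip files would have to be prepared on each host by hand today):

  # /etc/kolla/globals.yml - pull images from a local registry (existing option)
  docker_registry: "192.168.100.10:4000"

  # /etc/yum.repos.d/docker-ce.repo - local mirror instead of download.docker.com
  [docker-ce-stable]
  name=Docker CE Stable - local mirror
  baseurl=http://192.168.100.10/repos/docker-ce/stable
  enabled=1
  gpgcheck=0

  # /etc/pip.conf - local PyPI index
  [global]
  index-url = http://192.168.100.10/pypi/simple
  trusted-host = 192.168.100.10

The idea would be to let bootstrap-servers lay down the last two from settings in globals.yml instead of pointing at the internet.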
-------------- next part -------------- An HTML attachment was scrubbed... URL: From yamamoto at midokura.com Wed Nov 6 13:52:12 2019 From: yamamoto at midokura.com (Takashi Yamamoto) Date: Wed, 6 Nov 2019 21:52:12 +0800 Subject: [neutron][tap-as-a-service] ERSPAN support In-Reply-To: References: Message-ID: i guess i'm the maintainer of taas these days. thank you for the interest in the project. On Tue, Nov 5, 2019 at 10:51 PM Jean Bernard Beuque (jbeuque) wrote: > > Hello, > > > > I'd like to add ERSPAN support to the Tap-as-a-Service project. > > > > I've currently implemented a prototype that can be used with networking-vpp: > > https://opendev.org/x/networking-vpp > > The modified version of tap as a service is available here (The API has been extended to support ERSPAN): > > https://github.com/jbeuque/tap-as-a-service do i need to take a tree diff to see what was changed? it's easier for me to read the change if you submit the change on gerrit. > > > > I don't know who maintains the Taas project. But if you think adding this functionality could be useful, please contact me. > > (Please take the modified version of Taas as a proposal to be discussed). > > > > Regards, > > Jean-Bernard Beuque > > From melwittt at gmail.com Wed Nov 6 17:02:23 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 6 Nov 2019 09:02:23 -0800 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? In-Reply-To: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: On 11/5/19 08:45, Matt Riedemann wrote: > I was helping someone recover from a stuck live migration today where > the migration record was stuck in pre-migrating status and somehow the > request never hit the compute or was lost. The guest was stopped on the > guest and basically the live migration either never started or never > completed properly (maybe rabbit dropped the request or the compute > service was restarted, I don't know). > > I instructed them to update the database to set the migration record > status to 'error' and hard reboot the instance to get it running again. > > Then they pointed out they were seeing this in the compute logs: > > "There are allocations remaining against the source host that might need > to be removed" > > That's because the source node allocations are still tracked in > placement by the migration record and the dest node allocations are > tracked by the instance. Cleaning that up is non-trivial. I have a > troubleshooting doc started for manually cleaning up that kind of stuff > here [1] but ultimately just told them to delete the allocations in > placement for both the migration and the instance and then run the > heal_allocations command to recreate the allocations for the instance. > Since this person's nova deployment was running Stein, they don't have > the --dry-run [2] or --instance [3] options for the heal_allocations > command. This isn't a huge problem but it does mean they could be > healing allocations for instances they didn't expect. > > They could work around this by installing nova from train or master in a > VM/container/virtual environment and running it against the stein setup, > but that's maybe more work than they want to do. > > The question I'm posing is if people would like to see those options > backported to stein and if so, would the stable team be OK with it? 
I'd > say this falls into a gray area where these are things that are > optional, not used by default, and are operational tooling so less risk > to backport, but it's not zero risk. It's also worth noting that when I > wrote those patches I did so with the intent that people could backport > them at least internally. I think tools like this that provide significant operability benefit are worthwhile to backport and that the value is much greater than the risk. Related but not nearly as simple, I've backported nova-manage db purge and nova-manage db archive_deleted_rows --purge, --before, and --all-cells downstream because of the amount of bugs support/operators have opened around database cleanup pain. These were all pretty difficult to backport with the number of differences and conflicts, but my point is that I understand the motivation well and support the idea. The fact that the patches in question were written with backportability in mind is A Good Thing. -melanie > [1] https://review.opendev.org/#/c/691427/ > [2] https://review.opendev.org/#/c/651932/ > [3] https://review.opendev.org/#/c/651945/ > From mriedemos at gmail.com Wed Nov 6 17:14:04 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 6 Nov 2019 11:14:04 -0600 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/4/2019 6:58 PM, Clark Boylan wrote: > Typically we try to work with the clouds to properly root cause the issue. Then from there we can figure out what the best fix may be. They are running our software after all and there is a good chance the problems are in openstack. > > I'm in shanghai at the moment but if others want to reach out feel free. benj_ and mgagne are at inap and amorin has been helpful at ovh. The test node logs include a hostid in them somewhere which an be used to identify hypervisors if necessary. I noticed this today [1]. That doesn't always result in failed jobs but I correlated it to a failure in a timeout in a nova functional job [2] and those normally don't have these types of problems. Note the correlation to when it spikes, midnight and noon it looks like. The dip on 11/2 and 11/3 was the weekend. And it's mostly OVH nodes. So they must have some kind of cron or something that hits at those times? Anecdotally, I'll also note that it seems like the gate is much more stable this week while the summit is happening. We're actually able to merge some changes in nova which is kind of amazing given the last month or so of rechecks we've had to do. [1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Function%20'nova.servicegroup.drivers.db.DbDriver._report_state'%20run%20outlasted%20interval%20by%5C%22&from=7d [2] https://zuul.opendev.org/t/openstack/build/63001bbd58c244cea70c995f1ebf61fb/log/job-output.txt#3092 -- Thanks, Matt From jbeuque at cisco.com Wed Nov 6 18:10:39 2019 From: jbeuque at cisco.com (Jean Bernard Beuque (jbeuque)) Date: Wed, 6 Nov 2019 18:10:39 +0000 Subject: [neutron][tap-as-a-service] ERSPAN support In-Reply-To: References: Message-ID: Hello Takashi, Thanks for your answer. OK, I'll submit the changes on gerrit. 
Regards, Jean-Bernard -----Original Message----- From: Takashi Yamamoto Sent: mercredi 6 novembre 2019 14:52 To: Jean Bernard Beuque (jbeuque) Cc: openstack-discuss at lists.openstack.org; Ian Wells (iawells) ; Jerome Tollet (jtollet) Subject: Re: [neutron][tap-as-a-service] ERSPAN support i guess i'm the maintainer of taas these days. thank you for the interest in the project. On Tue, Nov 5, 2019 at 10:51 PM Jean Bernard Beuque (jbeuque) wrote: > > Hello, > > > > I'd like to add ERSPAN support to the Tap-as-a-Service project. > > > > I've currently implemented a prototype that can be used with networking-vpp: > > https://opendev.org/x/networking-vpp > > The modified version of tap as a service is available here (The API has been extended to support ERSPAN): > > https://github.com/jbeuque/tap-as-a-service do i need to take a tree diff to see what was changed? it's easier for me to read the change if you submit the change on gerrit. > > > > I don't know who maintains the Taas project. But if you think adding this functionality could be useful, please contact me. > > (Please take the modified version of Taas as a proposal to be discussed). > > > > Regards, > > Jean-Bernard Beuque > > From Albert.Braden at synopsys.com Wed Nov 6 18:16:56 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 6 Nov 2019 18:16:56 +0000 Subject: CPU pinning blues In-Reply-To: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> References: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> Message-ID: Will these patches work on Rocky? -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 5, 2019 2:45 PM To: openstack-discuss at lists.openstack.org Subject: Re: CPU pinning blues On 11/5/2019 2:11 PM, Albert Braden wrote: > I found the offending UUID in the nova_api and placement databases. Do I > need to delete these entries from the DB or is there a safer way to get > rid of the "phantom" VM? > > MariaDB [(none)]> select * from nova_api.instance_mappings where > instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; > > | created_at          | updated_at | id  | > instance_uuid                        | cell_id | > project_id                       | queued_for_delete | > > | 2019-10-08 21:26:03 | NULL       | 589 | > 4856d505-c220-4873-b881-836b5b75f7bb |    NULL | > 474ae347d8ad426f8118e55eee47dcfd |                 0 | > Interesting. So there is an instance mapping but it's not pointing at any cell. I'm assuming there is no entry for this instance in the nova_api.build_requests table either? A couple of related patches for that instance mapping thing: 1. I have a patch that adds a nova-manage command to cleanup busted instance mappings [1]. In this case you'd just --purge that broken instance mapping. 2. mnaser has reported similar weird issues where an instance mapping exists but doesn't point at a cell and the build request is gone and the instance isn't in cell0. For that we have a sanity check patch [2] which might be helpful to you if you hit this again. If either of those patches are helpful to you, please vote on the changes so we can draw some more eyes to the reviews. 
As for the allocations, you can remove those from placement using the osc-placement CLI plugin [3]: openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_655908_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=LP7-0mN2MJ5Qbv28Oodg41N8KpIOlKgcBy--M2vTgjw&e= [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_683730_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=tSCdhr2PxDvww4kksTXG6Z-vvX3WRhahzynEELjMwXw&e= [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_osc-2Dplacement_latest_cli_index.html-23resource-2Dprovider-2Dallocation-2Ddelete&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=0bzrScr45Jbu5_a1c6OHvfexVJXeasxzGoOllYGCwRQ&e= -- Thanks, Matt From dms at danplanet.com Wed Nov 6 19:03:19 2019 From: dms at danplanet.com (Dan Smith) Date: Wed, 06 Nov 2019 11:03:19 -0800 Subject: [nova] Operator input on automatic heal behaviors Message-ID: Hi all, If you're a nova operator, you probably know (and love) our increasing number of "heal $thing" commands in nova-manage. Despite appearances, we do not try to make these inconsistent, duplicative, and confusing. However, in reality, they pretty much are. Further, they require something being broken, an operator noticing, and then manual execution to (hopefully) fix things. While reviewing another such proposed nova-manage command today, I decided to propose a potential solution to make this better in the future. I've got a spec proposed to create new standalone command/service for nova that will consolidate all of these into a very consistent interface, with a daemon mode that can be run in the background to constantly (and slowly) periodically audit these things and heal them when issues are found. If you're an operator and have strong feelings on this topic, please review and opine here: https://review.opendev.org/#/c/693226 Thanks! --Dan From mriedemos at gmail.com Wed Nov 6 19:03:35 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 6 Nov 2019 13:03:35 -0600 Subject: CPU pinning blues In-Reply-To: References: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> Message-ID: <61c70900-7b68-0d93-95c4-fd6ba09d33ed@gmail.com> On 11/6/2019 12:16 PM, Albert Braden wrote: > Will these patches work on Rocky? I don't know, I haven't tried backporting them. -- Thanks, Matt From melwittt at gmail.com Wed Nov 6 19:12:42 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 6 Nov 2019 11:12:42 -0800 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/4/19 16:58, Clark Boylan wrote: > On Mon, Nov 4, 2019, at 7:37 PM, Chris Dent wrote: >> On Fri, 1 Nov 2019, Matt Riedemann wrote: >> >>> On 11/1/2019 9:55 AM, Clark Boylan wrote: >>>> OVH controls the disk IOPs that we get pretty aggressively as well. >>>> Possible it is an IO thing? >>> >>> Yeah, so looking at the dstat output in that graph (thanks for pointing out >>> that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, >>> that's probably not good. >> >> What happens in a case like this? 
Is there an official procedure for >> "hey, can you give is more IO?" or (if that's not an option) "can >> you give us less CPU?". Is that something that is automated, is is >> something that is monitored and alarming? "INAP ran out of IO X >> times in the last N hours, light the beacons!" > > Typically we try to work with the clouds to properly root cause the issue. Then from there we can figure out what the best fix may be. They are running our software after all and there is a good chance the problems are in openstack. > > I'm in shanghai at the moment but if others want to reach out feel free. benj_ and mgagne are at inap and amorin has been helpful at ovh. The test node logs include a hostid in them somewhere which an be used to identify hypervisors if necessary. Just wanted to throw this out there to the ML in case anyone has any thoughts: Since we know that I/O is overloaded in these cases, would it make any sense to have infra/tempest use a flavor which sets disk I/O quotas [1] to help prevent any one process from getting starved out? I agree that properly troubleshooting the root cause is necessary, and maybe adding limits would not be desired out of concern that it could hide issues. -melanie [1] https://docs.openstack.org/nova/latest/user/flavors.html#extra-specs-disk-tuning From immo.wetzel at adtran.com Wed Nov 6 22:37:28 2019 From: immo.wetzel at adtran.com (Immo Wetzel) Date: Wed, 6 Nov 2019 22:37:28 +0000 Subject: ephemeral dics storage Message-ID: G'day mates, We run a Pike installation and have run into a problem with ephemeral discs. On the one hand they are the usual way for a normal VM to get a disk that lives only as long as the VM does; on the other hand they are stored on local discs, which are usually not the fastest option, and the documentation therefore says that in a production environment, which we are about to be, they should also be stored on shared storage, like a SAN. I found some descriptions of how to use Ceph for local storage via the rbd backend, but we don't use Ceph. Each compute node has an FC connection which is used via Cinder for the volumes. So what would be the recommendation for using ephemeral discs with an FC SAN? THX a lot Immo -------------- next part -------------- An HTML attachment was scrubbed... URL: From eandersson at blizzard.com Wed Nov 6 23:07:53 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Wed, 6 Nov 2019 23:07:53 +0000 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? In-Reply-To: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: Yea - this is our number one pain point with Nova and Rocky, and having this backported would be invaluable. Since we are on the topic, some additional issues we are having: - Sometimes heal_allocations just fails without a good error (e.g. Compute host could not be found.) - Errors are always sequential and always halt execution, so if you have a lot of errors, you'll end up fixing them all one-by-one. - Better logging when unexpected errors do happen (maybe something more verbose like --debug would be good?). Best Regards, Erik Olof Gunnar Andersson -----Original Message----- From: melanie witt Sent: Wednesday, November 6, 2019 9:02 AM To: openstack-discuss at lists.openstack.org Subject: Re: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations?
On 11/5/19 08:45, Matt Riedemann wrote: > I was helping someone recover from a stuck live migration today where > the migration record was stuck in pre-migrating status and somehow the > request never hit the compute or was lost. The guest was stopped on > the guest and basically the live migration either never started or > never completed properly (maybe rabbit dropped the request or the > compute service was restarted, I don't know). > > I instructed them to update the database to set the migration record > status to 'error' and hard reboot the instance to get it running again. > > Then they pointed out they were seeing this in the compute logs: > > "There are allocations remaining against the source host that might > need to be removed" > > That's because the source node allocations are still tracked in > placement by the migration record and the dest node allocations are > tracked by the instance. Cleaning that up is non-trivial. I have a > troubleshooting doc started for manually cleaning up that kind of > stuff here [1] but ultimately just told them to delete the allocations > in placement for both the migration and the instance and then run the > heal_allocations command to recreate the allocations for the instance. > Since this person's nova deployment was running Stein, they don't have > the --dry-run [2] or --instance [3] options for the heal_allocations > command. This isn't a huge problem but it does mean they could be > healing allocations for instances they didn't expect. > > They could work around this by installing nova from train or master in > a VM/container/virtual environment and running it against the stein > setup, but that's maybe more work than they want to do. > > The question I'm posing is if people would like to see those options > backported to stein and if so, would the stable team be OK with it? > I'd say this falls into a gray area where these are things that are > optional, not used by default, and are operational tooling so less > risk to backport, but it's not zero risk. It's also worth noting that > when I wrote those patches I did so with the intent that people could > backport them at least internally. I think tools like this that provide significant operability benefit are worthwhile to backport and that the value is much greater than the risk. Related but not nearly as simple, I've backported nova-manage db purge and nova-manage db archive_deleted_rows --purge, --before, and --all-cells downstream because of the amount of bugs support/operators have opened around database cleanup pain. These were all pretty difficult to backport with the number of differences and conflicts, but my point is that I understand the motivation well and support the idea. The fact that the patches in question were written with backportability in mind is A Good Thing. 
-melanie > [1] > https://urldefense.com/v3/__https://review.opendev.org/*/c/691427/__;I > w!2E0gRdhhnqPNNL0!37tRTxqquwil9Vw_imfj9qg3SczjE--jSBbK3qUS_UO_wOddekP_ > GkxCspm5LX4aBQ$ [2] > https://urldefense.com/v3/__https://review.opendev.org/*/c/651932/__;I > w!2E0gRdhhnqPNNL0!37tRTxqquwil9Vw_imfj9qg3SczjE--jSBbK3qUS_UO_wOddekP_ > GkxCspniVit4uQ$ [3] > https://urldefense.com/v3/__https://review.opendev.org/*/c/651945/__;I > w!2E0gRdhhnqPNNL0!37tRTxqquwil9Vw_imfj9qg3SczjE--jSBbK3qUS_UO_wOddekP_ > GkxCsplE-E_TJw$ > From mriedemos at gmail.com Wed Nov 6 23:29:19 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 6 Nov 2019 17:29:19 -0600 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? In-Reply-To: References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: <5b9128b6-ca60-64aa-8e95-222412d072c1@gmail.com> On 11/6/2019 5:07 PM, Erik Olof Gunnar Andersson wrote: > Yea - this is our number one pain point with Nova and Rocky, and having this backported would be invaluable. I posted [1] today. If that's accepted I can work on Rocky afterward. > > Since we are on the topic some additional issues we are having. > > - Sometimes heal_allocations just fails without a good error (e.g. Compute host could not be found.) > - Errors are always sequential and always halt execution, so if you have a lot of errors, you'll end up fixing them all one-by-one. > - Better logging when unexpected errors do happen (maybe something more verbose like --debug would be good?). Could you open a bug with more details about the issues you're hitting. Like in what case do you hit ComputeHostNotFound? The sequential errors thing is pretty obvious but I'm not sure what to do about it off the top of my head besides some option to say "process as much as possible storing up all of the errors to dump at the end" kind of thing. As for better logging about unexpected errors, it's hard to know what to log that's better when it's unexpected, you know? If you have examples can you throw those into the bug report? [1] https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:stable/stein+topic:heal_allocations_dry_run -- Thanks, Matt From rui.zang at yandex.com Thu Nov 7 03:01:51 2019 From: rui.zang at yandex.com (rui zang) Date: Thu, 07 Nov 2019 11:01:51 +0800 Subject: ephemeral dics storage In-Reply-To: References: Message-ID: <71054891573095711@myt3-a8f6b0e91bb2.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: From sinan at turka.nl Thu Nov 7 08:23:08 2019 From: sinan at turka.nl (Sinan Polat) Date: Thu, 7 Nov 2019 09:23:08 +0100 (CET) Subject: Change Volume Type, but in use Message-ID: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Hi, I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD pools (ssdvolumes, sasvolumes). In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property "volume_backend_name='tripleo_ceph_'". 
In the Cinder configuration I have the following backends configured: [tripleo_ceph_ssd] backend_host=hostgroup volume_backend_name=tripleo_ceph_ssd volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_ceph_conf=/etc/ceph/ceph.conf rbd_user=openstack rbd_pool=ssdvolumes [tripleo_ceph_sas] backend_host=hostgroup volume_backend_name=tripleo_ceph_sas volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_ceph_conf=/etc/ceph/ceph.conf rbd_user=openstack rbd_pool=sasvolumes As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool name (ssdvolumes, not ssd) does not match. So far, we do not have any problems. But I want to correct the names and I do not want to have the mismatch anymore. So I want to change the value of key volume_backend_name for both Volume Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). I tried the following: $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce +--------------------+----------------------------------------+ | Field | Value | +--------------------+----------------------------------------+ | access_project_ids | None | | description | | | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | | is_public | True | | name | ssd | | properties | volume_backend_name='tripleo_ceph_ssd' | | qos_specs_id | None | +--------------------+----------------------------------------+ $ $ openstack volume type set --property volume_backend_name='tripleo_ceph_ssdvolumes' 80cb25ff-376a-4483-b4f7-d8c75839e0ce Failed to set volume type property: Volume Type is currently in use. (HTTP 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) Command Failed: One or more of the operations failed $ How to solve my problem? Thanks! Sinan -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Nov 7 08:31:16 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 7 Nov 2019 09:31:16 +0100 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: References: Message-ID: Hello Eddie, We would welcome such a feature of course! -yoctozepto śr., 6 lis 2019 o 13:31 Eddie Yen napisał(a): > Hi, > > I'm thinking about the deployment in non-internet environment. > As we know Kolla has already prepared docker registry and kolla-build to > let user can create local registry for deployment. But there's still have > two problem about non-internet deployment. > > 1. Docker-ce repository. > 2. Pip repository. > (Also having others perhaps.) > > Does Kolla planning to support non-internet deployment? I would like to do > this if possible. > > Looking forward to hearing from you, > Eddie. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From berendt at betacloud-solutions.de Thu Nov 7 09:25:02 2019 From: berendt at betacloud-solutions.de (Christian Berendt) Date: Thu, 7 Nov 2019 10:25:02 +0100 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: References: Message-ID: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> Hello Eddie. > On 6. Nov 2019, at 13:28, Eddie Yen wrote: > > 1. Docker-ce repository. Use an APT mirror. For example Aptly. > 2. Pip repository. > (Also having others perhaps.) Packages from Pypi should no longer be necessary for the use of Kolle-Ansible. For some time now. If that's still the case, use a Pypi Mirror. For example Devpi. The Docker images can also be mirrored. Use a local Docker registry to do this. The use of an HTTP proxy like Squid is also possible. 
This proxy must have online access. The use of Nexus OSS is also a possibility. Then you only have one central mirror service. If you want to build completely offline you can't avoid single mirrors for the single packages (Docker, APT, Pypi). We provide a role under https://github.com/osism/ansible-mirror to deploy individual mirror services with Docker Compose. > Does Kolla planning to support non-internet deployment? I would like to do this if possible. This is already possible and we do this very often. HTH, Christian. -- Christian Berendt Chief Executive Officer (CEO) Mail: berendt at betacloud-solutions.de Web: https://www.betacloud-solutions.de Betacloud Solutions GmbH Teckstrasse 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139 From missile0407 at gmail.com Thu Nov 7 10:31:00 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Thu, 7 Nov 2019 18:31:00 +0800 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> References: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> Message-ID: Hi Christian, thanks for your reply and suggestion. In some cases we met, all kinds of internet access method (proxy server, mobile internet, etc.) are restricted. And in some previous release (like Rocky), pip packages still necessary. So we will prepare the whole local repository and registry in this kind of environment. When kolla-ansible going to bootstrapping servers, it will insert docker-ce repository. But this is already hard-coded (pointed to download.docker.com). Also no pip local repository setup during bootstrapping. So I gonna do is let them become functional. User can configure local docker-ce and pip repository in globals.yml directly if needed. BTW, glad to know about ansible-mirror. I'd like to try it if I have a time. Many thanks, Eddie. Christian Berendt 於 2019年11月7日 週四 下午5:25寫道: > Hello Eddie. > > > On 6. Nov 2019, at 13:28, Eddie Yen wrote: > > > > 1. Docker-ce repository. > > Use an APT mirror. For example Aptly. > > > > 2. Pip repository. > > (Also having others perhaps.) > > Packages from Pypi should no longer be necessary for the use of > Kolle-Ansible. For some time now. > > If that's still the case, use a Pypi Mirror. For example Devpi. > > > The Docker images can also be mirrored. Use a local Docker registry to do > this. > > > The use of an HTTP proxy like Squid is also possible. This proxy must have > online access. > > The use of Nexus OSS is also a possibility. Then you only have one central > mirror service. > > > If you want to build completely offline you can't avoid single mirrors for > the single packages (Docker, APT, Pypi). > > We provide a role under https://github.com/osism/ansible-mirror to deploy > individual mirror services with Docker Compose. > > > > Does Kolla planning to support non-internet deployment? I would like to > do this if possible. > > This is already possible and we do this very often. > > HTH, Christian. > > -- > Christian Berendt > Chief Executive Officer (CEO) > > Mail: berendt at betacloud-solutions.de > Web: https://www.betacloud-solutions.de > > Betacloud Solutions GmbH > Teckstrasse 62 / 70190 Stuttgart / Deutschland > > Geschäftsführer: Christian Berendt > Unternehmenssitz: Stuttgart > Amtsgericht: Stuttgart, HRB 756139 > > -------------- next part -------------- An HTML attachment was scrubbed... 
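For reference, the host-side part of an offline setup is usually just a couple of small config files pointing at whatever local mirror is available; the mirror host below is only a placeholder:

/etc/pip.conf:
[global]
index-url = http://mirror.example.local/pypi/simple
trusted-host = mirror.example.local

/etc/docker/daemon.json (if a pull-through registry cache is used for image pulls):
{
  "registry-mirrors": ["http://mirror.example.local:4000"]
}

For the Kolla images themselves, kolla-ansible can already be pointed at a local registry with the docker_registry / docker_namespace settings in globals.yml; as noted above, it is the hard-coded docker-ce repository and the pip setup in bootstrap-servers that would still need to become configurable.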
URL: From mark at stackhpc.com Thu Nov 7 10:48:35 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 7 Nov 2019 10:48:35 +0000 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> References: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> Message-ID: On Thu, 7 Nov 2019 at 09:26, Christian Berendt wrote: > > Hello Eddie. > > > On 6. Nov 2019, at 13:28, Eddie Yen wrote: > > > > 1. Docker-ce repository. > > Use an APT mirror. For example Aptly. > Or yum if using CentOS. Pulp or artifactory or $other should work. Presumably there is already some mirror solution for your OS packages? > > > 2. Pip repository. > > (Also having others perhaps.) > > Packages from Pypi should no longer be necessary for the use of Kolle-Ansible. For some time now. You at least need to install Kolla Ansible's Python dependencies. You could consider building a docker image containing Kolla Ansible and using this for your deployments, if that helps. In kolla-ansible bootstrap-servers we also use the easy_install and pip Ansible modules to install pip and the Docker python package. > > If that's still the case, use a Pypi Mirror. For example Devpi. > > > The Docker images can also be mirrored. Use a local Docker registry to do this. > > > The use of an HTTP proxy like Squid is also possible. This proxy must have online access. > > The use of Nexus OSS is also a possibility. Then you only have one central mirror service. > > > If you want to build completely offline you can't avoid single mirrors for the single packages (Docker, APT, Pypi). > > We provide a role under https://github.com/osism/ansible-mirror to deploy individual mirror services with Docker Compose. > > > > Does Kolla planning to support non-internet deployment? I would like to do this if possible. > > This is already possible and we do this very often. > > HTH, Christian. > > -- > Christian Berendt > Chief Executive Officer (CEO) > > Mail: berendt at betacloud-solutions.de > Web: https://www.betacloud-solutions.de > > Betacloud Solutions GmbH > Teckstrasse 62 / 70190 Stuttgart / Deutschland > > Geschäftsführer: Christian Berendt > Unternehmenssitz: Stuttgart > Amtsgericht: Stuttgart, HRB 756139 > > From berendt at betacloud-solutions.de Thu Nov 7 11:06:39 2019 From: berendt at betacloud-solutions.de (Christian Berendt) Date: Thu, 7 Nov 2019 12:06:39 +0100 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: References: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> Message-ID: <4FA700EE-C9BC-4C14-9074-179CE1F94017@betacloud-solutions.de> Hello Mark. > On 7. Nov 2019, at 11:48, Mark Goddard wrote: > > You at least need to install Kolla Ansible's Python dependencies. You > could consider building a docker image containing Kolla Ansible and > using this for your deployments, if that helps. That's right. That's why we put it in images to avoid this problem with Pypi. Works very well in everyday life. Christian. 
-- Christian Berendt Chief Executive Officer (CEO) Mail: berendt at betacloud-solutions.de Web: https://www.betacloud-solutions.de Betacloud Solutions GmbH Teckstrasse 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139 From mark at stackhpc.com Thu Nov 7 14:11:13 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 7 Nov 2019 14:11:13 +0000 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: <8a6d214d-970e-c923-570e-e031aa364305@linaro.org> References: <8a6d214d-970e-c923-570e-e031aa364305@linaro.org> Message-ID: Zoom didn't work well, now trying meet: https://meet.google.com/nyh-gzvy-nnw On Mon, 4 Nov 2019 at 17:05, Marcin Juszkiewicz wrote: > > On 04.11.2019 15:15, Mark Goddard wrote: > > > After polling participants, we have agreed to meet at 1400 - 1800 UTC > > on Thursday and Friday this week. Since not all participants can make > > the first hour, we will adjust the schedule accordingly. > > > > Marcin will follow with connection details for the Zoom video conference. > > As we agreed on Zoom I did a setup of meeting. > > https://zoom.us/j/157063687 will be available for 1400-1800 UTC on both > Thursday and Friday. Sessions will be recorded by platform. > From corey.bryant at canonical.com Thu Nov 7 19:11:41 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Thu, 7 Nov 2019 14:11:41 -0500 Subject: [tc] Add non-voting py38 for ussuri Message-ID: Hello TC members, Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. I have a review here for the zuul project template enablement for ussuri: https://review.opendev.org/#/c/693401 Also should this be updated considering py38 would be non-voting? https://governance.openstack.org/tc/reference/runtimes/ussuri.html Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Thu Nov 7 19:47:53 2019 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Thu, 7 Nov 2019 11:47:53 -0800 Subject: Change Volume Type, but in use In-Reply-To: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: Hey Sinat, The error message suggests that you have volumes that use the volume type you're modifying. If yes, with cinder API, since the Rocky release [1], you cannot modify volume types that are currently in use. The original design for cinder volume types was that they were always mutable - and changes to existing volume types didn't affect pre-existing volumes. However, this behavior was modified in the Ocata release [2], and finally removed in the Rocky release. One of your options is to also rename the existing volume type and create a new one if you'd like, with the original name. 
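As a rough sketch of that approach, using the type from the example below (and assuming your release still allows renaming an in-use type, and that a matching [tripleo_ceph_ssdvolumes] backend section is added to cinder.conf before new volumes are created against it):

$ openstack volume type set --name ssd-old 80cb25ff-376a-4483-b4f7-d8c75839e0ce
$ openstack volume type create --property volume_backend_name='tripleo_ceph_ssdvolumes' ssd

New volumes would then be scheduled to the new backend name, while existing volumes keep using the renamed type.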
[1] https://docs.openstack.org/releasenotes/cinder/rocky.html#relnotes-13-0-0-stable-rocky-upgrade-notes [2] https://review.opendev.org/#/c/440680/ On Thu, Nov 7, 2019 at 12:33 AM Sinan Polat wrote: > Hi, > > I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 > RBD pools (ssdvolumes, sasvolumes). > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has > property "volume_backend_name='tripleo_ceph_'". > > In the Cinder configuration I have the following backends configured: > > [tripleo_ceph_ssd] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_ssd > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=ssdvolumes > > [tripleo_ceph_sas] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_sas > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=sasvolumes > > As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD > pool name (ssdvolumes, not ssd) does not match. So far, we do not have any > problems. But I want to correct the names and I do not want to have the > mismatch anymore. > > So I want to change the value of key volume_backend_name for both Volume > Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). > > I tried the following: > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > +--------------------+----------------------------------------+ > | Field | Value | > +--------------------+----------------------------------------+ > | access_project_ids | None | > | description | | > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > | is_public | True | > | name | ssd | > | properties | volume_backend_name='tripleo_ceph_ssd' | > | qos_specs_id | None | > +--------------------+----------------------------------------+ > $ > > > $ openstack volume type set --property > volume_backend_name='tripleo_ceph_ssdvolumes' > 80cb25ff-376a-4483-b4f7-d8c75839e0ce > Failed to set volume type property: Volume Type is currently in use. (HTTP > 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > Command Failed: One or more of the operations failed > $ > > How to solve my problem? > > Thanks! > > Sinan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sinan at turka.nl Thu Nov 7 20:58:26 2019 From: sinan at turka.nl (Sinan Polat) Date: Thu, 7 Nov 2019 21:58:26 +0100 (CET) Subject: Change Volume Type, but in use In-Reply-To: References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Hi Goutham, Thanks for your response. It is correct that I have volumes that are using the volume type I want to modify. If I understand correctly, modifying volumes types is not possible as long as there are volumes using the volume type. Renaming the volume type and creating a new volume type won't work I guess. Since the backend name in the cinder configuration will be changed, the volumes that are using the renamed volume type won't able to find the backend and will fail, not? I would have the backend named "ssdvolumes" in the cinder configuration. The new volume type would have the correct backend name (ssdvolumes) but the renamed volume type would still have "ssd" as its backend name. 
Kind regards, Sinan > Op 7 november 2019 om 20:47 schreef Goutham Pacha Ravi > : > > Hey Sinat, > > The error message suggests that you have volumes that use the volume type > you're modifying. If yes, with cinder API, since the Rocky release [1], you > cannot modify volume types that are currently in use. The original design for > cinder volume types was that they were always mutable - and changes to > existing volume types didn't affect pre-existing volumes. However, this > behavior was modified in the Ocata release [2], and finally removed in the > Rocky release. > > One of your options is to also rename the existing volume type and create > a new one if you'd like, with the original name. > > [1] > https://docs.openstack.org/releasenotes/cinder/rocky.html#relnotes-13-0-0-stable-rocky-upgrade-notes > [2] https://review.opendev.org/#/c/440680/ > > > On Thu, Nov 7, 2019 at 12:33 AM Sinan Polat mailto:sinan at turka.nl > wrote: > > > > > > Hi, > > > > I am using Ceph as the backend for Cinder. Within Ceph we have > > defined 2 RBD pools (ssdvolumes, sasvolumes). > > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type > > has property "volume_backend_name='tripleo_ceph_'". > > > > In the Cinder configuration I have the following backends > > configured: > > > > [tripleo_ceph_ssd] > > backend_host=hostgroup > > volume_backend_name=tripleo_ceph_ssd > > volume_driver=cinder.volume.drivers.rbd.RBDDriver > > rbd_ceph_conf=/etc/ceph/ceph.conf > > rbd_user=openstack > > rbd_pool=ssdvolumes > > > > [tripleo_ceph_sas] > > backend_host=hostgroup > > volume_backend_name=tripleo_ceph_sas > > volume_driver=cinder.volume.drivers.rbd.RBDDriver > > rbd_ceph_conf=/etc/ceph/ceph.conf > > rbd_user=openstack > > rbd_pool=sasvolumes > > > > As you might have noticed, the backend name (tripleo_ceph_ssd) and > > the RBD pool name (ssdvolumes, not ssd) does not match. So far, we do not > > have any problems. But I want to correct the names and I do not want to have > > the mismatch anymore. > > > > So I want to change the value of key volume_backend_name for both > > Volume Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). > > > > I tried the following: > > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > > +--------------------+----------------------------------------+ > > | Field | Value | > > +--------------------+----------------------------------------+ > > | access_project_ids | None | > > | description | | > > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > > | is_public | True | > > | name | ssd | > > | properties | volume_backend_name='tripleo_ceph_ssd' | > > | qos_specs_id | None | > > +--------------------+----------------------------------------+ > > $ > > > > > > $ openstack volume type set --property > > volume_backend_name='tripleo_ceph_ssdvolumes' > > 80cb25ff-376a-4483-b4f7-d8c75839e0ce > > Failed to set volume type property: Volume Type is currently in use. > > (HTTP 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > > Command Failed: One or more of the operations failed > > $ > > > > How to solve my problem? > > > > Thanks! > > > > Sinan > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sean.mcginnis at gmx.com Thu Nov 7 22:48:17 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 7 Nov 2019 23:48:17 +0100 Subject: Change Volume Type, but in use In-Reply-To: <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Nov 7 22:56:51 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 7 Nov 2019 23:56:51 +0100 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: My non-TC take on this...   > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap.   > I have a review here for the zuul project template enablement for ussuri: > https://review.opendev.org/#/c/693401 I do not think it should be added to the ussuri jobs template. I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. Any idea so far from manual py38 testing if there are breaking changes that are going to impact us?   > Also should this be updated considering py38 would be non-voting? > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. From sinan at turka.nl Thu Nov 7 22:56:43 2019 From: sinan at turka.nl (Sinan Polat) Date: Thu, 7 Nov 2019 23:56:43 +0100 Subject: Change Volume Type, but in use In-Reply-To: References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: <9737AF0E-4B21-4214-B0F7-D9F47D056D60@turka.nl> Hi Sean, Currently: Ceph RBD pool name = ssdvolumes. Volume Type name = ssd Volume Type metadata pointing to backend name ssd Backend name in Cinder conf = ssd New situation: Backend name in Cinder conf will be ssdvolumes, and not ssd anymore. Since it seems it is not possible to modify a Volume Type when in use by volumes, I do not see other options. Sinan > Op 7 nov. 2019 om 23:48 heeft Sean McGinnis het volgende geschreven: > > Hi Sinan, > > Changing a backend name is generally not recommended. 
It is only an internal detail seen by administrators anyway, so I'm not sure it would be worth the effort to try to change it just to add "volumes" to the end. > > Sean > > Sent: Thursday, November 07, 2019 at 2:58 PM > From: "Sinan Polat" > To: "Goutham Pacha Ravi" > Cc: "OpenStack Discuss" > Subject: Re: Change Volume Type, but in use > Hi Goutham, > > Thanks for your response. > > It is correct that I have volumes that are using the volume type I want to modify. If I understand correctly, modifying volumes types is not possible as long as there are volumes using the volume type. > > Renaming the volume type and creating a new volume type won't work I guess. Since the backend name in the cinder configuration will be changed, the volumes that are using the renamed volume type won't able to find the backend and will fail, not? > > I would have the backend named "ssdvolumes" in the cinder configuration. The new volume type would have the correct backend name (ssdvolumes) but the renamed volume type would still have "ssd" as its backend name. > > Kind regards, > Sinan > > Op 7 november 2019 om 20:47 schreef Goutham Pacha Ravi : > > Hey Sinat, > > The error message suggests that you have volumes that use the volume type you're modifying. If yes, with cinder API, since the Rocky release [1], you cannot modify volume types that are currently in use. The original design for cinder volume types was that they were always mutable - and changes to existing volume types didn't affect pre-existing volumes. However, this behavior was modified in the Ocata release [2], and finally removed in the Rocky release. > > One of your options is to also rename the existing volume type and create a new one if you'd like, with the original name. > > [1] https://docs.openstack.org/releasenotes/cinder/rocky.html#relnotes-13-0-0-stable-rocky-upgrade-notes > [2] https://review.opendev.org/#/c/440680/ > > > On Thu, Nov 7, 2019 at 12:33 AM Sinan Polat wrote: > Hi, > > I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD pools (ssdvolumes, sasvolumes). > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property "volume_backend_name='tripleo_ceph_'". > > In the Cinder configuration I have the following backends configured: > > [tripleo_ceph_ssd] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_ssd > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=ssdvolumes > > [tripleo_ceph_sas] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_sas > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=sasvolumes > > As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool name (ssdvolumes, not ssd) does not match. So far, we do not have any problems. But I want to correct the names and I do not want to have the mismatch anymore. > > So I want to change the value of key volume_backend_name for both Volume Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). 
> > I tried the following: > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > +--------------------+----------------------------------------+ > | Field | Value | > +--------------------+----------------------------------------+ > | access_project_ids | None | > | description | | > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > | is_public | True | > | name | ssd | > | properties | volume_backend_name='tripleo_ceph_ssd' | > | qos_specs_id | None | > +--------------------+----------------------------------------+ > $ > > > $ openstack volume type set --property volume_backend_name='tripleo_ceph_ssdvolumes' 80cb25ff-376a-4483-b4f7-d8c75839e0ce > Failed to set volume type property: Volume Type is currently in use. (HTTP 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > Command Failed: One or more of the operations failed > $ > > How to solve my problem? > > Thanks! > > Sinan -------------- next part -------------- An HTML attachment was scrubbed... URL: From nate.johnston at redhat.com Fri Nov 8 01:10:05 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Thu, 7 Nov 2019 20:10:05 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: <20191108011005.yknhtfbkvsleckdx@firewall> On Thu, Nov 07, 2019 at 11:56:51PM +0100, Sean McGinnis wrote: > My non-TC take on this... > >   > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > > > For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. >   > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > I do not think it should be added to the ussuri jobs template. Would it be possible to add it to the template, but under the experimental queue? That way we leverage the template's ability to do the work for all projects but the job won't be executed without a specific experimental check. Thanks, Nate > I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. > > Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. > > Any idea so far from manual py38 testing if there are breaking changes that are going to impact us? >   > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. 
> From skaplons at redhat.com Fri Nov 8 05:03:22 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 8 Nov 2019 13:03:22 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <20191108011005.yknhtfbkvsleckdx@firewall> References: <20191108011005.yknhtfbkvsleckdx@firewall> Message-ID: <20191108050322.kpzhombboymjk4wf@skaplons-mac> On Thu, Nov 07, 2019 at 08:10:05PM -0500, Nate Johnston wrote: > On Thu, Nov 07, 2019 at 11:56:51PM +0100, Sean McGinnis wrote: > > My non-TC take on this... > > > >   > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > > > > > For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. > >   > > > I have a review here for the zuul project template enablement for ussuri: > > > https://review.opendev.org/#/c/693401 > > > > I do not think it should be added to the ussuri jobs template. > > Would it be possible to add it to the template, but under the > experimental queue? That way we leverage the template's ability to do > the work for all projects but the job won't be executed without a > specific experimental check. Personally from neutron point of view I think that periodic is better than experimental as with periodic jobs we don't need to do any additional actions to run this job and see results. And we are checking periodic jobs' results every week on CI meeting. But ofcourse experimental would also work :) > > Thanks, > > Nate > > > I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. > > > > Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. > > > > Any idea so far from manual py38 testing if there are breaking changes that are going to impact us? > >   > > > Also should this be updated considering py38 would be non-voting? > > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. > > > > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Fri Nov 8 05:08:16 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 8 Nov 2019 13:08:16 +0800 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> References: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> Message-ID: <20191108050816.xs5yourrkqqxt43g@skaplons-mac> Hi, >From what I can say now, I'm streaming Neutron PTG sessions through BlueJeans for last 3 days and it works without any problems and without any vpn connection. So this may be useful also for You. 
On Thu, Oct 31, 2019 at 09:55:26AM +0100, Marcin Juszkiewicz wrote: > W dniu 30.10.2019 o 23:23, Kendall Nelson pisze: > > > If people were going to be in Shanghai for the Summit (or live in > > China) they wouldn't be able to participate because of the firewall. > > Can you (or someone else present in Poland) provide an alternative > > solution to Google meet so that everyone interested could join? > > Tell us which of them work for you: > > - Bluejeans > - Zoom > > > As I have access to both platforms at work. > -- Slawek Kaplonski Senior software engineer Red Hat From mdulko at redhat.com Fri Nov 8 05:53:20 2019 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Fri, 08 Nov 2019 06:53:20 +0100 Subject: [kuryr] Meeting at PTG - follow up Message-ID: <5eee4b4f468c5b8c9b7407762165179d7b665f3f.camel@redhat.com> Hi, Thank you all for a constructive discussion at the Shanghai PTG. First of all I want to clarify that the Neutron improvements in bulk port creation code I talked about were only merged in Stein and are not available in Queens. Sorry about that, my mistake. Below I want to list some highlights from the discussion. * There seemed no pushback on switching to independent release model, so I'll follow up on that soon. * We discussed using RPC instead of API polling to detect when port becomes ACTIVE in Neutron (https://review.opendev.org/#/c/669642/). I commented on the review with a follow up that came from discussion with Neutron team. Apparently there are 2 better ways of doing this. Let's continue that discussion on the review. * We know that we don't support running as a second CNI plugin on Multus because of eth0 being hardcoded as interface name and kuryr- controller handling pods that do not had CNI requests. This is something to solve. * Apparently we all suffer the same problems with Neutron performance when starting a bigger workload on Kuryr-configured K8s cluster (think big Helm chart, multiple operators or simultaneous test suite). This all comes to reducing the number of calls to Neutron. We still don't see a feasible solution to solve this problem, but it seems to be a priority for both Samsung and Red Hat at the moment. Let's see if we'll meet again in Vancouver! Thanks, Michał From mdulko at redhat.com Fri Nov 8 06:05:26 2019 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Fri, 08 Nov 2019 07:05:26 +0100 Subject: [kuryr] Meeting at PTG - follow up In-Reply-To: <5eee4b4f468c5b8c9b7407762165179d7b665f3f.camel@redhat.com> References: <5eee4b4f468c5b8c9b7407762165179d7b665f3f.camel@redhat.com> Message-ID: <3e8d7b10556ef04353894ff0cbaffdd17da24450.camel@redhat.com> On Fri, 2019-11-08 at 06:53 +0100, Michał Dulko wrote: > Hi, > > Thank you all for a constructive discussion at the Shanghai PTG. First > of all I want to clarify that the Neutron improvements in bulk port > creation code I talked about were only merged in Stein and are not > available in Queens. Sorry about that, my mistake. > > Below I want to list some highlights from the discussion. > > * There seemed no pushback on switching to independent release model, > so I'll follow up on that soon. > * We discussed using RPC instead of API polling to detect when port > becomes ACTIVE in Neutron (https://review.opendev.org/#/c/669642/). > I commented on the review with a follow up that came from discussion > with Neutron team. Apparently there are 2 better ways of doing this. > Let's continue that discussion on the review. 
> * We know that we don't support running as a second CNI plugin on > Multus because of eth0 being hardcoded as interface name and kuryr- > controller handling pods that do not had CNI requests. This is > something to solve. > * Apparently we all suffer the same problems with Neutron performance > when starting a bigger workload on Kuryr-configured K8s cluster > (think big Helm chart, multiple operators or simultaneous test > suite). This all comes to reducing the number of calls to Neutron. > We still don't see a feasible solution to solve this problem, but it > seems to be a priority for both Samsung and Red Hat at the moment. > > Let's see if we'll meet again in Vancouver! > > Thanks, > Michał I forgot to add link to raw etherpad with notes: https://etherpad.openstack.org/p/kuryr-PVG From balazs.gibizer at est.tech Fri Nov 8 06:33:11 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 06:33:11 +0000 Subject: [nova][ptg] Virtual PTG In-Reply-To: <4254ccd8-88ca-b21d-29b6-ab4e427f3ee4@fried.cc> References: <4254ccd8-88ca-b21d-29b6-ab4e427f3ee4@fried.cc> Message-ID: <1573194777.23158.0@est.tech> On Thu, Oct 24, 2019 at 17:28, Eric Fried wrote: > Hello nova contributors and other stakeholders. > > As you are aware, nova maintainers will be sparser than usual at the > ussuri PTG. For that reason, and also because it promotes better > inclusion anyway, I'd like us to do the majority of decision making > via > the mailing list. The PTG is still a useful place to talk through > design > ideas, but this will give those not attending a voice in the final > direction. > > To that end, I call your attention to the etherpad [1]. As usual, list > your topics there. And if your topic is something for which you only > need (or wish to start with) in-person discussions (e.g. "I'd like to > do > $thing but could use some help figuring out $how"), you're done. > > But if what you're shooting for is discussion leading to some kind of > decision, like... > > - My spec has been stalled because we can't decide among N different > approaches; we need to reach a consensus. > - My feature is really important; can we please prioritize it for > ussuri? > > ...then in addition to putting your topic on the etherpad, please > initiate a (separate) thread on this mailing list, including > [nova][ptg] > in your subject line. > > Some of these topics may be resolved before the PTG itself. Others may > be discussed in Shanghai. However, even if a consensus is reached in > person, expect that decision to be tentative pending closure of the ML > thread. Now as the PTG is over (for nova at least as we finished a bit earlier) Sylvain and me will start sending out summary mails about the topics we discussed to create a place to further discuss the topics with the rest of the nova team. The etherpad[1] contains most of the agreements we reached during the discussions but of course none of them are final as we did not have a core quorum on the PTG. Thanks for everyone who contributed in any way to the PTG discussions! 
Cheers, gibi > > Thanks, > efried > > [1] https://etherpad.openstack.org/p/nova-shanghai-ptg > From balazs.gibizer at est.tech Fri Nov 8 07:09:31 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 07:09:31 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance Message-ID: <1573196961.23158.1@est.tech> spec: https://review.opendev.org/668656 Agreements from the PTG: How we will test it: * do functional test with libvirt driver, like the pinned cpu tests we have today * donyd's CI supports nested virt so we can do pinned cpu testing but not realtime. As this CI is still work in progress we should not block on this. * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to have Naming: use the 'shared' and 'dedicated' terminology Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will have less expression power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between numa nodes. If it not possible we reject the request and ask the user to use the hw:pinvcpus=3 syntax. Realtime mask is an exclusion mask, any vcpus not listed there has to be in the dedicated set of the instance. TODOInvestigate whether we want to enable NUMA by default * Pros: Simpler, everything is NUMA by default * Cons: We'll either have to break/make configurablethe 1:1 guest:host NUMA mapping else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host Cheers, gibi From balazs.gibizer at est.tech Fri Nov 8 07:24:18 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 07:24:18 +0000 Subject: [nova][ptg] Community Goals Message-ID: <1573197848.23158.2@est.tech> python3 readiness: Stick with only considering py3 as default when writing code, not caring about refactoring existing code for the purpose of being Py3 pedantic. Write Py3-only code on master, and eventually make the backport py2-compatible on stable branches if needed. dropping paste: Need an operator ML post: is anybody using this? aarents (OVH?) said they are using paste.ini to insert middlewares. So we cannot simply drop paste. improved contributor documentation: we have PTL doc in tree [2]. Work on the CONTRIBUTING.rst [3]. [1] https://etherpad.openstack.org/p/nova-shanghai-ptg [2] https://docs.openstack.org/nova/latest/contributor/ptl-guide.html [3] https://review.opendev.org/#/c/640970/ From balazs.gibizer at est.tech Fri Nov 8 07:32:47 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 07:32:47 +0000 Subject: [nova][ptg]Support volume local cache Message-ID: <1573198357.23158.3@est.tech> spec: https://review.opendev.org/#/c/689070/ TODOs: LiangFang to update the spec based on the discussion in the room[1]: * use traits to driver scheduling. The cache is not sliced per instance so it cannot be a resource class * document the alternative between doing a hard scheduling decision or only implement caching as a best effort optimization for the guest. 
* document the alternative to do the whole cache management on libvirt (or QEMU) level [1] https://etherpad.openstack.org/p/nova-shanghai-ptg From balazs.gibizer at est.tech Fri Nov 8 08:01:58 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 08:01:58 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship Message-ID: <1573200108.23158.4@est.tech> spec: https://review.opendev.org/#/c/650188/ Agreements from the room[1]: * new config option 'using_shared_disk_provider' (just an example name) on the compute level to ignore the DISK_GB reporting from the driver * new config option 'sharing_disk_aggregate' (just an example name) on the compute level to tell nova compute what is the UUID of the placement aggregate that contains the sharing DISK_GB providers in placement. * the "using_shared_disk_provider" flag necessarly has to be explicit since if not, it would be a chicken-and-egg problem on a greenfields install as the shared RP wouldn't be created * deployer needs to create the sharing disk RP and report inventory / traits on it * deployer needs to define the placement aggregate and add the sharing disk RP into it * when compute restarts and sees that 'using_shared_disk_provider' = True in the config, it adds the its compute RP to the aggregate defined in 'sharing_disk_aggregate' Then if it sees that the root RP still has DISK_GB inventory then trigger a reshape * os-hypervisor API response (in a new microversion) will have a link to the sharing disk RP if the compute is so configured. TODO: * tpatil to update the spec [1] https://etherpad.openstack.org/p/nova-shanghai-ptg From Tushar.Patil at nttdata.com Fri Nov 8 09:39:49 2019 From: Tushar.Patil at nttdata.com (Patil, Tushar) Date: Fri, 8 Nov 2019 09:39:49 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573200108.23158.4@est.tech> References: <1573200108.23158.4@est.tech> Message-ID: Hi All, > TODO: > * tpatil to update the spec I will update the specs next week and upload it for review. Regards, tpatil ________________________________________ From: Balázs Gibizer Sent: Friday, November 8, 2019 5:01 PM To: openstack-discuss Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship spec: https://review.opendev.org/#/c/650188/ Agreements from the room[1]: * new config option 'using_shared_disk_provider' (just an example name) on the compute level to ignore the DISK_GB reporting from the driver * new config option 'sharing_disk_aggregate' (just an example name) on the compute level to tell nova compute what is the UUID of the placement aggregate that contains the sharing DISK_GB providers in placement. 
* the "using_shared_disk_provider" flag necessarly has to be explicit since if not, it would be a chicken-and-egg problem on a greenfields install as the shared RP wouldn't be created * deployer needs to create the sharing disk RP and report inventory / traits on it * deployer needs to define the placement aggregate and add the sharing disk RP into it * when compute restarts and sees that 'using_shared_disk_provider' = True in the config, it adds the its compute RP to the aggregate defined in 'sharing_disk_aggregate' Then if it sees that the root RP still has DISK_GB inventory then trigger a reshape * os-hypervisor API response (in a new microversion) will have a link to the sharing disk RP if the compute is so configured. TODO: * tpatil to update the spec [1] https://etherpad.openstack.org/p/nova-shanghai-ptg Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. From antoine.millet at enix.fr Fri Nov 8 09:53:57 2019 From: antoine.millet at enix.fr (Antoine Millet) Date: Fri, 08 Nov 2019 10:53:57 +0100 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types Message-ID: Hi here, I'm trying to find a solution to migrate instances between hypervisors of an openstack cluster with nodes running different ML2 agents (OVS and bridges, I'm actually migrating the whole cluster to the latter). The cluster is running Rocky. I enabled both mechanisms in the neutron- server configuration and some nodes are running the neutron- openvswitch-agent and some other the neutron-linuxbridge-agent. My network nodes (running the l3 agent) are currently running the neutron- openvswitch-agent. I also noticed that when nova-compute is starting up, VIF plugins for OVS and Bridges are loaded ("INFO os_vif [-] Loaded VIF plugins: ovs, linux_bridge"). When I start a live migration for an instance running on an hypervisor using the OVS agent to an hypervisor using the bridge agent, it fails because the destination hypervisor try to execute 'ovs-*' commands to bind the VM to its network. I also tried cold migration and just restarting an hypervisor with the bridge agent instead of the OVS one, but it fails similarly when the instances startup. After some research, I discovered that the mechanism used to bind an instance port to a network is stored in the port binding configuration in the database and that the code that executes the 'ovs-*' commands is actually located in the os_vif library that is used by the nova-compute agent. So, I tried to remove the OVS plugin from the os_vif library. Ubuntu ship both plugins in the same package so I just deleted the plugin directory in /usr/lib/python2.7/dist-packages directory (don't judge me please, it's for science ;-)). And... it worked as expected (port bindings are converted to bridge mechanism), at least for the cold migration (hot migration is cancelled without any error message, I need to investigate more). How can I do those migration the proper way? Thank you for any help! Antoine -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From info at dantalion.nl Fri Nov 8 11:45:17 2019 From: info at dantalion.nl (info at dantalion.nl) Date: Fri, 8 Nov 2019 12:45:17 +0100 Subject: [olso][taskflow] graph-flow failed task will halt execution of children? In-Reply-To: References: Message-ID: <953ef54c-4463-2f80-2997-aca339b9a369@dantalion.nl> I have a short and simple question which I couldn't find a clear answer for in the documentation. I understand that when a task raises a exception in a graph flow it will revert all parents, however, I fail to find any information if it will subsequently prevent the execution of all children. I imagine yes as the dependencies for these tasks are now unmet but I would like to know for sure. TL;DR; Does an exception in a graph-flow task prevent the execution of children? Kind Regards, Corne Lukken (Dantali0n) From smooney at redhat.com Fri Nov 8 12:20:52 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 08 Nov 2019 12:20:52 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <1573196961.23158.1@est.tech> References: <1573196961.23158.1@est.tech> Message-ID: On Fri, 2019-11-08 at 07:09 +0000, Balázs Gibizer wrote: > spec: https://review.opendev.org/668656 > > Agreements from the PTG: > > How we will test it: > * do functional test with libvirt driver, like the pinned cpu tests we > have today > * donyd's CI supports nested virt so we can do pinned cpu testing but > not realtime. As this CI is still work in progress we should not block > on this. we can do realtime testing in that ci. i already did. also there is a new label that is available across 3 providers so we wont just be relying on donyd's good work. > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > have > > Naming: use the 'shared' and 'dedicated' terminology didn't we want to have a hw:cpu_policy=mixed specificaly for this case? > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax > will have less expression power until nova models NUMA in placement. So > nova will try to evenly distribute PCPUs between numa nodes. If it not > possible we reject the request and ask the user to use the > hw:pinvcpus=3 syntax. > > Realtime mask is an exclusion mask, any vcpus not listed there has to > be in the dedicated set of the instance. > > TODOInvestigate whether we want to enable NUMA by default > * Pros: Simpler, everything is NUMA by default > * Cons: We'll either have to break/make configurablethe 1:1 guest:host in the context of mix if we dont enable numa affinity by default we should remove that behavior from all case where we do it today. > NUMA mapping else we won't be able to boot e.g. a 40 core shared > instance on a 40 core, 2 NUMA node host if this is a larger question of if we should have all instance be numa by default i have argued yes for quite a while as i think having 1 code path has many advantages. that said im aware of this limitation. one way to solve this was the use of the proposed can_split placmenent paramter. so if you did not specify a numa toplogy we would add can_split=vCPUs and then create a singel or multiple numa node toplogy based on the allcoations. if we combine that with a allocation weigher we could sort the allocation candiates by smallest number of numa nodes so we would prefer landing on hosts that can fit it on 1 numa node. 
its a big change but long overdue. that said i have also argued the other point too in responce to pushback on "all vms have numa of 1 unless you say otherwise" i.e. that the 1:1 between mapping virtual and host numa nodes shoudl be configurable and is not required by the api today. the backwards compatible way to do that is its not requried by default if you are using shared cores and is required if you are using pinned but that is a littel confusing. i dont really know what the right answer to this is but i think its a seperate question form the topic of this thread. we dont need to solve this to enable pinned and unpinned cpus in one instance but we do need to adress this before we can model numa in placment. > > > Cheers, > gibi > > > From mriedemos at gmail.com Fri Nov 8 14:03:28 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 8 Nov 2019 08:03:28 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573200108.23158.4@est.tech> References: <1573200108.23158.4@est.tech> Message-ID: <4ffbcc4c-c043-5d7a-7f7a-d78de9fc75d7@gmail.com> On 11/8/2019 2:01 AM, Balázs Gibizer wrote: > * deployer needs to create the sharing disk RP and report inventory / > traits on it > * deployer needs to define the placement aggregate and add the sharing > disk RP into it > * when compute restarts and sees that 'using_shared_disk_provider' = > True in the config, it adds the its compute RP to the aggregate defined > in 'sharing_disk_aggregate' Then if it sees that the root RP still has > DISK_GB inventory then trigger a reshape Does the compute host also get added to a nova host aggregate which mirrors the resource provider aggregate in placmeent or do we only need the placement resource provider sharing DISK_GB aggregate? -- Thanks, Matt From mriedemos at gmail.com Fri Nov 8 14:05:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 8 Nov 2019 08:05:33 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573200108.23158.4@est.tech> References: <1573200108.23158.4@est.tech> Message-ID: <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> On 11/8/2019 2:01 AM, Balázs Gibizer wrote: > * when compute restarts and sees that 'using_shared_disk_provider' = > True in the config, it adds the its compute RP to the aggregate defined > in 'sharing_disk_aggregate' Then if it sees that the root RP still has > DISK_GB inventory then trigger a reshape Conversely, if the deployer decides to use local disk for the host again, what are the steps? 1. Change using_shared_disk_provider=False 2. Restart/SIGHUP compute service 3. Compute removes itself from the aggregate 4. Compute reshapes to add DISK_GB inventory on the root compute node resource provider and moves DISK_GB allocations from the sharing provider back to the root compute node provider. Correct? -- Thanks, Matt From corey.bryant at canonical.com Fri Nov 8 14:09:51 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Fri, 8 Nov 2019 09:09:51 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On Thu, Nov 7, 2019 at 5:56 PM Sean McGinnis wrote: > My non-TC take on this... > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > too late to enable voting py38 unit tests for ussuri, I'd like to at least > enable non-voting py38 unit tests. 
This email is seeking approval and > direction from the TC to move forward with enabling non-voting py38 tests. > > I think it would be great to start testing 3.8 so there are no surprises > once we need to officially move there. But I would actually not want to see > that run on every since patch in every single repo. > Just to be clear I'm only talking about unit tests right now which are generally light on resource requirements. However it would be great to also have py38 function test enablement and periodic would make sense for function tests at this point. For unit tests though it seems the benefit of knowing whether your patch regresses unit tests for the latest python version far outweighs the resources required, so I don't see much benefit in adding periodic unit test jobs. > > For some further background: The next release of Ubuntu, Focal (20.04) > LTS, is scheduled to release in April 2020. Python 3.8 will be the default > in the Focal release, so I'm hopeful that non-voting unit tests will help > close some of the gap. > > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > I do not think it should be added to the ussuri jobs template. > > I think it would be more useful as its own job for now that can be added > to a select few repos as a full tempest run so a smaller number of test > runs can cover a broader cross-section of projects. > > Otherwise as maybe a periodic job for now so it doesn't add to the run > time and noise on every patch being submitted. > > Do you feel the same on the 2 points above for unit test enablement? Any idea so far from manual py38 testing if there are breaking changes that > are going to impact us? > I don't have enough details yet so I'll have to get back to you on that but yes there is breakage that I haven't dug into yet. > > Also should this be updated considering py38 would be non-voting? > > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. > That should only list the officially targeted runtimes for the release. > Ok, makes sense. Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Fri Nov 8 14:16:27 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 8 Nov 2019 08:16:27 -0600 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types In-Reply-To: References: Message-ID: <8b93198d-5d61-0d25-7ebd-5a10ee01642b@gmail.com> On 11/8/2019 3:53 AM, Antoine Millet wrote: > How can I do those migration the proper way? [1] was implemented in Rocky to support live migration between different networking backends (vif types). A couple of things to check: 1. Is Neutron fully upgraded to Rocky and exposing the "Port Bindings Extended" (binding-extended) extension? Nova uses that to determine if neutron is new enough to create an inactive port binding for the target host prior to starting the live migration. 2. Are your nova-compute services all upgraded to at least Rocky and reporting version >=35 in the services table in the cell1 DB? [2] 3. Do you have [compute]/upgrade_levels RPC pinned to anything below Rocky? Or is that configured to "auto"? 
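For the first item, a quick way to check is something like this (the exact flag and column names may vary a bit with your python-openstackclient version):

$ openstack extension list --network | grep -i binding-extended
$ openstack port show <port-uuid> -c binding_vif_type -c binding_host_id

The first command confirms neutron is advertising the extended port bindings extension, and the second shows the vif type and host a given instance port is currently bound to.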
These are things to check just to make sure the basic upgrade requirements are satisfied before the code will even attempt to do the new style binding flow for live migration. If that's all working properly, you should see this DEBUG log message on the source host during live migration [4]. [1] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html [2] https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/conductor/tasks/live_migrate.py#L41 [3] https://docs.openstack.org/nova/rocky/configuration/config.html#upgrade_levels.compute [4] https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/virt/libvirt/migration.py#L317 -- Thanks, Matt From openstack at fried.cc Fri Nov 8 14:36:17 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 8 Nov 2019 08:36:17 -0600 Subject: [nova][ptg] Virtual PTG In-Reply-To: <1573194777.23158.0@est.tech> References: <4254ccd8-88ca-b21d-29b6-ab4e427f3ee4@fried.cc> <1573194777.23158.0@est.tech> Message-ID: <20db8f68-f841-6d34-a63f-305967ef9af2@fried.cc> Thanks very much to all those who facilitated and participated on site! > Now as the PTG is over (for nova at least as we finished a bit earlier) > Sylvain and me will start sending out summary mails about the topics we > discussed to create a place to further discuss the topics with the rest > of the nova team. The etherpad[1] contains most of the agreements we > reached during the discussions but of course none of them are final as > we did not have a core quorum on the PTG. efried . From antoine.millet at enix.fr Fri Nov 8 15:53:27 2019 From: antoine.millet at enix.fr (Antoine Millet) Date: Fri, 08 Nov 2019 16:53:27 +0100 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types In-Reply-To: <8b93198d-5d61-0d25-7ebd-5a10ee01642b@gmail.com> References: <8b93198d-5d61-0d25-7ebd-5a10ee01642b@gmail.com> Message-ID: Matt, Thank you for your answer! > > 1. Is Neutron fully upgraded to Rocky and exposing the "Port > Bindings > Extended" (binding-extended) extension? Nova uses that to determine > if > neutron is new enough to create an inactive port binding for the > target > host prior to starting the live migration. I'm not sure how to test that but my neutron components are all upgraded to rocky / 13.0.4. > > 2. Are your nova-compute services all upgraded to at least Rocky and > reporting version >=35 in the services table in the cell1 DB? [2] I can confirm that. > > 3. Do you have [compute]/upgrade_levels RPC pinned to anything below > Rocky? Or is that configured to "auto"? All the upgrade_levels on compute are pinned to auto (control plane and nodes). > If that's all working properly, you should see this DEBUG log message > on > the source host during live migration [4]. I can actually see this message on the nova-compute logs: 2019-11-08 16:34:18.599 3995 DEBUG nova.virt.libvirt.migration [-] [instance: 3dbf401b-19bf-4342-a04e-6ac9cff99efe] Updating guest XML with vif config: And here is the problem at the same time at the destination: 2019-11-08 15:34:23.720 4434 ERROR os_vif AgentError: Error during following call to agent: ['ovs-vsctl', '--timeout=120', '--', '--if- exists', 'del-port', u'br-int', u'qvoea312a58-2e'] Antoine -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From smooney at redhat.com Fri Nov 8 16:23:22 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 08 Nov 2019 16:23:22 +0000 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types In-Reply-To: References: Message-ID: On Fri, 2019-11-08 at 10:53 +0100, Antoine Millet wrote: > Hi here, > > I'm trying to find a solution to migrate instances between hypervisors > of an openstack cluster with nodes running different ML2 agents (OVS > and bridges, I'm actually migrating the whole cluster to the latter). > > The cluster is running Rocky. I enabled both mechanisms in the neutron- > server configuration and some nodes are running the neutron- > openvswitch-agent and some other the neutron-linuxbridge-agent. My > network nodes (running the l3 agent) are currently running the neutron- > openvswitch-agent. I also noticed that when nova-compute is starting > up, VIF plugins for OVS and Bridges are loaded ("INFO os_vif [-] Loaded > VIF plugins: ovs, linux_bridge"). > > When I start a live migration for an instance running on an hypervisor > using the OVS agent to an hypervisor using the bridge agent, it fails > because the destination hypervisor try to execute 'ovs-*' commands to > bind the VM to its network. I also tried cold migration and just > restarting an hypervisor with the bridge agent instead of the OVS one, > but it fails similarly when the instances startup. > > After some research, I discovered that the mechanism used to bind an > instance port to a network is stored in the port binding configuration > in the database and that the code that executes the 'ovs-*' commands is > actually located in the os_vif library that is used by the nova-compute > agent. > > So, I tried to remove the OVS plugin from the os_vif library. Ubuntu > ship both plugins in the same package so I just deleted the plugin > directory in /usr/lib/python2.7/dist-packages directory (don't judge me > please, it's for science ;-)). And... it worked as expected (port > bindings are converted to bridge mechanism), at least for the cold > migration (hot migration is cancelled without any error message, I need > to investigate more). So while that is an inventive approach, os-vif is not actually involved in the port binding process; it handles port plugging later. I did some testing around this use case back in 2018 and found a number of gaps that need to be addressed to support live migration between linux bridge and ovs or vice versa. First, the bridge name is not set in vif:binding-details by ml2/linux-bridge https://bugs.launchpad.net/neutron/+bug/1788012 so if we try to go from ovs to linuxbridge we generate the wrong xml and try to add the port to a linux bridge called br-int: Updating guest XML with vif config: Aug 14 12:15:27 devstack1 nova-compute[14852]: [the interface XML that followed in this log was stripped when the message was rendered as plain text] Using mixed linux bridge and ovs hosts also has other problems if you are using vxlan or gre, because neutron does not form a mesh tunnel overlay between different ml2 drivers. https://bugs.launchpad.net/neutron/+bug/1788023 The linux bridge plugin also uses a different udp port for reasons (vxlan was merged in the linux kernel before the IANA port number was assigned.)
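(If you want to confirm what neutron actually recorded for a given port, something like "openstack port show <port-id> -c binding_vif_type -c binding_vif_details -c binding_host_id" should show the vif type and whether a bridge name made it into the details; the exact column names can vary a little between client versions.)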
So in effect there is no supported way to do this with a live migration in rocky, but there are ways to force it to work. The simplest way to do this is to cold migrate followed by a hard reboot, but you need to add both ovs and linux bridge tools on each host and only have 1 agent running. You can also live migrate twice to the same host and hard reboot. The first migration will fail. The second should succeed but result in the vm tap device being connected to the wrong bridge, and the hard reboot fixes it. > way? > > Thank you for any help! > > Antoine From johnsomor at gmail.com Fri Nov 8 16:35:48 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 8 Nov 2019 08:35:48 -0800 Subject: [olso][taskflow] graph-flow failed task will halt execution of children? In-Reply-To: <953ef54c-4463-2f80-2997-aca339b9a369@dantalion.nl> References: <953ef54c-4463-2f80-2997-aca339b9a369@dantalion.nl> Message-ID: Short answer is yes. When a task fails and the revert path starts, it goes back up the graph executing the revert methods and will not execute any children beyond the failed task. That said, there is an option in the engine to disable reverts (execution will simply halt at the failed task), there are ways to make decision paths, and there is a pretty robust set of retry tools that can be applied in a revert situation. Michael On Fri, Nov 8, 2019 at 2:49 AM info at dantalion.nl wrote: > > I have a short and simple question which I couldn't find a clear > answer for in the documentation. > > I understand that when a task raises a exception in a graph flow it > will revert all parents, however, I fail to find any information if it > will subsequently prevent the execution of all children. > > I imagine yes as the dependencies for these tasks are now unmet but I > would like to know for sure. > > TL;DR; Does an exception in a graph-flow task prevent the execution of > children? > > Kind Regards, > Corne Lukken (Dantali0n) > From mark at stackhpc.com Fri Nov 8 18:20:02 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 8 Nov 2019 18:20:02 +0000 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: References: Message-ID: Thank you to everyone who joined in the discussion over the last two days in the Kolla PTG. It is nice to put a face/voice to some IRC nicks :) We covered a lot of topics, and I think we made some good progress on many of them. At the end of today's session, we listed all of the potential work for the Ussuri cycle in an Etherpad [1] and voted for which items we think should be prioritised in the Ussuri cycle. I would like to invite anyone in the Kolla community who has not yet voted to do so, even if you did not attend the PTG sessions. Input from the community is very valuable, and will help to guide our efforts as maintainers. The voting process is described at the top of the pad. For anyone who would like to see the notes from the discussions, they are at [2]. Please get in touch via IRC if you have any questions. [1] https://etherpad.openstack.org/p/kolla-ussuri-priorities [2] https://etherpad.openstack.org/p/kolla-ussuri-ptg Cheers, Mark On Mon, 4 Nov 2019 at 14:15, Mark Goddard wrote: > > On Wed, 30 Oct 2019 at 17:26, Radosław Piliszek > wrote: > > > > Hello Everyone, > > > > As you may already know, Kolla core team is mostly not present on summit in Shanghai. > > Instead we are organizing a PTG next week, 7-8th Nov (Thu-Fri), in Białystok, Poland. > > Please let me know this week if you are interested in coming in person. 
> > > > We invite operators, contributors and contributors-to-be to join us for the virtual PTG online. > > The time schedule will be advertised later. > > After polling participants, we have agreed to meet at 1400 - 1800 UTC > on Thursday and Friday this week. Since not all participants can make > the first hour, we will adjust the schedule accordingly. > > Marcin will follow with connection details for the Zoom video conference. > > Please continue to update the etherpad with potential topics for > discussion. I will propose a rough agenda over the next few days. > > Mark > > > > > Please fill yourself in on the whiteboard [1]. > > New ideas are welcome. > > > > [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg > > > > Kind regards, > > Radek aka yoctozepto > > From zigo at debian.org Fri Nov 8 21:17:57 2019 From: zigo at debian.org (Thomas Goirand) Date: Fri, 8 Nov 2019 22:17:57 +0100 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On 11/7/19 11:56 PM, Sean McGinnis wrote: > My non-TC take on this... > >   >> Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > >> For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. >   >> I have a review here for the zuul project template enablement for ussuri: >> https://review.opendev.org/#/c/693401 > > I do not think it should be added to the ussuri jobs template. > > I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. > > Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. > > Any idea so far from manual py38 testing if there are breaking changes that are going to impact us? >   >> Also should this be updated considering py38 would be non-voting? >> https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. Sean and everyone else, Pardon me, but I have to rant here... :) Please try see things from a downstream consumer point of view. This isn't the Python 2.7 era anymore, where we had a stable python for like forever. OpenStack needs to move quicker to newer Python 3 versions, especially considering that Python 2.7 isn't an option for anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks like a nice first approach, it is my strong believe that the project should quickly move to voting and full Python 3.8 testing, and preferably, have it in order, with functional testing, for the whole project, by the time Ussuri is released. 
I know what's going to happen: I'll tell about a bug in Python 3.8 on IRC, and someone will reply to me "hey, that's not supported by OpenStack before the V release, please go away", even though as downstream distribution package maintainer, we don't have the power to decide what version of Python our distribution runs on (ie: both Debian Sid, Ubuntu and Fedora are quickly moving targets). There's absolutely no excuse for the OpenStack project to be dragging its feet, apart maybe the fact that it may not be easy to setup the infra to run tests on Py3.8 just yet. It isn't a normal situation that downstream distributions get the shit (pardon my french) and get to be the ones fixing issues and proposing patches (Corey, you've done awesome job on this...), yet it's been the case for nearly every single Python 3 releases. I very much would appreciate this situation to be fixed, and the project moving faster. Cheers, Thomas Goirand (zigo) From skaplons at redhat.com Sat Nov 9 02:16:07 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sat, 9 Nov 2019 10:16:07 +0800 Subject: [infra] Etherpad problem Message-ID: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> Hi, Just at the end of the ptg sessions, neutron etherpad was got broken somehow. Now when I try to open [1] I see only something like: An error occurred The error was reported with the following id: 'igzOahZ6ruH0eSUAWKaj' Please press and hold Ctrl and press F5 to reload this page, if the problem persists please send this error message to your webmaster: 'ErrorId: igzOahZ6ruH0eSUAWKaj URL: https://etherpad.openstack.org/p/Shanghai-Neutron-Planning UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 Firefox/70.0 TypeError: r.dropdowns is undefined in https://etherpad.openstack.org/javascripts/lib/ep_etherpad-lite/static/js/pad.js?callback=require.define at line 18' We can open one of the previous versions which is available at [2] but I don't know how we can fix original etherpad or restore version from [2] to be original etherpad and make it working again. Can someone from infra team check that for us maybe? Thx in advance for any help. [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 -- Slawek Kaplonski Senior software engineer Red Hat From haleyb.dev at gmail.com Sat Nov 9 16:31:36 2019 From: haleyb.dev at gmail.com (Brian Haley) Date: Sun, 10 Nov 2019 00:31:36 +0800 Subject: [infra] Etherpad problem In-Reply-To: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> References: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> Message-ID: <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> On 11/9/19 10:16 AM, Slawek Kaplonski wrote: > Hi, > > Just at the end of the ptg sessions, neutron etherpad was got broken somehow. 
> Now when I try to open [1] I see only something like: > > An error occurred > The error was reported with the following id: 'igzOahZ6ruH0eSUAWKaj' > > Please press and hold Ctrl and press F5 to reload this page, if the problem > persists please send this error message to your webmaster: > 'ErrorId: igzOahZ6ruH0eSUAWKaj > URL: https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 > Firefox/70.0 > TypeError: r.dropdowns is undefined in > https://etherpad.openstack.org/javascripts/lib/ep_etherpad-lite/static/js/pad.js?callback=require.define > at line 18' > > > We can open one of the previous versions which is available at [2] but I don't > know how we can fix original etherpad or restore version from [2] to be original > etherpad and make it working again. > Can someone from infra team check that for us maybe? > Thx in advance for any help. > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 Hi Slawek, When I just went to check this etherpad now, noticed I had a tab open that was in "Force reconnect" state. I made a copy of that, just might be a little out of date on the last items. The formatting is also a little odd, but at least it's better than nothing if we can't get the original back. https://etherpad.openstack.org/p/neutron-ptg-temp -Brian From fungi at yuggoth.org Sat Nov 9 17:09:44 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 9 Nov 2019 17:09:44 +0000 Subject: [infra] Etherpad problem In-Reply-To: <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> References: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> Message-ID: <20191109170943.w4l2g5qd34g5m2tw@yuggoth.org> On 2019-11-10 00:31:36 +0800 (+0800), Brian Haley wrote: [...] > When I just went to check this etherpad now, noticed I had a tab > open that was in "Force reconnect" state. I made a copy of that, > just might be a little out of date on the last items. The > formatting is also a little odd, but at least it's better than > nothing if we can't get the original back. [...] This happens from time to time. We can get a dump of the previous revision from the API if needed and paste that into a new pad, but yeah formatting and author colors will be lost in the process. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From balazs.gibizer at est.tech Sun Nov 10 08:09:18 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 08:09:18 +0000 Subject: [nova][ptg] Resource provider delete at service delete Message-ID: <1573373353.31166.0@est.tech> ML thread: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007135.html Agreements in the room: * Check ongoing migrations and reject the delete if a migration with this compute as the source node exists. Let the operator confirm the migrations * Cascade delete providers and allocations in placement. * in case of evacuated instances this is the right thing to do * in any other dangling allocation case nova has the final truth so nova has the authority to delete them. * Document possible ways to reconcile Placement with Nova using heal_allocations and eventually the audit command once it's merged. 
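(For reference, the reconciliation mentioned in the last point maps to the existing nova-manage placement heal_allocations command as it exists today, while the audit command is the one still under review, so its final name and options are not settled yet.)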
Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 08:10:58 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 08:10:58 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: <1573373452.31166.1@est.tech> original spec: https://review.opendev.org/#/c/663563 with -2s The first round of discussion resulted in no agreement. Then on Friday we revisited the issue based on mdbooth's proposal about composability. Agreement in the room: * Do not try to change the model of the flavor in nova code and in the db. * Define a "ComposableFlavorBit" (bikeshed on the name please) REST API entity that can hold any kind of flavor bits (extra specs, normal flavor fields), propose some format in the spec for it. This entity can only be created by the admin by default * Extend the server create REST API to allow the end user to specify what "ComposableFlavorBit"s she wants to add to the "base" flavor she used in the create request. * The nova api then merges the "ComposableFlavorBit"s with the base flavor and embeds the resulting flavor object into the instance. * Do a similar thing for resize TODOs: * brinzhang (with possible help from yawang) to re-write the spec Cheers, gibi From sean.mcginnis at gmx.com Sun Nov 10 10:44:24 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Sun, 10 Nov 2019 04:44:24 -0600 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: <20191110104424.GA6424@sm-workstation> > > Sean and everyone else, > > Pardon me, but I have to rant here... :) > Please try see things from a downstream consumer point of view. > > This isn't the Python 2.7 era anymore, where we had a stable python for > like forever. OpenStack needs to move quicker to newer Python 3 > versions, especially considering that Python 2.7 isn't an option for > anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks > like a nice first approach, it is my strong believe that the project > should quickly move to voting and full Python 3.8 testing, and > preferably, have it in order, with functional testing, for the whole > project, by the time Ussuri is released. >
It isn't a normal situation that > downstream distributions get the shit (pardon my french) and get to be > the ones fixing issues and proposing patches (Corey, you've done awesome > job on this...), yet it's been the case for nearly every single Python 3 > releases. I very much would appreciate this situation to be fixed, and > the project moving faster. > > Cheers, > > Thomas Goirand (zigo) > From zhangbailin at inspur.com Sun Nov 10 12:51:07 2019 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Sun, 10 Nov 2019 12:51:07 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: Gibi: The summary on the ML write in the nova ptg etherpad line 245 cannot open, raised a 404 error. Etherpad: https://etherpad.openstack.org/p/nova-shanghai-ptg Thanks. Brin Zhang items: [lists.openstack.org代发][nova][ptg] Flavor explosion original spec: https://review.opendev.org/#/c/663563 with -2s The first round of discussion was resulted in no agreement. Then on Friday we revisited the issue based on mdbooth's proposal about composability. Agreement in the room: * Do not try to change the model of the flavor in nova code and in the db. * Define a "ComposableFlavorBit" (bikeshed on the name please) REST API entity that can hold any kind of flavor bits (extra specs, normal flavor fields), propose some format in the spec for it. This entity can only be created by the admin by default * Extend the server create REST API to allow the end user to specify what "ComposableFlavorBit"s she wants to add to the "base" flavor she used in the create request. * The nova api then merges the "ComposableFalvorBit"s with the base flavor and embed the resulted flavor object into the instance. * Do a similar thing for resize TODOs: * brinzhang (with possible help from yawang) to re-write the spec Cheers, gibi From fungi at yuggoth.org Sun Nov 10 15:41:39 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 10 Nov 2019 15:41:39 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: References: Message-ID: <20191110154139.sdom27sbhncpp6lm@yuggoth.org> On 2019-11-10 12:51:07 +0000 (+0000), Brin Zhang(张百林) wrote: > The summary on the ML write in the nova ptg etherpad line 245 > cannot open, raised a 404 error. > Etherpad: https://etherpad.openstack.org/p/nova-shanghai-ptg [...] It links to http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010642.html which opens just fine for me. Maybe someone has corrected it? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From balazs.gibizer at est.tech Sun Nov 10 16:05:01 2019 From: balazs.gibizer at est.tech (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Sun, 10 Nov 2019 16:05:01 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: <20191110154139.sdom27sbhncpp6lm@yuggoth.org> References: <20191110154139.sdom27sbhncpp6lm@yuggoth.org> Message-ID: <1573401895.31166.2@est.tech> On Sun, Nov 10, 2019 at 15:41, Jeremy Stanley wrote: > On 2019-11-10 12:51:07 +0000 (+0000), Brin Zhang(张百林) wrote: >> The summary on the ML write in the nova ptg etherpad line 245 >> cannot open, raised a 404 error. >> Etherpad: https://etherpad.openstack.org/p/nova-shanghai-ptg > [...] > > It links to > http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010642.html > which opens just fine for me. Maybe someone has corrected it? The link was wrong at L269 ( it ended with htm instead of html). It is fixed now. 
Cheers, gibi > -- > Jeremy Stanley From zhangbailin at inspur.com Sun Nov 10 16:09:06 2019 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Sun, 10 Nov 2019 16:09:06 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: Hi all, Based on the discussion on the Train PTG, and with reference to the records on the etherpad and ML, I have updated the spec, and I think there are some details that need to be discussed. I have listed them below; if there is anything I have not considered, or anywhere my thinking falls short, please raise it for discussion. The details are as follows, and you can review the spec at https://review.opendev.org/#/c/663563. Listed details: - Don't change the model of the flavor in nova code and in the db. - No change for operators who choose not to request the flavor extra specs group. - If more than one flavor extra specs group is requested and they contain different values for the same spec, a 409 will be raised. - If the flavor in the server create request body defines the same spec as the requested ``flavor_extra_specs_group``, a 409 will be raised. - When resizing an instance, the requested ``flavor_extra_specs_group`` needs to be compared with the request spec, otherwise a 400 is raised. ---------------------------------------------------------------------------------------- Items: [lists.openstack.org代发][nova][ptg] Flavor explosion original spec: https://review.opendev.org/#/c/663563 with -2s The first round of discussion was resulted in no agreement. Then on Friday we revisited the issue based on mdbooth's proposal about composability. Agreement in the room: * Do not try to change the model of the flavor in nova code and in the db. * Define a "ComposableFlavorBit" (bikeshed on the name please) REST API entity that can hold any kind of flavor bits (extra specs, normal flavor fields), propose some format in the spec for it. This entity can only be created by the admin by default * Extend the server create REST API to allow the end user to specify what "ComposableFlavorBit"s she wants to add to the "base" flavor she used in the create request. * The nova api then merges the "ComposableFalvorBit"s with the base flavor and embed the resulted flavor object into the instance. * Do a similar thing for resize TODOs: * brinzhang (with possible help from yawang) to re-write the spec Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:15:15 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:15:15 +0000 Subject: [nova][ptg] Expose auto converge and post copy Message-ID: <1573402509.31166.3@est.tech> spec: https://review.opendev.org/#/c/687199 There were multiple discussions during the PTG around this. I think at the end mdbooth and yawang found a possible solution that they liked and that sounded OK to me too. Unfortunately the etherpad was not updated and my memory is bad enough that I cannot recall the exact proposal. yawang: could you please post a short summary here and / or simply update the spec with the discussed solution? Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:17:52 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:17:52 +0000 Subject: [nova][ptg] Other small discussions Message-ID: <1573402667.31166.4@est.tech>
interested in the feauter * TODO: aarents to talk to dansmith about future improvements like rate limiting nova implications of a starlingx bug: https://bugs.launchpad.net/starlingx/+bug/1850834 * TODO: yawang to file a nova bug if reproducible * TODO: yawang to propose a patch in nova based on the starlingx fix https://gist.github.com/VictorRodriguez/e137a8cd87cf821f8076e9acc02ce195 vm scoped sriov numa affinity * spec: https://review.opendev.org/#/c/683174/4 * there was not enough knowledge in the room to really discuss * TODO: gibi to read and comment the spec midcylce * stephenfin will propose a midcycle disussion on the ML From balazs.gibizer at est.tech Sun Nov 10 16:23:03 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:23:03 +0000 Subject: [nova][ironic][ptg] reconfigure the baremetal node through the resize API Message-ID: <1573402969.31166.5@est.tech> ironic spec https://review.opendev.org/#/c/672252/ TODOs: * alex_xu (?) to create a nova spec to list possible alternatives about which nova REST API could be used to trigger reconfigure (resize, reboot, new api) * A potential way forward would be Ironic supporting resize (confirm/revert) as well with some magical autodetection of an Ironic instance *or* some magical flavor-based solution Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:29:53 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:29:53 +0000 Subject: [nova][ptg] drop the shadow table concept Message-ID: <1573403387.31166.6@est.tech> CERN reported two issues with archive_deleted_rows CLI: * When one record gets inserted into the shadow_instance_extra but didn't get deleted from instance_extra (I know this is in a single transaction but sometimes it happens), needs manual cleanup on the database * Also there could be two cells running this command at the same time fighting for the API db lock, TODOs: * tssurya to report bugs / improvements on archive_deleted_rows CLI based on CERN's experience with long table locking * mnaser to report a wishlist bug / specless bp about one step db purge CLI which would skip the shadow tables Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:32:35 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:32:35 +0000 Subject: [nova][ptg] Ussuri cycle themes Message-ID: <1573403550.31166.7@est.tech> Ussuri themes proposal: * Policy work Note for the operators that this work only provides value after the whole work is ready and it is pretty possible that this will not finish in the current cycle. * Unified limits Discuss on the ML how can we help oslo to progress with the oslo-limits work. If we can help with that then, make unified limits as a cycle Goal. * Cyborg - Nova integration (bauzas) I'm OK with making Cyborg a priority for this cycle provided we don't hold any attempt to start thinking on fixing PCI tracking. We definitely needs to discuss this further with the whole team. 
Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:35:40 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:35:40 +0000 Subject: [nova][neutron][ptg] Nova - Neutron cross project dicussions Message-ID: <1573403737.31166.8@est.tech> https://etherpad.openstack.org/p/ptg-ussuri-xproj-nova-neutron Nova-Neutron live-migration - ACTIVE/INACTIVE port bindings issue * nova needs vif plug event for inactive bindings * live migration with ml2-ovs only works by luck * agentless drivers does not support this https://bugs.launchpad.net/neutron/+bug/1834045 https://bugs.launchpad.net/neutron/+bug/1840147 Agreements: * document the nova-neutron live migration workflow to create a common base for discussion. Add ijw and sean-k-mooney to the review SR-IOV live migration utilizing kernel NET_FAILOVER feature * aim to support live migration with SRIOV nic without traffic interrupt by bonding a virtio interface to the SRIOV nic in the guest kernel with NET_FAILOVER. * https://www.kernel.org/doc/html/latest/networking/net_failover.html * There was multiple solution proposlas but no agreements about which one to pursue. * TODO: adrianc to propose a nova spec based on the discussion Correlation of Bandwidth RP with PCI RP when PCI will be tracked in Placement * Early heads up to Neutron team that Nova things about modelling PCI in Placement * Current best thinking on Nova side is to correlate BW RP with PCI RP in placement with a placement aggregate. * Nova will do the correlation * Neutron needs to prepare for possible BW RP generation conflict * Agreement: Neutron is OK with this plan. rubasov can help with the neutron impact when nova makes progress. From balazs.gibizer at est.tech Sun Nov 10 16:44:55 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:44:55 +0000 Subject: [nova][ironic][ptg] Resource tracker scaling issues Message-ID: <1573404293.31166.9@est.tech> * COMPUTE_RESOURCE_SEMAPHORE blocks instance creation on all nodes (on the same host) while the _update_available_resource runs on all nodes. On 3500 baremetal nodes _update_available_resource takes 1.5 hour. * Do we still need _update_available_resource periodic task to run for ironic nodes? * Reduce the scope of the COMPUTE_RESOURCE_SEMAPHORE lock * https://review.opendev.org/#/c/682242/ * https://review.opendev.org/#/c/677790/ * changing a locking scheme is frightening => we need more testing Agreement: * Do a tempest test with a lot of fake ironic node records to have a way to test if changing the locking scheme breaks anything * Log a bug and propose a patch for having a per-node lock instead of the same object for all the ResourceTrackers * See also whether concurrency helps * Propose a spec if you really want to pursue the idea of being somehow inconsistent with data by not having a lock Cheers, gibi From mriedemos at gmail.com Sun Nov 10 20:41:11 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 10 Nov 2019 14:41:11 -0600 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: <1573403387.31166.6@est.tech> References: <1573403387.31166.6@est.tech> Message-ID: On 11/10/2019 10:29 AM, Balázs Gibizer wrote: > * Also there could be two cells running this command at the same time > fighting for the API db lock, In Train the --all-cells option was added to the CLI so that should resolve this issue. 
I think Mel said she backported those changes internally so I'm not sure how hard it would be for those to go back to Stein or Rocky or whatever release CERN is using now. -- Thanks, Matt From mriedemos at gmail.com Sun Nov 10 21:04:56 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 10 Nov 2019 15:04:56 -0600 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: <1573403387.31166.6@est.tech> References: <1573403387.31166.6@est.tech> Message-ID: <30ebbfc9-bfcc-2fc7-ce39-a1996266ec0b@gmail.com> On 11/10/2019 10:29 AM, Balázs Gibizer wrote: > * When one record gets inserted into the shadow_instance_extra but > didn't get deleted from instance_extra (I know this is in a single > transaction but sometimes it happens), needs manual cleanup on the > database Is this potentially caused by the issue attempting to be fixed here? https://review.opendev.org/#/c/412771/ -- Thanks, Matt From mriedemos at gmail.com Sun Nov 10 21:07:51 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 10 Nov 2019 15:07:51 -0600 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <1573404293.31166.9@est.tech> References: <1573404293.31166.9@est.tech> Message-ID: <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> On 11/10/2019 10:44 AM, Balázs Gibizer wrote: > On 3500 baremetal nodes _update_available_resource takes 1.5 hour. Why have a single nova-compute service manage this many nodes? Or even 1000? Why not try to partition things a bit more reasonably like a normal cell where you might have ~200 nodes per compute service host (I think CERN keeps their cells to around 200 physical compute hosts for scaling)? That way you can also leverage the compute service hashring / failover feature for HA? I realize the locking stuff is not great, but at what point is it unreasonable to expect a single compute service to manage that many nodes/instances? -- Thanks, Matt From eandersson at blizzard.com Sun Nov 10 23:06:16 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Sun, 10 Nov 2019 23:06:16 +0000 Subject: [Senlin] Splitting senlin-engine into three services In-Reply-To: References: Message-ID: We are looking at merging this in a few days if there is no additional feedback. Also, in case someone has experience with adding new services to a project. Is there a general way of requesting rpms etc to be updated for the next release to support the new services? Especially for the workflows that are handled outside of the official openstack repos. ________________________________ From: Erik Olof Gunnar Andersson Sent: Monday, November 4, 2019 5:11 PM To: openstack-discuss at lists.openstack.org Cc: Duc Truong Subject: [Senlin] Splitting senlin-engine into three services We are looking into splitting the senlin-engine into three components (senlin-conductor, senlin-engine and senlin-health-manager) and wanted to get some feedback. The main goal here is to make the components more resilient and to reduce the number of threads per worker. Each one of the components already had it's own thread pool and in theory each worker could end up with thousands of thread. In the current version (Train) the engine process hosts these services. 
https://github.com/openstack/senlin/blob/stable/train/senlin/engine/dispatcher.py#L31 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/health_manager.py#L865 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/service.py#L79 In my patch we move two our of these out of the engine and into it's own service namespace. Split engine service into three services https://review.opendev.org/#/c/688784/ Please feel free to comment on the patch set, or let reply to this email with general feedback or concerns. Best Regards, Erik Olof Gunnar Andersson -------------- next part -------------- An HTML attachment was scrubbed... URL: From wang.ya at 99cloud.net Mon Nov 11 02:42:35 2019 From: wang.ya at 99cloud.net (wang.ya) Date: Mon, 11 Nov 2019 10:42:35 +0800 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <1573402509.31166.3@est.tech> References: <1573402509.31166.3@est.tech> Message-ID: <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> Hi: * The original method(expose the auto converge/post copy in image properties/flavor extra specs) exposed features of hypervisor layer directly, and it affects scheduling. Therefore, it's not appropriate. * The new method is add a new parameter: "no-performance-impact". If the parameter set during live migrate, the libvirt driver will disable the auto converge and post copy functions. User can tag their instances as "no performance impact" in instance metadata or somewhere else, operator can check the tag to decide whether add the parameter before live migrate. I will write a new spec to describe these in detail :) Best Regards On 2019/11/11, 12:15 AM, "Balázs Gibizer" wrote: spec: https://review.opendev.org/#/c/687199 There was multiple discussion during the PTG around this. I think at the end mdbooth and yawang found a possible solution that they liked and sounded OK to me too. Unfortunately the etherpad was not updated and my memory is bad enough that I cannot gather what was the exact proposal. yawang: could you please post a short summary here and / or simply update the spec with the discussed solution? Cheers, gibi From bxzhu_5355 at 163.com Mon Nov 11 03:33:54 2019 From: bxzhu_5355 at 163.com (Boxiang Zhu) Date: Mon, 11 Nov 2019 11:33:54 +0800 (GMT+08:00) Subject: [manila][kolla-ansible] Fail to use manila with cephfs NFS share backend Message-ID: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> Hi everyone, I have deployed a OpenStack cluster with AIO mode by kolla-ansible. In my globals.yml, something is enabled as followed: ...... enable_ceph: "yes" enable_ceph_mds: "yes" enable_ceph_nfs: "yes" enable_manila: "yes" enable_manila_backend_cephfs_nfs: "yes" ....... And in my manila.conf, some configs are as followed: [DEFAULT] enabled_share_protocols = NFS,CIFS ...... [cephfsnfs1] driver_handles_share_servers = False share_backend_name = CEPHFSNFS1 share_driver = manila.share.drivers.cephfs.driver.CephFSDriver cephfs_protocol_helper_type = NFS cephfs_conf_path = /etc/ceph/ceph.conf cephfs_auth_id = manila cephfs_cluster_name = ceph cephfs_enable_snapshots = False cephfs_ganesha_server_is_remote = False cephfs_ganesha_server_ip = 172.16.60.84 ...... 
I use CLI to create the nfs, some commands as followed: -> manila type-create cephfsnfstype false -> manila type-key cephfsnfstype set vendor_name=Ceph storage_protocol=NFS -> manila create --share-type cephfsnfstype --name cephnfsshare1 nfs 1 -> manila share-export-location-list cephnfsshare1 +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | ID | Path | Preferred | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | b101bf59-4cd7-4a09-a12e-b6dd48a5bb18 | 172.16.60.84:/volumes/_nogroup/93b1e23d-0166-41a4-a12a-51bf4c3654a5 | False | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ -> manila access-allow cephnfsshare1 ip 172.16.60.119 But I have got some error messages from /var/lib/docker/volumes/kolla_logs/_data/manila/manila-share.log 2019-11-11 10:11:12.035 26 ERROR manila.share.drivers.ganesha.manager [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Error while executing management command on Ganesha node : dbus call exportmgr.AddExport.: ProcessExecutionError: Unexpected error while running command. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Exception during message handling: GaneshaCommandFailure: Ganesha management command failed. Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) Exit code: 1 Stdout: u'' Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 187, in wrapped 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return f(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/utils.py", line 568, in wrapper 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server 
File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 3554, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 283, in update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 322, in _update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 390, in _update_rules_through_share_driver 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", line 289, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/__init__.py", line 308, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server self.ganesha.add_export(share['name'], confdict) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/manager.py", line 491, in add_export 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server cmd=e.cmd) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server GaneshaCommandFailure: Ganesha management command failed. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Exit code: 1 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stdout: u'' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server I found some possible solutions[0][1][2][3] to try to fix this, but failed. Any suggestions? [0] https://github.com/nfs-ganesha/nfs-ganesha/issues/483 [1] https://github.com/nfs-ganesha/nfs-ganesha/issues/219 [2] https://github.com/gluster/storhaug/issues/14 [3] https://sourceforge.net/p/nfs-ganesha/mailman/message/32227132/ Thanks, Boxiang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wang.ya at 99cloud.net Mon Nov 11 07:45:12 2019 From: wang.ya at 99cloud.net (wang.ya) Date: Mon, 11 Nov 2019 15:45:12 +0800 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> References: <1573402509.31166.3@est.tech> <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> Message-ID: <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> Hi: Here is the spec [1]_ Because the exist spec [2]_ has gap with the agreement, so I rewrote a new spec. .. [1]: https://review.opendev.org/#/c/693655/ .. [2]: https://review.opendev.org/#/c/687199/ Best Regards On 2019/11/11, 10:43 AM, "wang.ya" wrote: Hi: * The original method(expose the auto converge/post copy in image properties/flavor extra specs) exposed features of hypervisor layer directly, and it affects scheduling. Therefore, it's not appropriate. * The new method is add a new parameter: "no-performance-impact". If the parameter set during live migrate, the libvirt driver will disable the auto converge and post copy functions. User can tag their instances as "no performance impact" in instance metadata or somewhere else, operator can check the tag to decide whether add the parameter before live migrate. I will write a new spec to describe these in detail :) Best Regards On 2019/11/11, 12:15 AM, "Balázs Gibizer" wrote: spec: https://review.opendev.org/#/c/687199 There was multiple discussion during the PTG around this. I think at the end mdbooth and yawang found a possible solution that they liked and sounded OK to me too. Unfortunately the etherpad was not updated and my memory is bad enough that I cannot gather what was the exact proposal. yawang: could you please post a short summary here and / or simply update the spec with the discussed solution? Cheers, gibi From radoslaw.piliszek at gmail.com Mon Nov 11 08:43:41 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 11 Nov 2019 09:43:41 +0100 Subject: [manila][kolla-ansible] Fail to use manila with cephfs NFS share backend In-Reply-To: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> References: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> Message-ID: Is this Stein on CentOS? Looks like a bug to me. Please report to: https://bugs.launchpad.net/kolla-ansible with all the details we ask for. Thanks. -yoctozepto pon., 11 lis 2019 o 04:45 Boxiang Zhu napisał(a): > > Hi everyone, > > I have deployed a OpenStack cluster with AIO mode by kolla-ansible. > In my globals.yml, something is enabled as followed: > ...... > enable_ceph: "yes" > enable_ceph_mds: "yes" > enable_ceph_nfs: "yes" > enable_manila: "yes" > enable_manila_backend_cephfs_nfs: "yes" > ....... > And in my manila.conf, some configs are as followed: > [DEFAULT] > enabled_share_protocols = NFS,CIFS > ...... > [cephfsnfs1] > driver_handles_share_servers = False > share_backend_name = CEPHFSNFS1 > share_driver = manila.share.drivers.cephfs.driver.CephFSDriver > cephfs_protocol_helper_type = NFS > cephfs_conf_path = /etc/ceph/ceph.conf > cephfs_auth_id = manila > cephfs_cluster_name = ceph > cephfs_enable_snapshots = False > cephfs_ganesha_server_is_remote = False > cephfs_ganesha_server_ip = 172.16.60.84 > ...... 
> I use CLI to create the nfs, some commands as followed: > -> manila type-create cephfsnfstype false > -> manila type-key cephfsnfstype set vendor_name=Ceph > storage_protocol=NFS > -> manila create --share-type cephfsnfstype --name cephnfsshare1 nfs 1 > -> manila share-export-location-list cephnfsshare1 > > +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ > | ID > | Path > | Preferred | > > +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ > | b101bf59-4cd7-4a09-a12e-b6dd48a5bb18 | 172.16.60.84:/volumes/_nogroup/93b1e23d-0166-41a4-a12a-51bf4c3654a5 > | False | > > +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ > -> manila access-allow cephnfsshare1 ip 172.16.60.119 > But I have got some error messages > from /var/lib/docker/volumes/kolla_logs/_data/manila/manila-share.log > 2019-11-11 10:11:12.035 26 ERROR manila.share.drivers.ganesha.manager > [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e > 70dba7786a8b4326a35e03d0ad8707f2 - - -] Error while executing management > command on Ganesha node : dbus call exportmgr.AddExport.: > ProcessExecutionError: Unexpected error while running command. > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e > 70dba7786a8b4326a35e03d0ad8707f2 - - -] Exception during message handling: > GaneshaCommandFailure: Ganesha management command failed. > Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send > --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr > org.ganesha.nfsd.exportmgr.AddExport > string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf > string:EXPORT(Export_Id=105) > Exit code: 1 > Stdout: u'' > Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name > org.ganesha.nfsd was not provided by any .service files\n' > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Traceback (most > recent call last): > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", > line 165, in _process_incoming > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server res = > self.dispatcher.dispatch(message) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 274, in dispatch > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return > self._do_dispatch(endpoint, method, ctxt, args) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 194, in _do_dispatch > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server result = > func(ctxt, **new_args) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", > line 187, in wrapped > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return > f(self, *args, **kwargs) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/utils.py", line > 568, in wrapper > 2019-11-11 10:11:14.954 
26 ERROR oslo_messaging.rpc.server return > func(self, *args, **kwargs) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", > line 3554, in update_access > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", > line 283, in update_access_rules > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", > line 322, in _update_access_rules > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", > line 390, in _update_rules_through_share_driver > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", > line 289, in update_access > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/__init__.py", > line 308, in update_access > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > self.ganesha.add_export(share['name'], confdict) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/manager.py", > line 491, in add_export > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server cmd=e.cmd) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > GaneshaCommandFailure: Ganesha management command failed. > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Command: sudo > manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system > --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr > org.ganesha.nfsd.exportmgr.AddExport > string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf > string:EXPORT(Export_Id=105) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Exit code: 1 > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stdout: u'' > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stderr: u'Error > org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was > not provided by any .service files\n' > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > > I found some possible solutions[0][1][2][3] to try to fix this, but > failed. > Any suggestions? > > [0] https://github.com/nfs-ganesha/nfs-ganesha/issues/483 > [1] https://github.com/nfs-ganesha/nfs-ganesha/issues/219 > [2] https://github.com/gluster/storhaug/issues/14 > [3] https://sourceforge.net/p/nfs-ganesha/mailman/message/32227132/ > > Thanks, > Boxiang > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bxzhu_5355 at 163.com Mon Nov 11 09:25:26 2019 From: bxzhu_5355 at 163.com (Boxiang Zhu) Date: Mon, 11 Nov 2019 17:25:26 +0800 (CST) Subject: =?UTF-8?Q?=E5=9B=9E=E5=A4=8D:Re:_[manila][kolla-ansible]_Fail_to_?= =?UTF-8?Q?use_manila_with_cephfs_NFS_share_backend?= In-Reply-To: References: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> Message-ID: <64c11eb2.8a8d.16e59c8431f.Coremail.bxzhu_5355@163.com> hi yoctozepto, Here is the link https://bugs.launchpad.net/kolla-ansible/+bug/1852055 BTW, in fact, before I met the problem, I found anther issue from ceph-nfs.log. The ganesha can not connect to the /run/dbus/system_bus_socket, So I change the kolla-ansible/share/kolla-ansible/ansible/roles/ceph/tasks/start_nfss.yml, add `` - "/run/:/run/:shared" `` under volumes section. Thanks, Boxiang At 2019-11-11 16:43:41, "Radosław Piliszek" wrote: Is this Stein on CentOS? Looks like a bug to me. Please report to: https://bugs.launchpad.net/kolla-ansible with all the details we ask for. Thanks. -yoctozepto pon., 11 lis 2019 o 04:45 Boxiang Zhu napisał(a): Hi everyone, I have deployed a OpenStack cluster with AIO mode by kolla-ansible. In my globals.yml, something is enabled as followed: ...... enable_ceph: "yes" enable_ceph_mds: "yes" enable_ceph_nfs: "yes" enable_manila: "yes" enable_manila_backend_cephfs_nfs: "yes" ....... And in my manila.conf, some configs are as followed: [DEFAULT] enabled_share_protocols = NFS,CIFS ...... [cephfsnfs1] driver_handles_share_servers = False share_backend_name = CEPHFSNFS1 share_driver = manila.share.drivers.cephfs.driver.CephFSDriver cephfs_protocol_helper_type = NFS cephfs_conf_path = /etc/ceph/ceph.conf cephfs_auth_id = manila cephfs_cluster_name = ceph cephfs_enable_snapshots = False cephfs_ganesha_server_is_remote = False cephfs_ganesha_server_ip = 172.16.60.84 ...... I use CLI to create the nfs, some commands as followed: -> manila type-create cephfsnfstype false -> manila type-key cephfsnfstype set vendor_name=Ceph storage_protocol=NFS -> manila create --share-type cephfsnfstype --name cephnfsshare1 nfs 1 -> manila share-export-location-list cephnfsshare1 +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | ID | Path | Preferred | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | b101bf59-4cd7-4a09-a12e-b6dd48a5bb18 | 172.16.60.84:/volumes/_nogroup/93b1e23d-0166-41a4-a12a-51bf4c3654a5 | False | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ -> manila access-allow cephnfsshare1 ip 172.16.60.119 But I have got some error messages from /var/lib/docker/volumes/kolla_logs/_data/manila/manila-share.log 2019-11-11 10:11:12.035 26 ERROR manila.share.drivers.ganesha.manager [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Error while executing management command on Ganesha node : dbus call exportmgr.AddExport.: ProcessExecutionError: Unexpected error while running command. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Exception during message handling: GaneshaCommandFailure: Ganesha management command failed. 
Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) Exit code: 1 Stdout: u'' Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 187, in wrapped 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return f(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/utils.py", line 568, in wrapper 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 3554, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 283, in update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 322, in _update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 390, in _update_rules_through_share_driver 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", line 289, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/__init__.py", line 308, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server self.ganesha.add_export(share['name'], confdict) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/manager.py", line 491, in add_export 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server cmd=e.cmd) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server GaneshaCommandFailure: Ganesha management command failed. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Exit code: 1 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stdout: u'' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server I found some possible solutions[0][1][2][3] to try to fix this, but failed. Any suggestions? [0] https://github.com/nfs-ganesha/nfs-ganesha/issues/483 [1] https://github.com/nfs-ganesha/nfs-ganesha/issues/219 [2] https://github.com/gluster/storhaug/issues/14 [3] https://sourceforge.net/p/nfs-ganesha/mailman/message/32227132/ Thanks, Boxiang -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Mon Nov 11 09:32:16 2019 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Mon, 11 Nov 2019 10:32:16 +0100 Subject: [ops][glance] Quota for max number of images/snapshots per projects Message-ID: As far as I can see it is not possible to set a quota for the maximum number of images-snapshots for a given project, at least in the Rocky release If I am not wrong it is only possible to set the max size of an image or the total size of space used for glance by a project, but these settings would be the same for all projects Are there plans to implement this capability ? Thanks, Massimo -------------- next part -------------- An HTML attachment was scrubbed... URL: From arne.wiebalck at cern.ch Mon Nov 11 10:19:59 2019 From: arne.wiebalck at cern.ch (Arne Wiebalck) Date: Mon, 11 Nov 2019 11:19:59 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: Hi Matt, On 10.11.19 22:07, Matt Riedemann wrote: > On 11/10/2019 10:44 AM, Balázs Gibizer wrote: >> On 3500 baremetal nodes _update_available_resource takes 1.5 hour. > > Why have a single nova-compute service manage this many nodes? Or even > 1000? > > Why not try to partition things a bit more reasonably like a normal cell > where you might have ~200 nodes per compute service host (I think CERN > keeps their cells to around 200 physical compute hosts for scaling)? > > That way you can also leverage the compute service hashring / failover > feature for HA? > > I realize the locking stuff is not great, but at what point is it > unreasonable to expect a single compute service to manage that many > nodes/instances? > I agree that using sharding and/or multiple cells to manage that many nodes is sensible. 
One reason we haven't done it yet is that we got away with this very simple setup so far ;) Sharding with and/or within cells will help to some degree (and we are actively looking into this as you probably know), but I think that should not stop us from checking if there are algorithmic improvements (e.g. when collecting the data), or if moving to a different locking granularity or even parallelising the update are feasible additional improvements. Cheers, Arne -- Arne Wiebalck CERN IT From huaqiang.wang at intel.com Mon Nov 11 11:58:27 2019 From: huaqiang.wang at intel.com (Wang, Huaqiang) Date: Mon, 11 Nov 2019 11:58:27 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <1573196961.23158.1@est.tech> References: <1573196961.23158.1@est.tech> Message-ID: <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> > -----Original Message----- > From: Balázs Gibizer > Sent: Friday, November 8, 2019 3:10 PM > To: openstack-discuss > Subject: [nova][ptg] pinned and unpinned CPUs in one instance > > spec: https://review.opendev.org/668656 > > Agreements from the PTG: > > How we will test it: > * do functional test with libvirt driver, like the pinned cpu tests we have > today > * donyd's CI supports nested virt so we can do pinned cpu testing but not > realtime. As this CI is still work in progress we should not block on this. > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > have > > Naming: use the 'shared' and 'dedicated' terminology > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will > have less expression power until nova models NUMA in placement. So nova > will try to evenly distribute PCPUs between numa nodes. If it not possible we > reject the request and ask the user to use the > hw:pinvcpus=3 syntax. > > Realtime mask is an exclusion mask, any vcpus not listed there has to be in > the dedicated set of the instance. > > TODOInvestigate whether we want to enable NUMA by default > * Pros: Simpler, everything is NUMA by default > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > NUMA mapping else we won't be able to boot e.g. a 40 core shared instance > on a 40 core, 2 NUMA node host For the case of 'booting a 40 core shared instance on 40 core 2NUMA node' that will not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no assumption about instance NUMA topology. By the way if you want a 'shared' instance, with 40 cores, to be scheduled on a host of 40cores, 2 NUMA nodes, you also need to register all host cores as 'shared' cpus through 'conf.compute.cpu_shared_set'. For instance with 'mixed' policy, what I want to propose is the instance should demand at least one 'dedicated'(or PCPU) core. Thus, any 'mixed' instance or 'dedicated' instance will not be scheduled one this host due to no PCPU available on this host. And also, a 'mixed' instance should also demand at least one 'shared' (or VCPU) core. a 'mixed' instance demanding all cores from PCPU resource should be considered as an invalid one. And an instance demanding all cores from PCPU resource is just a legacy 'dedicated' instance, which CPU allocation policy is 'dedicated'. In conclusion, a instance with the policy of 'mixed' -. demands at least one 'dedicated' cpu and at least one 'shared' cpu. -. 
with NUMA topology by default due to requesting pinned cpu In my understanding the cons does not exist by making above rules. Br Huaqiang > > > Cheers, > gibi > > From huaqiang.wang at intel.com Mon Nov 11 12:45:30 2019 From: huaqiang.wang at intel.com (Wang, Huaqiang) Date: Mon, 11 Nov 2019 12:45:30 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: References: <1573196961.23158.1@est.tech> Message-ID: <77E9D723B6A15C4CB27F7C3F130DE862477658E0@shsmsx102.ccr.corp.intel.com> > -----Original Message----- > From: Sean Mooney > Sent: Friday, November 8, 2019 8:21 PM > To: Balázs Gibizer ; openstack-discuss discuss at lists.openstack.org> > Subject: Re: [nova][ptg] pinned and unpinned CPUs in one instance > > On Fri, 2019-11-08 at 07:09 +0000, Balázs Gibizer wrote: > > spec: https://review.opendev.org/668656 > > > > Agreements from the PTG: > > > > How we will test it: > > * do functional test with libvirt driver, like the pinned cpu tests we > > have today > > * donyd's CI supports nested virt so we can do pinned cpu testing but > > not realtime. As this CI is still work in progress we should not block > > on this. > we can do realtime testing in that ci. > i already did. also there is a new label that is available across 3 providers so > we wont just be relying on donyd's good work. > > > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > > have > > > > Naming: use the 'shared' and 'dedicated' terminology > didn't we want to have a hw:cpu_policy=mixed specificaly for this case? > > > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax > > will have less expression power until nova models NUMA in placement. > > So nova will try to evenly distribute PCPUs between numa nodes. If it > > not possible we reject the request and ask the user to use the > > hw:pinvcpus=3 syntax. > > > > Realtime mask is an exclusion mask, any vcpus not listed there has to > > be in the dedicated set of the instance. > > > > TODOInvestigate whether we want to enable NUMA by default > > * Pros: Simpler, everything is NUMA by default > > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > in the context of mix if we dont enable numa affinity by default we should > remove that behavior from all case where we do it today. > > NUMA mapping else we won't be able to boot e.g. a 40 core shared > > instance on a 40 core, 2 NUMA node host Hi gabi or sean, To help me to understand the issue under discussion, if I change the instance requirement a little bit to: -. an instance demanding 1 dedicated core and 39 shared cores -. instance vcpu allocation ratio is 1 -. host has 2 NUMA nodes and 40 cores in total -. 39 of 40 cores are registered as VCPU resource the 1 core is registered as PCPU It will raise the same problem, right? because it hopes the instance to be scheduled on the host. > if this is a larger question of if we should have all instance be numa by > default i have argued yes for quite a while as i think having 1 code path has > many advantages. that said im aware of this limitation. > one way to solve this was the use of the proposed can_split placmenent > paramter. so if you did not specify a numa toplogy we would add > can_split=vCPUs and then create a singel or multiple numa node toplogy > based on the allcoations. 
if we combine that with a allocation weigher we > could sort the allocation candiates by smallest number of numa nodes so we > would prefer landing on hosts that can fit it on 1 numa node. > its a big change but long overdue. > I have read the 'can_split' spec, it will help if I understand the issue correctly. Then I agree with Sean that it is another issue that is not belong to spec 668656. > that said i have also argued the other point too in responce to pushback on > "all vms have numa of 1 unless you say otherwise" i.e. that the 1:1 between > mapping virtual and host numa nodes shoudl be configurable and is not > required by the api today. the backwards compatible way to do that is its not > requried by default if you are using shared cores and is required if you are > using pinned but that is a littel confusing. > > i dont really know what the right answer to this is but i think its a seperate > question form the topic of this thread. > we dont need to solve this to enable pinned and unpinned cpus in one > instance but we do need to adress this before we can model numa in > placment. > > > > > > > Cheers, > > gibi > > > > > > > From cdent+os at anticdent.org Mon Nov 11 13:03:54 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 11 Nov 2019 13:03:54 +0000 (GMT) Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: On Sun, 10 Nov 2019, Matt Riedemann wrote: > On 11/10/2019 10:44 AM, Balázs Gibizer wrote: >> On 3500 baremetal nodes _update_available_resource takes 1.5 hour. > > Why have a single nova-compute service manage this many nodes? Or even 1000? > > Why not try to partition things a bit more reasonably like a normal cell > where you might have ~200 nodes per compute service host (I think CERN keeps > their cells to around 200 physical compute hosts for scaling)? Without commenting on the efficacy of doing things this way, I can report that 1000 (or even 3500) instances (not nodes) is a thing that can happen in some openstack + vsphere setups and tends to exercise some of the same architectural problems that a lots-of- ironic (nodes) setup encounters. As far as I can tell the root architecture problem is: a) there are lots loops b) there is an expectation that those loops will have a small number of iterations (b) is generally true for a run of the mill KVM setup, but not otherwise. (b) not being true in other contexts creates an impedance mismatch that is hard to overcome without doing at least one of the two things suggested elsewhere in this thread: 1. manage fewer pieces per nova-compute (Matt) 2. "algorithmic improvement" (Arne) On 2, I wonder if there's been any exploration of using something like a circular queue and time-bounding the periodic jobs? Or using separate processes? For the ironic and vsphere contexts, increased CPU usage by the nova-compute process does not impact on the workload resources, so parallization is likely a good option. 
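To make that concrete, here is a rough, hypothetical sketch of what a
time-bounded, parallel pass over the nodes could look like. None of the
names below (run_update_pass, update_one_node, the budget/worker
constants) exist in nova today -- this is only meant to illustrate the
shape of the "circular queue + time budget + workers" idea, not the
actual implementation:

    import time
    from concurrent import futures

    # Hypothetical sketch, not nova code: spread the per-node update over
    # a small worker pool and stop picking up new nodes once the time
    # budget for this periodic pass is spent. Unvisited nodes stay queued
    # and are picked up first on the next pass (the circular queue idea).
    UPDATE_BUDGET_SECONDS = 300
    MAX_WORKERS = 8

    def run_update_pass(node_queue, update_one_node):
        # node_queue: a collections.deque of node UUIDs shared across passes
        # update_one_node: the existing single-node update (driver poll +
        #                  placement sync), assumed here to be thread-safe
        deadline = time.monotonic() + UPDATE_BUDGET_SECONDS

        def worker():
            updated = 0
            while time.monotonic() < deadline:
                try:
                    node = node_queue.popleft()
                except IndexError:
                    break  # nothing left to do this pass
                try:
                    update_one_node(node)
                    updated += 1
                except Exception:
                    node_queue.append(node)  # retry on a later pass
            return updated

        with futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            results = [pool.submit(worker) for _ in range(MAX_WORKERS)]
            return sum(f.result() for f in futures.as_completed(results))

Whether that is done with threads, something like futurist, or separate
worker processes would obviously need measuring, and the single-node
update would have to be safe to run concurrently, but the point is that
one slow node would no longer serialize the whole pass.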
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From gmann at ghanshyammann.com Mon Nov 11 13:04:56 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 11 Nov 2019 21:04:56 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <20191110104424.GA6424@sm-workstation> References: <20191110104424.GA6424@sm-workstation> Message-ID: <16e5a91380c.cce9f68a100336.2769893163805162012@ghanshyammann.com> ---- On Sun, 10 Nov 2019 18:44:24 +0800 Sean McGinnis wrote ---- > > > > Sean and everyone else, > > > > Pardon me, but I have to rant here... :) > > Please try see things from a downstream consumer point of view. > > > > This isn't the Python 2.7 era anymore, where we had a stable python for > > like forever. OpenStack needs to move quicker to newer Python 3 > > versions, especially considering that Python 2.7 isn't an option for > > anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks > > like a nice first approach, it is my strong believe that the project > > should quickly move to voting and full Python 3.8 testing, and > > preferably, have it in order, with functional testing, for the whole > > project, by the time Ussuri is released. > > > > We've had this debate many times now. Nothing has changed the fact that we > cannot make something an official runtime until there is an official distro out > there with that as the runtime. There is not for Ussuri. ++. We cannot add for Ussuri at this stage. We can go with the below plan: - Start an experimental unit tests (functional and integration are next step) job first and projects can slowly start fixing (if any failure) those based on their bandwidth. - Once job pass then make it periodic or n-v to capture the new code checking on py3.8. - Same process for functional jobs. - Integration jobs are not required to be duplicated. they can be moved to the latest py version later. In future release, we iterate the results of jobs and discuss to be add in testing run time mplate. -gmann > > > I know what's going to happen: I'll tell about a bug in Python 3.8 on > > IRC, and someone will reply to me "hey, that's not supported by > > OpenStack before the V release, please go away", even though as > > downstream distribution package maintainer, we don't have the power to > > decide what version of Python our distribution runs on (ie: both Debian > > Sid, Ubuntu and Fedora are quickly moving targets). > > > > I very highly doubt that, and very much disagree that someone will say to go > away. From what I have seen, the majority of the community is very responsive > to issues raised about future version problems. > > Fixing and working with py3.8 is not what is being discussed here. Only whether > those jobs to validate py3.8 should run on every patch or not. > > > There's absolutely no excuse for the OpenStack project to be dragging > > its feet, apart maybe the fact that it may not be easy to setup the > > infra to run tests on Py3.8 just yet. It isn't a normal situation that > > downstream distributions get the shit (pardon my french) and get to be > > the ones fixing issues and proposing patches (Corey, you've done awesome > > job on this...), yet it's been the case for nearly every single Python 3 > > releases. I very much would appreciate this situation to be fixed, and > > the project moving faster. 
> > > > Cheers, > > > > Thomas Goirand (zigo) > > > > From smooney at redhat.com Mon Nov 11 13:15:24 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 11 Nov 2019 13:15:24 +0000 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <20191110104424.GA6424@sm-workstation> References: <20191110104424.GA6424@sm-workstation> Message-ID: <844804a4bee95e2dac2d69746e817c9678eae9d9.camel@redhat.com> On Sun, 2019-11-10 at 04:44 -0600, Sean McGinnis wrote: > > > > Sean and everyone else, > > > > Pardon me, but I have to rant here... :) > > Please try see things from a downstream consumer point of view. > > > > This isn't the Python 2.7 era anymore, where we had a stable python for > > like forever. OpenStack needs to move quicker to newer Python 3 > > versions, especially considering that Python 2.7 isn't an option for > > anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks > > like a nice first approach, it is my strong believe that the project > > should quickly move to voting and full Python 3.8 testing, and > > preferably, have it in order, with functional testing, for the whole > > project, by the time Ussuri is released. > > > > We've had this debate many times now. Nothing has changed the fact that we > cannot make something an official runtime until there is an official distro out > there with that as the runtime. There is not for Ussuri. > > > I know what's going to happen: I'll tell about a bug in Python 3.8 on > > IRC, and someone will reply to me "hey, that's not supported by > > OpenStack before the V release, please go away", even though as > > downstream distribution package maintainer, we don't have the power to > > decide what version of Python our distribution runs on (ie: both Debian > > Sid, Ubuntu and Fedora are quickly moving targets). > > > > I very highly doubt that, and very much disagree that someone will say to go > away. From what I have seen, the majority of the community is very responsive > to issues raised about future version problems. > > Fixing and working with py3.8 is not what is being discussed here. Only whether > those jobs to validate py3.8 should run on every patch or not. i think the only push back you would get is if fixing py38 compatibility would break py27 or py36 support. py27 is less of a concern at this point although i know some project might support it longer then required. The point being if we had to choose between a supported python and a newer python that we dont yet support we would prefer the supported version but generaly there is a way to support all the version we care about. it just means we can use the py38 only features until that becomes our minium supported version. i think periodic jobs make more sense personally then experimental. depending on the velocity of the project there may be little different between a non voting check job and a peroidc at least for the smaller project. for larger project like nova a periodic would give more coverage then experimental and would use alot less resources but having a periodic job is only useful i someone checks it so im not sure if adding it to the default template makes sesne. i would suggest we create a python-runtime-next template that add a py38 unit test job to that that adds the job to the periodic and experimental piplines. project that will actually check the periodic jobs in there weekly meeting like neutorn can opt in those that wont dont need to add the template. 
> > > There's absolutely no excuse for the OpenStack project to be dragging > > its feet, apart maybe the fact that it may not be easy to setup the > > infra to run tests on Py3.8 just yet. It isn't a normal situation that > > downstream distributions get the shit (pardon my french) and get to be > > the ones fixing issues and proposing patches (Corey, you've done awesome > > job on this...), yet it's been the case for nearly every single Python 3 > > releases. I very much would appreciate this situation to be fixed, and > > the project moving faster. > > > > Cheers, > > > > Thomas Goirand (zigo) > > > > From geguileo at redhat.com Mon Nov 11 13:48:40 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 11 Nov 2019 14:48:40 +0100 Subject: Change Volume Type, but in use In-Reply-To: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: <20191111134840.bmgjlncfiwaerrqg@localhost> On 07/11, Sinan Polat wrote: > Hi, > > I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD > pools (ssdvolumes, sasvolumes). > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property > "volume_backend_name='tripleo_ceph_'". > > In the Cinder configuration I have the following backends configured: > > [tripleo_ceph_ssd] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_ssd > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=ssdvolumes > > [tripleo_ceph_sas] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_sas > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=sasvolumes > > As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool > name (ssdvolumes, not ssd) does not match. So far, we do not have any problems. > But I want to correct the names and I do not want to have the mismatch anymore. > > So I want to change the value of key volume_backend_name for both Volume Types > (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). > Hi, I agree with Sean, I wouldn't change it since this is only aesthetic. Having said that, there's always a way to do most things, even if it's NOT RECOMMENDED: - Update cinder.conf - Get the volume type id for the 2 volume types to change - Stop cinder services - Go into the DB and manually update the volume types changes in the "volume_type_extra_specs" table filtering by the volume_type_id and the key "volume_backend_name" and setting the new "value". - Use the "cinder-manage volume update_host" to update existing volumes to the new backend (you could also do this directly in the DB). - Start cinder services - Remove the old service from the DB (they appear as down now) using the "cinder-manage service remove" command. Cinder-manage docs: https://docs.openstack.org/cinder/latest/cli/cinder-manage.html Regards, Gorka. 
> I tried the following: > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > +--------------------+----------------------------------------+ > | Field | Value | > +--------------------+----------------------------------------+ > | access_project_ids | None | > | description | | > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > | is_public | True | > | name | ssd | > | properties | volume_backend_name='tripleo_ceph_ssd' | > | qos_specs_id | None | > +--------------------+----------------------------------------+ > $ > > > $ openstack volume type set --property > volume_backend_name='tripleo_ceph_ssdvolumes' > 80cb25ff-376a-4483-b4f7-d8c75839e0ce > Failed to set volume type property: Volume Type is currently in use. (HTTP 400) > (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > Command Failed: One or more of the operations failed > $ > > How to solve my problem? > > Thanks! > > Sinan From sinan at turka.nl Mon Nov 11 13:56:26 2019 From: sinan at turka.nl (Sinan Polat) Date: Mon, 11 Nov 2019 14:56:26 +0100 Subject: Change Volume Type, but in use In-Reply-To: <20191111134840.bmgjlncfiwaerrqg@localhost> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> <20191111134840.bmgjlncfiwaerrqg@localhost> Message-ID: <8C45CFCD-3903-46FB-9691-3E7257061975@turka.nl> Hi, Sorry for not being totally clear. The cluster is managed by TripleO. After each deployment/update, the cinder configuration is updated with incorrect names. Currently we correct it manually in the cinder configuration. So it is not only aesthetic. Sinan > Op 11 nov. 2019 om 14:48 heeft Gorka Eguileor het volgende geschreven: > >> On 07/11, Sinan Polat wrote: >> Hi, >> >> I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD >> pools (ssdvolumes, sasvolumes). >> In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property >> "volume_backend_name='tripleo_ceph_'". >> >> In the Cinder configuration I have the following backends configured: >> >> [tripleo_ceph_ssd] >> backend_host=hostgroup >> volume_backend_name=tripleo_ceph_ssd >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_ceph_conf=/etc/ceph/ceph.conf >> rbd_user=openstack >> rbd_pool=ssdvolumes >> >> [tripleo_ceph_sas] >> backend_host=hostgroup >> volume_backend_name=tripleo_ceph_sas >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_ceph_conf=/etc/ceph/ceph.conf >> rbd_user=openstack >> rbd_pool=sasvolumes >> >> As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool >> name (ssdvolumes, not ssd) does not match. So far, we do not have any problems. >> But I want to correct the names and I do not want to have the mismatch anymore. >> >> So I want to change the value of key volume_backend_name for both Volume Types >> (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). >> > > Hi, > > I agree with Sean, I wouldn't change it since this is only aesthetic. > > Having said that, there's always a way to do most things, even if it's > NOT RECOMMENDED: > > - Update cinder.conf > - Get the volume type id for the 2 volume types to change > - Stop cinder services > - Go into the DB and manually update the volume types changes in the > "volume_type_extra_specs" table filtering by the volume_type_id and > the key "volume_backend_name" and setting the new "value". > - Use the "cinder-manage volume update_host" to update existing volumes > to the new backend (you could also do this directly in the DB). 
> - Start cinder services > - Remove the old service from the DB (they appear as down now) using the > "cinder-manage service remove" command. > > Cinder-manage docs: > https://docs.openstack.org/cinder/latest/cli/cinder-manage.html > > Regards, > Gorka. > >> I tried the following: >> $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce >> +--------------------+----------------------------------------+ >> | Field | Value | >> +--------------------+----------------------------------------+ >> | access_project_ids | None | >> | description | | >> | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | >> | is_public | True | >> | name | ssd | >> | properties | volume_backend_name='tripleo_ceph_ssd' | >> | qos_specs_id | None | >> +--------------------+----------------------------------------+ >> $ >> >> >> $ openstack volume type set --property >> volume_backend_name='tripleo_ceph_ssdvolumes' >> 80cb25ff-376a-4483-b4f7-d8c75839e0ce >> Failed to set volume type property: Volume Type is currently in use. (HTTP 400) >> (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) >> Command Failed: One or more of the operations failed >> $ >> >> How to solve my problem? >> >> Thanks! >> >> Sinan > > From geguileo at redhat.com Mon Nov 11 14:00:16 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 11 Nov 2019 15:00:16 +0100 Subject: [cinder] consistency group not working In-Reply-To: <7adf0a5d-43b3-c606-2ba8-00d97b96cbdc@everyware.ch> References: <7adf0a5d-43b3-c606-2ba8-00d97b96cbdc@everyware.ch> Message-ID: <20191111140016.qyftq5iy27ekmdtj@localhost> On 27/09, Francois Scheurer wrote: > Dear Cinder Experts > > > We are running the rocky release. > > |We can create a consistency group: openstack consistency group create > --volume-type b9f67298-cf68-4cb2-bed2-c806c5f83487 fsc-consgroup Bug 1: but > adding volumes is not working: openstack consistency group add volume > c3f49ef0-601e-4558-a75a-9b758304ce3b b48752e3-641f-4a49-a892-6cb54ab6b74d > c0022411-59a4-4c7c-9474-c7ea8ccc7691 0f4c6493-dbe2-4f75-8e37-5541a267e3f2 => > Invalid volume: Volume is not local to this node. (HTTP 400) (Request-ID: > req-7f67934a-5835-40ef-b25c-12591fd79f85) Bug 2: deleting consistency group > is also not working (silently failing): openstack consistency group delete > c3f49ef0-601e-4558-a75a-9b758304ce3b |||=> AttributeError: 'RBDDriver' > object has no attribute 'delete_consistencygroup'| See details below. Using > the --force option makes no difference and the consistency group is not > deleted. Do you think this is a bug or a configuration issue? Thank you in > advance. | > > Cheers > > Francois Hi, It seems you are trying to use consistency groups with the RBD driver, which doesn't currently support consistency groups. Cheers, Gorka. 
> > |Details: ==> > /var/lib/docker/volumes/kolla_logs/_data/cinder/cinder-api-access.log <== > 10.0.129.17 - - [27/Sep/2019:12:16:24 +0200] "POST /v3/f099965b37ac41489e9cac8c9d208711/consistencygroups/3706bbab-e2df-4507-9168-08ef811e452c/delete > HTTP/1.1" 202 - 109720 "-" "python-cinderclient" ==> > /var/lib/docker/volumes/kolla_logs/_data/cinder/cinder-volume.log <== > 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server > [req-9010336e-d569-47ad-84e2-8dd8b729939c b141574ee71f49a0b53a05ae968576c5 > f099965b37ac41489e9cac8c9d208711 - default default] Exception during message > handling: AttributeError: 'RBDDriver' object has no attribute > 'delete_consistencygroup' 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server Traceback (most recent call last): 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", > line 163, in _process_incoming 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 265, in dispatch 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, > args) 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 194, in _do_dispatch 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server result = func(ctxt, **new_args) 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/osprofiler/profiler.py", > line 159, in wrapper 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server result = f(*args, **kwargs) 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/cinder/volume/manager.py", > line 3397, in delete_group 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server vol_obj.save() 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", > line 220, in __exit__ 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server self.force_reraise() 2019-09-27 12:16:24.491 30 > ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", > line 196, in force_reraise 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) > 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/cinder/volume/manager.py", > line 3362, in delete_group 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server self.driver.delete_consistencygroup(context, cg, > 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server AttributeError: > 'RBDDriver' object has no attribute 'delete_consistencygroup' 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server| > > > > > -- > > > EveryWare AG > François Scheurer > Senior Systems Engineer > Zurlindenstrasse 52a > CH-8003 Zürich > > tel: +41 44 466 60 00 > fax: +41 44 466 60 10 > mail: francois.scheurer at everyware.ch > web: http://www.everyware.ch > From dms at danplanet.com Mon Nov 11 14:53:06 2019 From: dms at danplanet.com (Dan Smith) Date: Mon, 11 Nov 2019 06:53:06 -0800 Subject: 
[nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: (Arne Wiebalck's message of "Mon, 11 Nov 2019 11:19:59 +0100") References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: > Sharding with and/or within cells will help to some degree (and we are > actively looking into this as you probably know), but I think that > should not stop us from checking if there are algorithmic improvements > (e.g. when collecting the data), or if moving to a different locking > granularity or even parallelising the update are feasible additional > improvements. All of that code was designed around one node per compute host. In the ironic case it was expanded (hacked) to support N where N is not huge. Giving it a huge number, and using a driver where nodes go into maintenance/cleaning for long periods of time is asking for trouble. Given there is only one case where N can legitimately be greater than one, I'm really hesitant to back a proposal to redesign it for large values of N. Perhaps we as a team just need to document what sane, tested, and expected-to-work values for N are? --Dan From tidwellrdev at gmail.com Mon Nov 11 15:31:53 2019 From: tidwellrdev at gmail.com (Ryan Tidwell) Date: Mon, 11 Nov 2019 09:31:53 -0600 Subject: [neutron] Bug Deputy Report Nov. 4-11 Message-ID: Hello neutrinos, here is the bug deputy report for the week of Nov. 4th: High: * https://bugs.launchpad.net/neutron/+bug/1851659 "removing a network from a DHCP agent removes L3 rules even if it shouldn't" This was found on stable/rocky, we should obviously see if it can be reproduced on master, stein, and train as well. There does seem to be a workaround, but VM's lost connectivity for a brief period. Medium: * https://bugs.launchpad.net/neutron/+bug/1851500 ""test_show_port_chain" conflicts with Flow Classifier" This is a gate issue, with what appears to be some instability with a test. Initial triage indicates it's an intermittent issue. * https://bugs.launchpad.net/neutron/+bug/1851194 "FWaaSv2 configures iptables with invalid port name" This similar to https://bugs.launchpad.net/neutron/+bug/1798577, FWaaSv2 seems to be referencing the wrong port names. RFE: * https://bugs.launchpad.net/neutron/+bug/1851609 Add an option for graceful l3 agent shutdown -Ryan Tidwell -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack-dev at storpool.com Mon Nov 11 16:06:21 2019 From: openstack-dev at storpool.com (Peter Penchev) Date: Mon, 11 Nov 2019 18:06:21 +0200 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet Message-ID: Hi, First of all, thanks to everyone involved for all the work on Nova, Cinder, os-brick, and actually all the rest of OpenStack, too! So, yeah, I guess it is kind of weird that I'm asking this on the list just a couple of days after the PTG where I could have asked in person, but here goes :) There seem to still be some quirks with Nova and volume-backed instance disks; some actions on instances are not allowed, others produce somewhat weird results. From a quick look at the code it seems to me that currently these are: - taking a snapshot of an instance (produces a zero-sized file, no real data backed up) - backing an instance up (refuses outright) - rescuing an instance (refuses outright) ...and maybe there are some that I've missed. 
So, possibly stupid question here, but what are the project's plans about these - is there an intention to implement them at some point, or are there some very, very hard theoreitcal or practical problems (so something like "guess not for the present"), or is somebody working on something? The main reason that I am asking is that we, StorPool, have a shared-storage Cinder driver, and every now and then a customer comes up and asks about one or more of these actions. Every now and then we come back to the idea of writing a vendor-specific Nova image backend, but, first off, we are not really sure whether we want to do this, and second, we are not really sure whether it will be accepted upstream. A couple of years ago people told us "don't do that" and there was some talk about having an image backend for storage drivers supported by libvirt, but that effort seems to have stalled. Of course, we know that in all software projects, including, but certainly not limited to, the more-or-less volunteer free/libre/open-source projects, there are many tasks and many demands on the developers so that it is only natural that not everything is implemented or adapted at once; things happen, priorities shift, people get redirected, nobody else steps up to continue - it happens. With this in mind, where do things stand right now, should we consider writing an image backend, are there other options or plans? So thanks for reading through my ramblings, I guess, and keep up the great work! Best regards, Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Mon Nov 11 16:32:35 2019 From: dms at danplanet.com (Dan Smith) Date: Mon, 11 Nov 2019 08:32:35 -0800 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet In-Reply-To: (Peter Penchev's message of "Mon, 11 Nov 2019 18:06:21 +0200") References: Message-ID: > With this in mind, where do things stand right now, should we consider > writing an image backend, are there other options or plans? I don't think you should, no. The image backend code is messy and problematic for a lot of reasons, and building on what we have there is a path to madness I think. Rewriting it is no small feat, and I think that if we did we'd want to do so in such a way that makes use of cinder for anything other than local disk. That's a really nice ideal, but it's a huge amount of work to do (and review) and also unlikely to ever actually happen. We can do a lot better by reducing the feature gap with volume-backed instances. Implementing the features that aren't supported, and improving the ones that are *weird* when used on a volume-backed instance. These would be much smaller changes, easier to review, easier to gain acceptance for, etc. Personally, if you want to do some work in this area, I'd recommend picking a weird behavior and trying to propose an improvement to it. --Dan From melwittt at gmail.com Mon Nov 11 16:50:48 2019 From: melwittt at gmail.com (melanie witt) Date: Mon, 11 Nov 2019 08:50:48 -0800 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: References: <1573403387.31166.6@est.tech> Message-ID: On 11/10/19 12:41, Matt Riedemann wrote: > On 11/10/2019 10:29 AM, Balázs Gibizer wrote: >> * Also there could be two cells running this command at the same time >> fighting for the API db lock, > > In Train the --all-cells option was added to the CLI so that should > resolve this issue. 
I think Mel said she backported those changes > internally so I'm not sure how hard it would be for those to go back to > Stein or Rocky or whatever release CERN is using now. That's correct, I backported --all-cells [1][2][3][4] to Stein, Rocky, and Queens downstream. I found it not to be easy but YMMV. The primary conflicts in Stein were with --before, so I went ahead and brought those patches back as well [5][6][7] since we also needed --before to help people avoid the "orphaned virt guests if archive runs while nova-compute is down" problem. Same deal for Rocky. And finally with Queens, there's an additional conflict around deleting instance group members [8], so I also brought that back because it's related to all of the database cleanup issues that support has repeatedly faced with customers. Hope this helps anyone considering backporting --all-cells. Cheers, -melanie [1] https://review.opendev.org/675218 [2] https://review.opendev.org/675209 [3] https://review.opendev.org/675205 [4] https://review.opendev.org/507486 [5] https://review.opendev.org/661289 [6] https://review.opendev.org/556751 [7] https://review.opendev.org/643779 [8] https://review.opendev.org/598953 From doka.ua at gmx.com Mon Nov 11 17:06:43 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 11 Nov 2019 19:06:43 +0200 Subject: [Neutron] OVS forwarding issues Message-ID: Dear colleagues, just faced an issue with Openvswitch, which looks strange for me. The problem is that any particular VM receives a lot of packets, which are unicasted: - from other VMs which reside on the same host (let's name them "local VMs") - to other VMs which reside on other hosts (let's name them "remote VMs") Long output from "ovs-ofctl dump-flows br-int" which, as far as I can narrow, ends there: # ovs-ofctl dump-flows br-int |grep " table=94," |egrep "n_packets=[123456789]"  cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, priority=1 actions=NORMAL coming to normal processing (classic MAC learning). Looking into br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no MAC addresses of remote VMs and br-int behaves in the right way, flooding unknown unicast to all ports in this L2 segment. Of course, there is br-tun which connected over vxlan to all other hosts and to br-int:     Bridge br-tun         Controller "tcp:127.0.0.1:6633"             is_connected: true         fail_mode: secure         Port "vxlan-0a960008"             Interface "vxlan-0a960008"                 type: vxlan                 options: {df_default="true", in_key=flow, local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"}         [ ... ]         Port br-tun             Interface br-tun                 type: internal         Port patch-int             Interface patch-int                 type: patch                 options: {peer=patch-tun} but MAC table on br-tun is empty as well: # ovs-appctl fdb/show br-tun  port  VLAN  MAC                Age # Finally, packets get to destination, while being copied to all ports on source host, which is serious security issue. I do not think so conceived by design, I rather think we missed something in configuration. Can anybody point me where we're wrong and help with this issue? We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. 
Network configuration is: @controller: # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [ml2] type_drivers = flat,vxlan tenant_network_types = vxlan mechanism_drivers = l2population,openvswitch extension_drivers = port_security,qos,dns_domain_ports [ml2_type_flat] flat_networks = provider [ml2_type_geneve] [ml2_type_gre] [ml2_type_vlan] [ml2_type_vxlan] vni_ranges = 400:400000 [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true @agent: # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [agent] tunnel_types = vxlan l2_population = true arp_responder = true extensions = qos [ovs] local_ip = 10.150.0.5 bridge_mappings = provider:br-ex [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true [xenapi] Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.denton at rackspace.com Mon Nov 11 17:38:24 2019 From: james.denton at rackspace.com (James Denton) Date: Mon, 11 Nov 2019 17:38:24 +0000 Subject: [Neutron] OVS forwarding issues In-Reply-To: References: Message-ID: Hi, This is a known issue with the openvswitch firewall[1]. > firewall_driver = openvswitch I recommend running iptables_hybrid until that is resolved. [1] https://bugs.launchpad.net/neutron/+bug/1732067 James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com From: Volodymyr Litovka Date: Monday, November 11, 2019 at 12:10 PM To: "openstack-discuss at lists.openstack.org" Cc: "doka.ua at gmx.com" Subject: [Neutron] OVS forwarding issues CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Dear colleagues, just faced an issue with Openvswitch, which looks strange for me. The problem is that any particular VM receives a lot of packets, which are unicasted: - from other VMs which reside on the same host (let's name them "local VMs") - to other VMs which reside on other hosts (let's name them "remote VMs") Long output from "ovs-ofctl dump-flows br-int" which, as far as I can narrow, ends there: # ovs-ofctl dump-flows br-int |grep " table=94," |egrep "n_packets=[123456789]" cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, priority=1 actions=NORMAL coming to normal processing (classic MAC learning). Looking into br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no MAC addresses of remote VMs and br-int behaves in the right way, flooding unknown unicast to all ports in this L2 segment. Of course, there is br-tun which connected over vxlan to all other hosts and to br-int: Bridge br-tun Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "vxlan-0a960008" Interface "vxlan-0a960008" type: vxlan options: {df_default="true", in_key=flow, local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} [ ... ] Port br-tun Interface br-tun type: internal Port patch-int Interface patch-int type: patch options: {peer=patch-tun} but MAC table on br-tun is empty as well: # ovs-appctl fdb/show br-tun port VLAN MAC Age # Finally, packets get to destination, while being copied to all ports on source host, which is serious security issue. I do not think so conceived by design, I rather think we missed something in configuration. 
Can anybody point me where we're wrong and help with this issue? We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. Network configuration is: @controller: # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [ml2] type_drivers = flat,vxlan tenant_network_types = vxlan mechanism_drivers = l2population,openvswitch extension_drivers = port_security,qos,dns_domain_ports [ml2_type_flat] flat_networks = provider [ml2_type_geneve] [ml2_type_gre] [ml2_type_vlan] [ml2_type_vxlan] vni_ranges = 400:400000 [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true @agent: # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [agent] tunnel_types = vxlan l2_population = true arp_responder = true extensions = qos [ovs] local_ip = 10.150.0.5 bridge_mappings = provider:br-ex [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true [xenapi] Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From flux.adam at gmail.com Mon Nov 11 18:09:03 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Mon, 11 Nov 2019 10:09:03 -0800 Subject: [octavia][ptg] Summary of Shanghai PTG Discussion Message-ID: Fellow Octavians, We covered a lot of ground during this PTG, met a number of new folks, and got a lot of valuable feedback. I'll do my best to summarize here what was discussed. 1. Metrics 1. It would be nice to expose metrics for pools/members, though we would like to get a better understanding of the requirements / use-cases. 2. We should publish metrics using some mechanism (plugin/driver). 1. The default would be "database" and would handle the existing API-exposed metrics. 2. Additional drivers would be loaded in parallel, and might include Monasca/Ceilometer/Prometheus drivers. 3. We will switch our metrics internally to use a delta system instead of absolute values from HAProxy. This will allow us to publish in a more sane way in the future. This would not change the way metrics are exposed in the existing API. 2. Notifications 1. We will need to create a spec and gather community feedback. 2. Initial observation indicates the need for two general paths, which will most likely have their own driver systems: 1. provisioning_status changes (including all create/update/delete events) 2. operating_status changes (member up/down, etc) 3. We would provide the entire object in the notification, similar to what other services do. 4. Most likely the default driver(s) would use oslo.notify. 3. Availability Zone Support (Multi-Zone Fault Tolerance) 1. Make at least a story for tracking this, if it doesn't already exist. 2. Allow a single LB to have amphorae in multiple zones. 3. Existing patch: https://review.opendev.org/#/c/558962/ 4. Availability Zone Support (Compute AZ Awareness) 1. Make at least a story for tracking this, if it doesn't already exist. 2. Allow placing LBs in specific zones: 1. When zones are geographically separated, LBs should exist in the same zone as the members they support 2. When zones are logically separated (like PCI compliance zones, etc), users may need to place them specifically. 3. A new parameter `availability_zone` will be added to the LB create API. It will allow the user to select which Octavia AZ to use. 4. A new API section will be added for creating/configuring/listing Octavia AZs. 
This will allow a linkage between Compute AZs and Amphora Management Network, along with other possible options in the future. Admins can create/update, and users can list zones. 5. Update clients to support this, including further polluting the `openstack availability zone list` command to include `--loadbalancers` zones. 5. Python 2 EOL 1. Remove all jobs that test Python 2 (or update them if they're not duplicates). 2. Remove six compatibility code, which should simplify string handling significantly. 6. More Flavor Capabilities 1. Image Tag (to allow different amp images per flavor) 2. Availability Zone (to allow compute AZ pinning) 3. Management Network (to go with compute AZ) 4. Metadata (allow passing arbitrary metadata through to compute) 7. TLS Protocol/Cipher API Support 1. Allow users to select specific protocols/ciphers as a whitelist. 2. Stories: 1. Ciphers: https://storyboard.openstack.org/#!/story/2006627 2. Protocols: https://storyboard.openstack.org/#!/story/2006733 8. Performance Tuning 1. HAProxy: There are a number of knobs/dials that can be adjusted to make HAProxy behave more efficiently. Some of these that we could look at more are around TLS options, and around multiprocessing/threading. The latter will probably need to wait for us to switch to HAProxy 2.0. 2. Image Metadata: There are flags that could be added to our amphora image's metadata that might improve performance. To be further researched. 9. Testing 1. Team to evaluate existing non-voting jobs for promotion to voting. 1. Agreement was made with the Barbican team to promote both side's co-gating jobs to voting. 2. Team to evaluate merging or pruning some jobs to reduce the overall set that run on each change. 3. Grenade needs a few changes: 1. Switch to python3. 2. Upgrade to Zuul v3. 3. Test additional operations on existing LBs (old amp image), not just traffic. 4. Test more than just the most recent amphora image against the current control-plane code. Use periodic jobs for this. 5. Fix the Zuul grafana dashboard for Octavia test history. 10. Jobboard 1. This continues to be a priority for Ussuri. 2. Put together a priority list of patches specifically for jobboard. 11. HAProxy 2.0 1. A number of features are gated behind the new version, including multi-process and HTTP/2 support. 2. Need to reach out to distributions to push for backports (to cloudarchive for Ubuntu, and whatever similar thing for CentOS). 3. Possibly add an element to allow building new versions from source. 4. Perform version-based validation of options on the API-side of the amphora driver. 1. Inspect and cache Glance metadata for the LB's amphora image to get version data. 2. Provide the metadata string for the operator from our disk-image-create script. The full etherpad from the PTG including the notes I've summarized here is available at https://etherpad.openstack.org/p/octavia-shanghai-U-ptg if further review is desired. Thanks to everyone who participated, and best of luck on this (hopefully) productive new cycle! --Adam Harwell -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaser at vexxhost.com Mon Nov 11 19:05:55 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 12 Nov 2019 03:05:55 +0800 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: <1573403387.31166.6@est.tech> References: <1573403387.31166.6@est.tech> Message-ID: On Mon, Nov 11, 2019 at 12:33 AM Balázs Gibizer wrote: > > CERN reported two issues with archive_deleted_rows CLI: > * When one record gets inserted into the shadow_instance_extra but > didn't get deleted from instance_extra (I know this is in a single > transaction but sometimes it happens), needs manual cleanup on the > database > * Also there could be two cells running this command at the same time > fighting for the API db lock, > > > TODOs: > * tssurya to report bugs / improvements on archive_deleted_rows CLI > based on CERN's experience with long table locking > * mnaser to report a wishlist bug / specless bp about one step db purge > CLI which would skip the shadow tables I did my homework: https://bugs.launchpad.net/nova/+bug/1852121 I don't think I have time currently to iterate and work on it right now, but at least it's documented. > Cheers, > gibi > > > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From zbitter at redhat.com Mon Nov 11 19:32:20 2019 From: zbitter at redhat.com (Zane Bitter) Date: Mon, 11 Nov 2019 14:32:20 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> On 7/11/19 2:11 pm, Corey Bryant wrote: > Hello TC members, > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > too late to enable voting py38 unit tests for ussuri, I'd like to at > least enable non-voting py38 unit tests. This email is seeking approval > and direction from the TC to move forward with enabling non-voting py38 > tests. I was a bit fuzzy on this myself, so I looked it up and this is what the TC decided when we passed the resolution: > If the new Zuul template contains test jobs that were not in the previous one, the goal champion(s) may choose to update the previous template to add a non-voting check job (or jobs) to match the gating jobs in the new template. This means that all repositories that have not yet converted to the template for the upcoming release will see a non-voting preview of the new job(s) that will be added once they update. If this option is chosen, the non-voting job should be limited to the master branch so that it does not run on the preceding release’s stable branch. (from https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests ) So to follow that process we would need to define the python versions for V, then appoint a goal champion, and after that it would be at the champion's discretion to add a non-voting job on master in Ussuri. I happened to be sitting next to Sean when I saw this thread, and after discussing it with him I think he would OK with having a non-voting job on every commit, since it's what we have documented. Previous discussions established that the overhead of adding one Python unit test job to every project was pretty inconsequential (we'll offset it by dropping 2.7 jobs anyway). I submitted a draft governance patch defining the Python versions for V (https://review.opendev.org/693743). 
Unfortunately we can't merge it yet because we don't have a release name for V (Sean is working on that: https://review.opendev.org/693266). It's gazing in the crystal ball a little bit, but even if for some reason Ubuntu 20.04 is not released before the V cycle starts, it's inevitable that we will be selecting Python 3.8 because it meets the first criterion ("The latest released version of Python 3 that is available in any distribution we can feasibly use for testing") - 3.8 is released and it's available in Ubuntu 18.04, which is the distro we use for testing anyway. So, in my opinion, if you're volunteering to be the goal champion then there's no need for any further approval by the TC ;) I guess to make that official we should commit the python3 update Goal for the V cycle now... or at least as soon as we have a release name. This is happening a little earlier than I think we anticipated but, given that there's no question what is going to happen in V, I don't think we'd be doing anybody any favours by delaying the process unnecessarily. > For some further background: The next release of Ubuntu, Focal (20.04) > LTS, is scheduled to release in April 2020. Python 3.8 will be the > default in the Focal release, so I'm hopeful that non-voting unit tests > will help close some of the gap. > > I have a review here for the zuul project template enablement for ussuri: > https://review.opendev.org/#/c/693401 > > Also should this be updated considering py38 would be non-voting? > https://governance.openstack.org/tc/reference/runtimes/ussuri.html No, I don't think this changes anything for Ussuri. It's preparation for V. cheers, Zane. From melwittt at gmail.com Mon Nov 11 19:37:30 2019 From: melwittt at gmail.com (melanie witt) Date: Mon, 11 Nov 2019 11:37:30 -0800 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: References: <1573403387.31166.6@est.tech> Message-ID: On 11/11/19 08:50, melanie witt wrote: > On 11/10/19 12:41, Matt Riedemann wrote: >> On 11/10/2019 10:29 AM, Balázs Gibizer wrote: >>> * Also there could be two cells running this command at the same time >>> fighting for the API db lock, >> >> In Train the --all-cells option was added to the CLI so that should >> resolve this issue. I think Mel said she backported those changes >> internally so I'm not sure how hard it would be for those to go back >> to Stein or Rocky or whatever release CERN is using now. > > That's correct, I backported --all-cells [1][2][3][4] to Stein, Rocky, > and Queens downstream. I found it not to be easy but YMMV. > > The primary conflicts in Stein were with --before, so I went ahead and > brought those patches back as well [5][6][7] since we also needed > --before to help people avoid the "orphaned virt guests if archive runs > while nova-compute is down" problem. > > Same deal for Rocky. > > And finally with Queens, there's an additional conflict around deleting > instance group members [8], so I also brought that back because it's > related to all of the database cleanup issues that support has > repeatedly faced with customers. Sorry, I have to be pedantic and amend the info about Queens ^ to add that --purge [9][10][11] was another conflict in Queens that I also backported because we had a separate request open by support for that as well anyway. > Hope this helps anyone considering backporting --all-cells. 
> > Cheers, > -melanie > > [1] https://review.opendev.org/675218 > [2] https://review.opendev.org/675209 > [3] https://review.opendev.org/675205 > [4] https://review.opendev.org/507486 > [5] https://review.opendev.org/661289 > [6] https://review.opendev.org/556751 > [7] https://review.opendev.org/643779 > [8] https://review.opendev.org/598953 [9] https://review.opendev.org/550171 [10] https://review.opendev.org/550182 [11] https://review.opendev.org/550502 From mriedemos at gmail.com Mon Nov 11 19:40:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 11 Nov 2019 13:40:52 -0600 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: References: <1573403387.31166.6@est.tech> Message-ID: <6c497067-70ad-5a47-c432-995855917989@gmail.com> On 11/11/2019 1:05 PM, Mohammed Naser wrote: > I did my homework: > > https://bugs.launchpad.net/nova/+bug/1852121 > > I don't think I have time currently to iterate and work on it right > now, but at least it's documented. I commented in the bug and, without more details, I don't see how it's really worth the trouble of refactoring the archive/purge code to deal with this optimization but I can probably be proven wrong. -- Thanks, Matt From tidwellrdev at gmail.com Mon Nov 11 19:47:15 2019 From: tidwellrdev at gmail.com (Ryan Tidwell) Date: Mon, 11 Nov 2019 13:47:15 -0600 Subject: BGP dynamic routing In-Reply-To: References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Message-ID: At the moment neutron-dynamic-routing does not support receiving routes from its peers. If you look at the code, you'll see that the BGP will handle any route updates it gets from a peer by simply invoking a no-op routine that logs an info message [1]. You're not the first one to ask the question, so if you can express a solid use case I think an RFE could be crafted to support you. I just haven't seen the use case expressed by anyone yet, but that's not to say it doesn't exist. -Ryan Tidwell [1] https://github.com/openstack/neutron-dynamic-routing/blob/master/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L40 On Mon, Nov 4, 2019 at 10:38 AM Donny Davis wrote: > To be honest I only use it for the use case I listed before, so beyond > that I am not going to be much help. > > However.. they are both speaking bgp I would imagine that it works the > same way as any bgp instance. > > Give it a whirl and let us know how it works out. :) > > On Mon, Nov 4, 2019 at 11:28 AM Volodymyr Litovka wrote: > >> Hi Donny, >> >> the question if I have few peers to few PoPs, everyone with own set of >> prefixes and need to import these external prefixes INTO the tenant. >> >> >> On 04.11.2019 17:08, Donny Davis wrote: >> >> The way I use it is to dynamically advertise my tenant networks to the >> edge. The edge router still handles routes in the rest of my infra. >> >> Works pretty well for me. >> >> Donny Davis >> c: 805 814 6800 >> >> On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka wrote: >> >>> Dear colleagues, >>> >>> "BGP dynamic routing" doc >>> ( >>> https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html >>> ) >>> says only about advertisement of routes: "BGP dynamic routing enables >>> advertisement of self-service (private) network prefixes to physical >>> network devices that support BGP such as routers, thus removing the >>> conventional dependency on static routes." and nothing about receiving >>> of routes from external peers. 
>>> >>> Whether it is ever possible using Neutron to have fully dynamic routing >>> inside the project, both advertising/receiving (and updating VRs >>> configuration) routes to/from remote peers? >>> >>> Thank you. >>> >>> -- >>> Volodymyr Litovka >>> "Vision without Execution is Hallucination." -- Thomas Edison >>> >>> >>> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." -- Thomas Edison >> >> > > -- > ~/DonnyD > C: 805 814 6800 > "No mission too difficult. No sacrifice too great. Duty First" > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Nov 11 19:54:02 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 11 Nov 2019 13:54:02 -0600 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet In-Reply-To: References: Message-ID: <89de51c7-b489-7df3-1bb5-7424ee8a9542@gmail.com> On 11/11/2019 10:06 AM, Peter Penchev wrote: > There seem to still be some quirks with Nova and volume-backed instance > disks; some actions on instances are not allowed, others produce > somewhat weird results. From a quick look at the code it seems to me > that currently these are: > - taking a snapshot of an instance (produces a zero-sized file, no real > data backed up) Volume-backed instance snapshot is supported [1]. It creates a volume snapshot in cinder and then links that to the glance image via metadata. If you boot a server from that image snapshot it's boot-from-volume under the covers, what is sometimes referred to as an image-defined block device mapping. Tempest also has a scenario test for this [2]. > - backing an instance up (refuses outright) Yeah not supported and not really necessary to support. The createBackup API is essentially frozen since it's just orchestration over the existing createImage API and could all be done via external tooling so it's not really a priority to make that a more feature rich API. We've even talked about deprecating createBackup just to get people to stop using it. > - rescuing an instance (refuses outright) Yeah, not supported, but there have been specs [3][4]. > ...and maybe there are some that I've missed. Rebuilding a volume-backed server is another big one. There was actually agreement on how to do this between nova and cinder [5][6], the cinder implementation was code up and being reviewed, but the nova side lagged and was eventually abandoned. So that could be picked up again if someone was willing to invest the time in it. [1] https://github.com/openstack/nova/blob/20.0.0/nova/compute/api.py#L3031 [2] https://github.com/openstack/tempest/blob/22.1.0/tempest/scenario/test_volume_boot_pattern.py#L210 [3] https://review.opendev.org/#/c/651151/ [4] https://review.opendev.org/#/c/532410/ [5] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/volume-backed-server-rebuild.html [6] https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api -- Thanks, Matt From mriedemos at gmail.com Mon Nov 11 20:01:48 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 11 Nov 2019 14:01:48 -0600 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> On 11/11/2019 7:03 AM, Chris Dent wrote: > Or using > separate processes? 
For the ironic and vsphere contexts, increased > CPU usage by the nova-compute process does not impact on the > workload resources, so parallization is likely a good option. I don't know how much it would help - someone would have to actually test it out and get metrics - but one easy win might just be using a thread or process executor pool here [1] so that N compute nodes could be processed through the update_available_resource periodic task concurrently, maybe $ncpu or some factor thereof. By default make it serialized for backward compatibility and non-ironic deployments. Making that too highly concurrent could have negative impacts on other things running on that host, like the neutron agent, or potentially storming conductor/rabbit with a ton of DB requests from that compute. That doesn't help with the scenario that the big COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while spawning, moving, or deleting an instance that also needs access to the big lock to update the resource tracker, but baby steps if any steps in this area of the code would be my recommendation. [1] https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 -- Thanks, Matt From dharmendra.kushwaha at gmail.com Tue Nov 12 07:21:16 2019 From: dharmendra.kushwaha at gmail.com (Dharmendra Kushwaha) Date: Tue, 12 Nov 2019 12:51:16 +0530 Subject: [tacker] No IRC meeting today Message-ID: Hello Taker team, As we have PTG in last week, lets skip today's weekly meeting. Thanks & Regards Dharmendra Kushwaha -------------- next part -------------- An HTML attachment was scrubbed... URL: From luyao.zhong at intel.com Tue Nov 12 05:46:13 2019 From: luyao.zhong at intel.com (Zhong, Luyao) Date: Tue, 12 Nov 2019 05:46:13 +0000 Subject: [nova] track error migrations and orphans in Resource Tracker Message-ID: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Hi Nova experts, "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in update_available_resources in RT at the moment. That is some orphans or error migrations are using cpus/memory/disk etc, but we don't take these usage into consideration. And instance.resources is introduced from Train used to contain specific resources, we also track assigned specific resources in RT based on tracked migrations and instances. So this bug will also affect the specific resources tracking. I draft an doc to clarify this bug and possible solutions: https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT Looking forward to suggestions from you. Thanks in advance. Best Regards, Luyao -------------- next part -------------- An HTML attachment was scrubbed... URL: From doka.ua at gmx.com Tue Nov 12 07:38:08 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Tue, 12 Nov 2019 09:38:08 +0200 Subject: BGP dynamic routing In-Reply-To: References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Message-ID: <7440b104-20c5-7473-c151-06f66b904731@gmx.com> Hi Ryan, thanks for the reply. To be frank, I can't come up with some general use cases for such RFE. I'm solving particular problem, connecting remote premises over VPN to the cloud tenant and is able to combine BGP on VPN concentrator and static routes inside tenant. The question was like "what if supported? It will be convenient." I appreciate your efforts and thanks again for the answer. 
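P.S. In case it helps someone with the same setup: the "static routes inside
the tenant" half of the workaround is just Neutron's extraroute extension on
the tenant router. A rough openstacksdk sketch (the cloud name, router name,
prefix and next-hop below are placeholders, not values from this thread):

  import openstack

  conn = openstack.connect(cloud='mycloud')
  router = conn.network.find_router('tenant-router')
  conn.network.update_router(
      router,
      routes=[
          # one entry per on-prem prefix learned by the VPN concentrator
          {'destination': '192.0.2.0/24', 'nexthop': '10.10.0.5'},
      ])

The same thing is available from the CLI with "openstack router set --route
destination=<prefix>,gateway=<next-hop> <router>" if you prefer that.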
On 11.11.2019 21:47, Ryan Tidwell wrote: > At the moment neutron-dynamic-routing does not support receiving > routes from its peers. If you look at the code, you'll see that the > BGP will handle any route updates it gets from a peer by simply > invoking a no-op routine that logs an info message [1]. You're not the > first one to ask the question, so if you can express a solid use case > I think an RFE could be crafted to support you. I just haven't seen > the use case expressed by anyone yet, but that's not to say it doesn't > exist. > > -Ryan Tidwell > > [1] > https://github.com/openstack/neutron-dynamic-routing/blob/master/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L40 > > On Mon, Nov 4, 2019 at 10:38 AM Donny Davis > wrote: > > To be honest I only use it for the use case I listed before, so > beyond that I am not going to be much help. > > However.. they are both speaking bgp I would imagine that it works > the same way as any bgp instance. > > Give it a whirl and let us know how it works out. :) > > On Mon, Nov 4, 2019 at 11:28 AM Volodymyr Litovka > wrote: > > Hi Donny, > > the question if I have few peers to few PoPs, everyone with > own set of prefixes and need to import these external prefixes > INTO the tenant. > > > On 04.11.2019 17:08, Donny Davis wrote: >> The way I use it is to dynamically advertise my tenant >> networks to the edge. The edge router still handles routes in >> the rest of my infra. >> >> Works pretty well for me. >> >> Donny Davis >> c: 805 814 6800 >> >> On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka >> > wrote: >> >> Dear colleagues, >> >> "BGP dynamic routing" doc >> (https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html) >> says only about advertisement of routes: "BGP dynamic >> routing enables >> advertisement of self-service (private) network prefixes >> to physical >> network devices that support BGP such as routers, thus >> removing the >> conventional dependency on static routes." and nothing >> about receiving >> of routes from external peers. >> >> Whether it is ever possible using Neutron to have fully >> dynamic routing >> inside the project, both advertising/receiving (and >> updating VRs >> configuration) routes to/from remote peers? >> >> Thank you. >> >> -- >> Volodymyr Litovka >>    "Vision without Execution is Hallucination." -- Thomas >> Edison >> >> > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > > > -- > ~/DonnyD > C: 805 814 6800 > "No mission too difficult. No sacrifice too great. Duty First" > -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Nov 12 08:42:45 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 12 Nov 2019 09:42:45 +0100 Subject: [Neutron] OVS forwarding issues In-Reply-To: References: Message-ID: <20191112084245.hjz3zrwbrdw64okd@skaplons-mac> Hi, If You are using ovs firewall driver, it is known issue there. See bug [1] for details. There is proposal how to fix it in [2] but it's not perfect and still require some more work to do. [1] https://bugs.launchpad.net/neutron/+bug/1732067 [2] https://bugs.launchpad.net/neutron/+bug/1841622 On Mon, Nov 11, 2019 at 07:06:43PM +0200, Volodymyr Litovka wrote: > Dear colleagues, > > just faced an issue with Openvswitch, which looks strange for me. 
The > problem is that any particular VM receives a lot of packets, which are > unicasted: > - from other VMs which reside on the same host (let's name them "local VMs") > - to other VMs which reside on other hosts (let's name them "remote VMs") > > Long output from "ovs-ofctl dump-flows br-int" which, as far as I can > narrow, ends there: > > # ovs-ofctl dump-flows br-int |grep " table=94," |egrep > "n_packets=[123456789]" >  cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, > n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, > priority=1 actions=NORMAL > > coming to normal processing (classic MAC learning). Looking into br-int > MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no > MAC addresses of remote VMs and br-int behaves in the right way, > flooding unknown unicast to all ports in this L2 segment. > > Of course, there is br-tun which connected over vxlan to all other hosts > and to br-int: > >     Bridge br-tun >         Controller "tcp:127.0.0.1:6633" >             is_connected: true >         fail_mode: secure >         Port "vxlan-0a960008" >             Interface "vxlan-0a960008" >                 type: vxlan >                 options: {df_default="true", in_key=flow, > local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} >         [ ... ] >         Port br-tun >             Interface br-tun >                 type: internal >         Port patch-int >             Interface patch-int >                 type: patch >                 options: {peer=patch-tun} > > but MAC table on br-tun is empty as well: > > # ovs-appctl fdb/show br-tun >  port  VLAN  MAC                Age > # > > Finally, packets get to destination, while being copied to all ports on > source host, which is serious security issue. > > I do not think so conceived by design, I rather think we missed > something in configuration. Can anybody point me where we're wrong and > help with this issue? > > We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. Network > configuration is: > > @controller: > # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [ml2] > type_drivers = flat,vxlan > tenant_network_types = vxlan > mechanism_drivers = l2population,openvswitch > extension_drivers = port_security,qos,dns_domain_ports > [ml2_type_flat] > flat_networks = provider > [ml2_type_geneve] > [ml2_type_gre] > [ml2_type_vlan] > [ml2_type_vxlan] > vni_ranges = 400:400000 > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > > @agent: > # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [agent] > tunnel_types = vxlan > l2_population = true > arp_responder = true > extensions = qos > [ovs] > local_ip = 10.150.0.5 > bridge_mappings = provider:br-ex > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > [xenapi] > > Thank you. > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Tue Nov 12 10:27:29 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 12 Nov 2019 11:27:29 +0100 Subject: [infra] Etherpad problem In-Reply-To: <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> References: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> Message-ID: <20191112102729.fggiccohpwpqd3he@skaplons-mac> Hi, Thx Brian. 
I used Your backup etherpad to "restore" everything in [1] [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored On Sun, Nov 10, 2019 at 12:31:36AM +0800, Brian Haley wrote: > On 11/9/19 10:16 AM, Slawek Kaplonski wrote: > > Hi, > > > > Just at the end of the ptg sessions, neutron etherpad was got broken somehow. > > Now when I try to open [1] I see only something like: > > > > An error occurred > > The error was reported with the following id: 'igzOahZ6ruH0eSUAWKaj' > > > > Please press and hold Ctrl and press F5 to reload this page, if the problem > > persists please send this error message to your webmaster: > > 'ErrorId: igzOahZ6ruH0eSUAWKaj > > URL: https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 > > Firefox/70.0 > > TypeError: r.dropdowns is undefined in > > https://etherpad.openstack.org/javascripts/lib/ep_etherpad-lite/static/js/pad.js?callback=require.define > > at line 18' > > > > > > We can open one of the previous versions which is available at [2] but I don't > > know how we can fix original etherpad or restore version from [2] to be original > > etherpad and make it working again. > > Can someone from infra team check that for us maybe? > > Thx in advance for any help. > > > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 > > Hi Slawek, > > When I just went to check this etherpad now, noticed I had a tab open that > was in "Force reconnect" state. I made a copy of that, just might be a > little out of date on the last items. The formatting is also a little odd, > but at least it's better than nothing if we can't get the original back. > > https://etherpad.openstack.org/p/neutron-ptg-temp > > -Brian > -- Slawek Kaplonski Senior software engineer Red Hat From sbauza at redhat.com Tue Nov 12 10:29:20 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 11:29:20 +0100 Subject: [nova][ptg] Resurrecting NUMA topology in Placement Message-ID: We discussed about a long known story https://review.openstack.org/#/c/552924/ The whole agreement during the PTG was to keep things simple with baby steps : - only supporting a few NUMA queries and defer others as unsupported (still supported by legacy NUMATopologyFilter) - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB resource classes and not handle PCI or GPU devices (ie. level-1 tree, no children under the NUMA RPs) Agreement was also there for saying that functional tests should be enough since PlacementFixture already works perfectly. -S -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 10:33:52 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 11:33:52 +0100 Subject: [nova][ptg] Support re-configure deleted_on_termination in server Message-ID: Spec is https://review.opendev.org/#/c/580336/ Most people seem to think this makes sense but realize there are already other ways to do this (snapshot) and therefore it's not totally necessary. The agreement in the room was to post the code up for the change, as this will help sell people on it if it's trivial enough and document the use case (i.e. are there scenarios where this would make life 10x easier?) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbauza at redhat.com Tue Nov 12 10:51:16 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 11:51:16 +0100 Subject: [nova][ptg] PCI refactoring needs and a strawman proposal inside Message-ID: Based on some Forum discussions, we had a de facto conversation at the Nova PTG about any potential things we could do for helping our operators, but also potentially Cyborg since they use the PCI passthrough capabilities. The feedback from ops (thanks mnaser) was that PCI passthrough works pretty smoothly but there are some ugly issues where we could improve the UX. The agreement in the room was, at least for Ussuri, to start collecting some ideas on how we could model Placement usage for PCI devices and also write motivations for such things in a spec. We also agreed on a smooth upgrade plan where PCITracker would still be present for a couple of releases until we're able to close the feature gap. Bauzas volunteered for drafting the spec and gibi, stephenfin and bauzas started to sketch up the Placement modeling on a whiteboard, picture to be shared in a follow-up. -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Nov 12 13:18:35 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:18:35 +0000 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote: > Hi Nova experts, > > "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in > update_available_resources in RT at the moment. That is some orphans or error migrations are using cpus/memory/disk > etc, but we don't take these usage into consideration. And instance.resources is introduced from Train used to contain > specific resources, we also track assigned specific resources in RT based on tracked migrations and instances. So this > bug will also affect the specific resources tracking. > > I draft an doc to clarify this bug and possible solutions: > https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT > Looking forward to suggestions from you. Thanks in advance. > there are patche up to allow cleaning up orpahn instances https://review.opendev.org/#/c/627765/ https://review.opendev.org/#/c/648912/ if we can get those merged that woudl adress at least some of the proablem > Best Regards, > Luyao From smooney at redhat.com Tue Nov 12 13:26:14 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:26:14 +0000 Subject: [nova][ptg] Resurrecting NUMA topology in Placement In-Reply-To: References: Message-ID: On Tue, 2019-11-12 at 11:29 +0100, Sylvain Bauza wrote: > We discussed about a long known story > https://review.openstack.org/#/c/552924/ > > The whole agreement during the PTG was to keep things simple with baby > steps : > - only supporting a few NUMA queries and defer others as unsupported (still > supported by legacy NUMATopologyFilter) > - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB > resource classes and not handle PCI or GPU devices (ie. level-1 tree, no > children under the NUMA RPs) > > Agreement was also there for saying that functional tests should be enough > since PlacementFixture already works perfectly. 
we can now do numa testing in the gate so we can also add tempest testing this cycle. artom has recently gotten whitebox to run (more work to do https://review.opendev.org/#/c/691062/) and i do want to get my multi numa nfv testing job https://review.opendev.org/#/c/679656/ at least in experimental and perodic pipelines. i would like it to be in check eventually but baby steps. i dont think these should be a blocker for any of this work but i think we shoudl take advantage of them to contiue to improve the numa testing in gate. > > -S From smooney at redhat.com Tue Nov 12 13:29:32 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:29:32 +0000 Subject: [nova][ptg] PCI refactoring needs and a strawman proposal inside In-Reply-To: References: Message-ID: <35cff7ea13a85bde43a4626c84b6bc130eb67110.camel@redhat.com> On Tue, 2019-11-12 at 11:51 +0100, Sylvain Bauza wrote: > Based on some Forum discussions, we had a de facto conversation at the Nova > PTG about any potential things we could do for helping our operators, but > also potentially Cyborg since they use the PCI passthrough capabilities. > > The feedback from ops (thanks mnaser) was that PCI passthrough works pretty > smoothly but there are some ugly issues where we could improve the UX. > > The agreement in the room was, at least for Ussuri, to start collecting > some ideas on how we could model Placement usage for PCI devices and also > write motivations for such things in a spec. > We also agreed on a smooth upgrade plan where PCITracker would still be > present for a couple of releases until we're able to close the feature gap. > > Bauzas volunteered for drafting the spec and gibi, stephenfin and bauzas > started to sketch up the Placement modeling on a whiteboard, picture to be > shared in a follow-up. i have a list of things i want to enhance related to pci/sriov so i would be interested in this topic too. it might be worth consiering a SIG on this topic if it will be cross project. From smooney at redhat.com Tue Nov 12 13:29:53 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:29:53 +0000 Subject: [nova][ptg] Resurrecting NUMA topology in Placement In-Reply-To: References: Message-ID: On Tue, 2019-11-12 at 13:26 +0000, Sean Mooney wrote: > On Tue, 2019-11-12 at 11:29 +0100, Sylvain Bauza wrote: > > We discussed about a long known story > > https://review.openstack.org/#/c/552924/ > > > > The whole agreement during the PTG was to keep things simple with baby > > steps : > > - only supporting a few NUMA queries and defer others as unsupported (still > > supported by legacy NUMATopologyFilter) > > - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB > > resource classes and not handle PCI or GPU devices (ie. level-1 tree, no > > children under the NUMA RPs) > > > > Agreement was also there for saying that functional tests should be enough > > since PlacementFixture already works perfectly. > > we can now do numa testing in the gate so we can also add tempest testing this cycle. > artom has recently gotten whitebox to run (more work to do https://review.opendev.org/#/c/691062/) > and i do want to get my multi numa nfv testing job https://review.opendev.org/#/c/679656/ at least > in experimental and perodic pipelines. i would like it to be in check eventually but baby steps. > > i dont think these should be a blocker for any of this work but i think we shoudl take advantage > of them to contiue to improve the numa testing in gate. 
by gate i ment ci/ > > > > > -S > > From skaplons at redhat.com Tue Nov 12 13:53:11 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 12 Nov 2019 14:53:11 +0100 Subject: [ptg][neutron] Ussuri PTG summary Message-ID: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> Hi Neutron team, First if all thank to all of You for great and very productive week during the PTG in Shanghai. Below is summary of our discussions from whole 3 days. If I forgot about something, please respond to the email and update missing informations. But if You want to have follow up discussion about one of the topics from this summary, please start a new thread to keep this one only as high level summary of the PTG. On boarding =========== Slides from onboarding session can be found at [1] If You have any follow up questions to us about onboarding, or You need help with starting any work in Neutron team, please contact me or Miguel Lavalle by email or on IRC. My IRC nick is slaweq and Miguel's nick is mlavalle. We are available on #openstack-neutron channel @freenode. Train retrospective =================== Good things in Train cycle: * working with this team is still good experience * core team is stable, and we didn't lost any core reviewers during the cycle, * networking is still one of key reasons why people use OpenStack Not good things: * dimished vitality in stadium projects - we had also forum session and follow discussion about this later during the PTG, * gate instability - we have seen many issues which were out of our control, like infra problems, grenade jobs failures, other projects failures, but also many bugs on our side, * we have really a lot of jobs in our check/gate queue. If each of them is failing 5% of times, it's hard to merge any patch as almost every time, one of jobs will fail. Later during the PTG we also discussed that topic and we were looking for some jobs which we maybe can potentially drop from our queues. See below for summary about that, Action items/improvements: * many team meetings each week. We decided to limit number of meetings by: ** consolidate performance subteam meeting into weekly team meeting - this topic will be added to the team meeting's agenda for team meetings on Monday, ** consolidate ovn convergence meeting into weekly team meeting - this topic will be added to the team meeting's agenda for team meetings on Tuesday, ** we need to check if QoS subteam meeting is still needed, * Review process: list of actual review priorities would be useful for the team, we will add "Review-Priority" label to the Neutron reviews board and try to use it during the Ussuri cycle. Openvswitch agent enhancements ============================== We had bunch of topics related to potential improvements for neutron-openvswitch-agent proposed mostly by Liu Yulong. Slides with his proposals are available at [2]. * retire DHCP agent - resyncs of DHCP agent are problematic, especially when agent hosts many networks. Proposal was to add new L2 agent's extension which could be used instead of "regular" DHCP agent and to provide only basic DHCP functionalities. Such solutions would work in the way quite similar to how networking-ovn works today but we would need to implement and maintain own dhcp server application. Problems of this solution are: ** problems with compatibility e.g. with Ironic, ** how it would work with mixed deployments, e.g. 
with ovs and sriov agents, ** support for dhcp options, Advantages of this solution: ** fully distributed DHCP service, ** no DHCP agents, so less RPC messages on the bus and easier maintanance of the agents, Team's feedback for that is that this is potentially nice solution which may helps in some specific, large scale deploymnets. We can continue discussion about this during Ussuri cycle for sure. * add accepted egress fdb flows We agreed that this is a bug and we should continue work on this to propose some way to fix it. Solution proposed by LIU during this discussion wasn't good as it could potentially break some corner cases. * new API and agent for L2 traffic health check The team asked to add to the spec some more detailed and concrete use cases with explanation how this new API may help operator of the cloud to investigate where the problem actually is. * Local flows cache and batch updating The team agreed that as long as this will be optional solution which operator can opt-in we can give it a try. But spec and discuss details there will be necessary. * stop processing ports twice in ovs-agent We all agreed that this is a bug and should be fixed. But we have to be careful as fixing this bug may cause some other problems e.g. with live-migration - see nova-neutron cross project session. * ovs-agent: batch flow updates with --bundle We all agreed that this can be done as an improvement of existing code. Similar option is already used in openvswitch firewall driver. Neutron - Cyborg cross project session ====================================== Etherpad for the session is at [3]. Cyborg team wants to include Neutron in workflow of spawning VMs with Smart NICs or accelerator cards. From Neutron's side, required change is to allow including "accel" data in port binding profile. As long as this will be well documented what can be placed there, there should be no problem with doing that. Technically we can place almost anything there. Neutron - Kuryr cross project session ===================================== Etherpad for the session is at [4]. Kuryr team proposed 4 improvements for Neutron which would help a lot Kuryr. Ideas are: * Network cascade deletion, * Force subport deletion, * Tag resources at creation time, * Security group creation with rules & bulk security group rule creation All of those ideas makes sense for Neutron team. Tag resources at creation time is even accepted rfe already - see [5] but there was no volunteer to implement it. We will add it to list of our BPs tracked weekly on team meeting. Miguel Lavalle is going to take a look at it during this cycle. For other proposals we need to have RFEs reported first. Starting the process of removing ML2/Linuxbridge ================================================ Currently in Neutron tree we have 4 drivers: * Linuxbridge, * Openvswitch, * macvtap, * sriov. SR-IOV driver is out of discussion here as this driver is addressing slightly different use case than other out drivers. We started discussion about above topic because we don't want to end up with too many drivers in-tree and we also had some discussions (and we have spec for that already) about include networking-ovn as in-tree driver. So with networking-ovn in-tree we would have already 4 drivers which can be used on any hardware: linuxbridge, ovs, macvtap and ovn. 
Conclusions from the discussion are: * each driver requires proper testing in the gate, so we need to add many new jobs to our check/gate queue, * currently linuxbridge driver don't have a lot of development and feature parity gaps between linuxbridge and ovs drivers is getting bigger and bigger (e.g. dvr, trunk ports), * also macvtap driver don't have a lot of activity in last few cycles. Maybe this one could be also considered as candidate to deprecation, * we need to have process of deprecating some drivers and time horizon for such actions should be at least 2 cycles. * we will not remove any driver completly but rather we will move it to be in stadium process first so it still can be maintained by people who are interested in it. Actions to do after this discussion: * Miguel Lavalle will contact RAX and Godaddy (we know that those are Linuxbridge users currently) to ask about their feedback about this, * if there are any other companies using LB driver, Nate Johnston is willing to help conctating them, please reach to him in such case. * we may ratify marking linuxbridge as deprecated in the team meeting during Ussuri cycle if nothing surprising pops in. Encrypted(IPSec) tenant networks ================================ Interesting topic proposed but we need to have RFE and spec with more detailed informations about it to continue discussions. Medatada service over IPv6 ========================== This is continuation of old RFE [6]. The only real problem is to choose proper IPv6 address which will be well known address used e.g. by cloud-init. Original spec proposed fe80::a9fe:a9fe as IPv6 address to access metadata service. We decided to be bold and define the standard. Bence Romsics and Miguel Lavalle volunteered to reach out to cloud-init maintainers to discuss that. walkthrough of OVN ================== Since some time we have in review spec about ml2/ovs and ovn convergence. See [7] for details. List of parity gaps between those backends is available at [8]. During the discussion we talked about things like: * migration from ml2/ovs to ml2/ovn - some scripts are already done in [9], * migration from ml2/lb to ml2/ovn - there was no any work done in this topic so far but it should be doable also if someone would need it and want to invest own time for that, * include networking-ovn as in-tree neutron driver and reasons why it could be good idea. Main reasons of that are: ** that would help growing networking-ovn community, ** would help to maintain a healthy project team, ** the default drivers have always been in-tree, However such inclusion may also hurt modularity/logical separation/dependency management/packaging/etc so we need to consider it really carefully and consider all points of view and opinions. Next action item on this topic is to write more detailed summary of this topic and send it to ML and ask wider audience for feedback. IPv6 devstack tempest test configuration vs OVN =============================================== Generally team supports idea which was described during this session and we should change sligtly IPv6 config on e.g. devstack deployments. Neutron - Edge SIG session ========================== We discussed about RFE [10]. This will require also changes on placement side. See [11] for details. Also some cyborg and ovn related changes may be relevant to topics related to Edge. Currently specs which we have are only related to ML2/OVS solution. Neutron - Nova cross project session ==================================== Etherpad for this session is on [12]. 
Summary written already by gibi can be found at [13]. On [14] You can find image which shows in visual way problem with live-migration of instances with SR-IOV ports. Policy handling in Neutron ========================== The goal of the session was to plan on Neutron's side similar effort to what services like nova are doing now to use new roles like reader and scopes, like project, domain, system provided by Keystone. Miguel Lavalle volunteered to work on this for Neutron and to be part of popup team for cross project collaboration on this topic. Neutron performance improvements ================================ Miguel Lavalle shown us his new profiling decorator [15] and how we all can use it to profile some of API calls in Neutron. Reevaluate Stadium projects =========================== This was follow up discussion after forum session. Notes from forum session can be found at [16]. Nate also prepared some good data about stadium projects activity in last cycles. See at [17] and [18] for details. We all agreed that projects which are in (relatively) good condition now are: * networking-ovn, * networking-odl, * ovsdbapp Projects in bad condition are other projects, like: * neutron-interconnection, * networking-sfc, * networking-bagpipe/bgpvpn, * networking-midonet, * neutron-fwaas and neutron-fwaas-dashboard, * neutron-dynamic-routing, * neutron-vpnaas and neutron-vpnaas-dashboard, We decided to immediately remove neutron-interconnection project as it was never really implemented. For other of those projects, we will send emails to ML to ask for potential maintainers of those projects. If there will be no any volunteers to maintain some of those projects, we will deprecated them and move to "x/" namespace in 2 cycles. Floating IP's On Routed Networks ================================ There is still interest of doing this. Lajos Katona started adding some scenario tests for routed networks already as we need improved test coverage for this feature. Miguel Lavalle said that he will possibly try to work on implementing this in Ussuri cycle. L3 agent enhancement ==================== We talked about couple potential improvements of existing L3 agent, all proposed by LIU Yulong. * retire metering-agent It seems that there is some interest in metering agent recently so we shouldn't probably consider of retiring it for now. We also talked about adding new "tc based" driver to the metering agent and this discussion can be continue on rfe bug [19]. * Centralized DNAT (non-DVR) traffic (floating IP) Scale-out This is proposal of new DVR solution. Some details of this new solution are available at [20]. We agreed that this proposal is trying to solve some very specific use case, and it seems to be very complicated solution with many potential corner cases to address. As a community we don't want to introduce and maintain such complicated new L3 design. * Lazy-load agent side router resources when no related service port Team wants to see RFE with detailed description of the exact problem which this is trying to solve and than continue discussion on such RFE. Zuul jobs ========= In this session we talked about jobs which we can potentially promote to be voting (and we didn't found any of such) and about jobs which we maybe can potentially remove from our queues. 
Here is what we agreed: * we have 2 iptables_hybrid jobs - one on Fedora and one on Ubuntu - we will drop one of those jobs and left only one of them, * drop neutron-grenade job as it is running still on py27 - we have grenade-py3 which is the same job but run on py36 already, * as it is begin of the cycle, we will switch in devstack neutron uwsgi to be default choice and we will remove "-uwsgi" jobs from queue, * we should compare our single node and multinode variants of same jobs and maybe promote multinode jobs to be voting and then remove single node job - I volunteered to do that, * remove our existing experimental jobs as those jobs are mostly broken and nobody is run those jobs in experimental queue actually, * Yamamoto will check failing networking-midonet job and propose patch to make it passing again, * we will change neutron-tempest-plugin jobs for branch in EM phase to always use certain tempest-plugin and tempest tag, than we will remove those jobs from check and gate queue in master branch, Stateless security groups ========================= Old RFE [21] was approved for neutron-fwaas project but we all agreed that this should be now implemented for security groups in core Neutron. People from Nuage are interested in work on this in upstream. We should probably also explore how easy/hard it will be to implement it in networking-ovn backend. Old, stagnant specs =================== During this session we decided to abandon many of old specs which were proposed long time ago and there is currently no any activity and interest in continue working on them. If anyone would be interested in continue work on some of them, feel free to contact neutron core team on irc or through email and we can always reopen such patch. Community Goal things ===================== We discussed about currently proposed community goals and who can take care of which goal on Neutron's side. Currently there are proposals of community goals as below: * python3 readiness - Nate will take care of this, * move jobs definitions to zuul v3 - I will take care of it. In core neutron and neutron-tempest-plugin we are (mostly) done. On stadium projects' side this will require some work to do, * Project specific PTL and contributor guides - Miguel Lavalle will take care of this goal as former PTL, We will track progress of community goals weekly in our team meetings. Neutron-lib =========== As some time ago our main neutron-lib maintainer (Boden) leaved from the project, we need some new volunteers to continue work on it. Todo list is available on [22]. This should be mostly important for people who are maintaining stadium projects or some 3rd party drivers/plugins so if You are doing things like that, please check list from [22] and reach out to us on ML or #openstack-neutron IRC channel. 
[1] https://www.slideshare.net/SawomirKaposki/neutron-on-boarding-room [2] https://github.com/gotostack/shanghai_ptg/blob/master/shanghai_neutron_ptg_topics_liuyulong.pdf [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Cyborg-xproj [4] https://etherpad.openstack.org/p/kuryr-neutron-nice-to-have [5] https://bugs.launchpad.net/neutron/+bug/1815933 [6] https://bugs.launchpad.net/neutron/+bug/1460177 [7] https://review.opendev.org/#/c/658414/ [8] https://etherpad.openstack.org/p/ML2-OVS-OVN-Convergence [9] https://github.com/openstack/networking-ovn/tree/master/migration [10] https://bugs.launchpad.net/neutron/+bug/1832526 [11] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/009991.html [12] https://etherpad.openstack.org/p/ptg-ussuri-xproj-nova-neutron [13] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010654.html [14] https://imgur.com/a/12PrQ9W [15] https://review.opendev.org/678438 [16] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [17] https://ethercalc.openstack.org/neutron-stadium-train-metrics [18] https://ibb.co/SBzDGdD [19] https://bugs.launchpad.net/neutron/+bug/1817881 [20] https://imgur.com/a/6MeNUNb [21] https://bugs.launchpad.net/neutron/+bug/1753466 [22] https://etherpad.openstack.org/p/neutron-lib-volunteers-and-punch-list -- Slawek Kaplonski Senior software engineer Red Hat From lyarwood at redhat.com Tue Nov 12 13:57:06 2019 From: lyarwood at redhat.com (Lee Yarwood) Date: Tue, 12 Nov 2019 13:57:06 +0000 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet In-Reply-To: <89de51c7-b489-7df3-1bb5-7424ee8a9542@gmail.com> References: <89de51c7-b489-7df3-1bb5-7424ee8a9542@gmail.com> Message-ID: <20191112135706.zmywdbyucgmqdloo@lyarwood.usersys.redhat.com> On 11-11-19 13:54:02, Matt Riedemann wrote: > On 11/11/2019 10:06 AM, Peter Penchev wrote: > > - rescuing an instance (refuses outright) > > Yeah, not supported, but there have been specs [3][4]. > > [..] > > [3] https://review.opendev.org/#/c/651151/ > [4] https://review.opendev.org/#/c/532410/ I might actually have time for this during U. Third time lucky? https://review.opendev.org/693849 Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From sbauza at redhat.com Tue Nov 12 14:07:05 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 15:07:05 +0100 Subject: [ptg][nova][cinder] x-p meeting minutes Message-ID: Excerpts taken from https://etherpad.openstack.org/p/shanghai-ptg-cinder We discussed two items : 1/ Improving replication - When a failover happens volumes are no longer usable in Nova. - Nova should start re-attaching volumes after a failover? - Why can't we detach and attach the volume? Data that is in flight would be lost. - Question about boot from volume. In that case the instance is dead anyway because access to the volume has been lost. - Could go through the shutdown, detach, attach, reboot path. - Problem is that detach is going to fail. Need to force it or handle the failure. - We aren't sure that Nova will allow a detach of a boot volume. - Also don't currently have a force detach API. - AGREE: Need to figure out how to pass the force to os-brick to detach volume and when rebooting a volume. 
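  (Illustration only, not the agreed design: a minimal sketch of what
  "passing the force down to os-brick" could look like on the consumer side,
  assuming disconnect_volume() keeps its force/ignore_errors keyword
  arguments; the iSCSI connector is just an example protocol.)

    from os_brick.initiator import connector

    def force_disconnect(connection_properties, device_info, root_helper):
        # Build a connector for the transport in use; 'ISCSI' is only an
        # example here.
        conn = connector.InitiatorConnector.factory(
            'ISCSI', root_helper, use_multipath=False)
        # force=True tolerates flush failures against the dead primary
        # backend; ignore_errors=True lets the teardown finish so the
        # volume can be re-attached against the failover target.
        conn.disconnect_volume(connection_properties, device_info,
                               force=True, ignore_errors=True)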
2/ Nova bug for images created from encrypted volumes - Nova is not creating a new key for encrypted images but the deletion policy metadata can allow a key to be deleted wehn it is still in use by other images or even volumes - Nova needs to clone the keys when doing create image. - The nova team thinks that we have found a bug that needs to be fixed. We just need to open a bug. - ACTION Cinder to open a bug against Nova. (can you also ping lyarwood when it's open?) -S -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Tue Nov 12 14:12:29 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 12 Nov 2019 09:12:29 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > On 7/11/19 2:11 pm, Corey Bryant wrote: > > Hello TC members, > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > least enable non-voting py38 unit tests. This email is seeking approval > > and direction from the TC to move forward with enabling non-voting py38 > > tests. > > I was a bit fuzzy on this myself, so I looked it up and this is what the > TC decided when we passed the resolution: > > > If the new Zuul template contains test jobs that were not in the > previous one, the goal champion(s) may choose to update the previous > template to add a non-voting check job (or jobs) to match the gating jobs > in the new template. This means that all repositories that have not yet > converted to the template for the upcoming release will see a non-voting > preview of the new job(s) that will be added once they update. If this > option is chosen, the non-voting job should be limited to the master branch > so that it does not run on the preceding release’s stable branch. > > Thanks for digging that up and explaining. I recall that wording and it makes a lot more sense now that we have a scenario in front of us. > (from > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > ) > > So to follow that process we would need to define the python versions > for V, then appoint a goal champion, and after that it would be at the > champion's discretion to add a non-voting job on master in Ussuri. I > happened to be sitting next to Sean when I saw this thread, and after > discussing it with him I think he would OK with having a non-voting job > on every commit, since it's what we have documented. Previous > discussions established that the overhead of adding one Python unit test > job to every project was pretty inconsequential (we'll offset it by > dropping 2.7 jobs anyway). > > I submitted a draft governance patch defining the Python versions for V > (https://review.opendev.org/693743). Unfortunately we can't merge it yet > because we don't have a release name for V (Sean is working on that: > https://review.opendev.org/693266). It's gazing in the crystal ball a > Thanks very much for getting that going. 
little bit, but even if for some reason Ubuntu 20.04 is not released > before the V cycle starts, it's inevitable that we will be selecting > Python 3.8 because it meets the first criterion ("The latest released > version of Python 3 that is available in any distribution we can > feasibly use for testing") - 3.8 is released and it's available in > Ubuntu 18.04, which is the distro we use for testing anyway. > > So, in my opinion, if you're volunteering to be the goal champion then > there's no need for any further approval by the TC ;) > > Sure, I can champion that. Just to be clear, would that be Ussuri and V python3-updates champion, similar to the following? https://governance.openstack.org/tc/goals/selected/train/python3-updates.html Granted it's easier now that we mostly just have to switch the job template to the new release. > I guess to make that official we should commit the python3 update Goal > for the V cycle now... or at least as soon as we have a release name. > How far off do you think we are from having a V name? If just a few weeks then I'm fine waiting but if over a month I'm more concerned. This is happening a little earlier than I think we anticipated but, > given that there's no question what is going to happen in V, I don't > think we'd be doing anybody any favours by delaying the process > unnecessarily. I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be in the picture for Ussuri or V. > > For some further background: The next release of Ubuntu, Focal (20.04) > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > default in the Focal release, so I'm hopeful that non-voting unit tests > > will help close some of the gap. > > > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > No, I don't think this changes anything for Ussuri. It's preparation for V. > > Ok. Appreciate all the input and help. Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 14:13:51 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 15:13:51 +0100 Subject: [ptg][nova][keystone] x-p meeting minutes Message-ID: We only discussed about the large effort needed for the new policies. https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/policy-defaults-refresh.html https://review.opendev.org/#/q/topic:bp/policy-defaults-refresh+(status:open+OR+status:merged) We asked the Keystone team to review the first changes in the series as it's very helpful. We had concerns about completion over the Ussuri cycle since the series can be very large and we want to avoid deprecation messages for some APIs if not all the Nova APIs are touched yet. The agreement was to hold a procedural -2 on the API changes and start reviewing (for both Keystone and Nova folks) until all the changes are there. Maybe a runway could help once all the API changes are uploaded. -S -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 14:20:44 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 15:20:44 +0100 Subject: [ptg][nova][glance] x-p meeting minutes Message-ID: We only discussed about nova snapshots to dedicated glance stores. 
Reference is https://review.opendev.org/#/c/641210/ We had concerns on implementing glance storage location strategies in Nova. A counter-proposal was made to only pass the original image ID when calling Glance for a snapshot, so that Glance would then use any store the operator wants (e.g. a location strategy like "please store the snapshot to a place close to where the original image is"). The agreement was there for the counter-proposal, so the Nova change would only be to pass the original Glance image ID to the new glanceclient that would support it. Glance folks will provide a new spec revision. -S -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue Nov 12 14:24:20 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 12 Nov 2019 09:24:20 -0500 Subject: [ops] meetups team meeting and meetups venue for Jan 2020 Message-ID: There will be an OpenStack Ops Meetups team meeting in just under 45 minutes on #openstack-operators. The agenda is here: https://etherpad.openstack.org/p/ops-meetups-team We'd like to see if we can agree to formally accept the offer to host the next OpenStack Operators Meetup in London next January (see https://etherpad.openstack.org/p/ops-meetup-1st-2020) Please attend the meeting if you have feedback for or against this proposal. So far all feedback has been positive and there are no other offers. Note (full disclosure) that this proposal is from my employer. We'll also go over material from last week's summit in Shanghai. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Tue Nov 12 14:27:26 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 12 Nov 2019 23:27:26 +0900 Subject: [infra] Etherpad problem In-Reply-To: <20191112102729.fggiccohpwpqd3he@skaplons-mac> References: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> <20191112102729.fggiccohpwpqd3he@skaplons-mac> Message-ID: Hi, I updated the restored etherpad [1] based on the history at the last moment [2], especially the "Review old, stagnant specs" session. If someone has notes on spec reviews discussed after the etherpad became broken (around L.562-569), please add them to the restored etherpad. [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 Thanks, Akihiro On Tue, Nov 12, 2019 at 7:35 PM Slawek Kaplonski wrote: > > Hi, > > Thx Brian. I used Your backup etherpad to "restore" everything in [1] > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored > > On Sun, Nov 10, 2019 at 12:31:36AM +0800, Brian Haley wrote: > > On 11/9/19 10:16 AM, Slawek Kaplonski wrote: > > > Hi, > > > > > > Just at the end of the ptg sessions, neutron etherpad was got broken somehow.
> > > Now when I try to open [1] I see only something like: > > > > > > An error occurred > > > The error was reported with the following id: 'igzOahZ6ruH0eSUAWKaj' > > > > > > Please press and hold Ctrl and press F5 to reload this page, if the problem > > > persists please send this error message to your webmaster: > > > 'ErrorId: igzOahZ6ruH0eSUAWKaj > > > URL: https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > > UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 > > > Firefox/70.0 > > > TypeError: r.dropdowns is undefined in > > > https://etherpad.openstack.org/javascripts/lib/ep_etherpad-lite/static/js/pad.js?callback=require.define > > > at line 18' > > > > > > > > > We can open one of the previous versions which is available at [2] but I don't > > > know how we can fix original etherpad or restore version from [2] to be original > > > etherpad and make it working again. > > > Can someone from infra team check that for us maybe? > > > Thx in advance for any help. > > > > > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 > > > > Hi Slawek, > > > > When I just went to check this etherpad now, noticed I had a tab open that > > was in "Force reconnect" state. I made a copy of that, just might be a > > little out of date on the last items. The formatting is also a little odd, > > but at least it's better than nothing if we can't get the original back. > > > > https://etherpad.openstack.org/p/neutron-ptg-temp > > > > -Brian > > > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From lyarwood at redhat.com Tue Nov 12 15:09:08 2019 From: lyarwood at redhat.com (Lee Yarwood) Date: Tue, 12 Nov 2019 15:09:08 +0000 Subject: [ptg][nova][cinder] x-p meeting minutes In-Reply-To: References: Message-ID: <20191112150908.w2of3q5mpznm42dl@lyarwood.usersys.redhat.com> On 12-11-19 15:07:05, Sylvain Bauza wrote: > Excerpts taken from https://etherpad.openstack.org/p/shanghai-ptg-cinder > We discussed two items : > > 1/ Improving replication > > - When a failover happens volumes are no longer usable in Nova. > > > - Nova should start re-attaching volumes after a failover? > > > - Why can't we detach and attach the volume? Data that is in flight > would be lost. > > > - Question about boot from volume. In that case the instance is dead > anyway because access to the volume has been lost. > > > - Could go through the shutdown, detach, attach, reboot path. > > > - Problem is that detach is going to fail. Need to force it or handle > the failure. > > > - We aren't sure that Nova will allow a detach of a boot volume. > > > - Also don't currently have a force detach API. > > > - AGREE: Need to figure out how to pass the force to os-brick to detach > volume and when rebooting a volume. I started looking at this a while ago in the review below: libvirt: Wire up a force disconnect_volume flag https://review.opendev.org/#/c/584849/ I'll restore and see if it's still valid for the above. > 2/ Nova bug for images created from encrypted volumes > > - Nova is not creating a new key for encrypted images but the deletion > policy metadata can allow a key to be deleted wehn it is still in use by > other images or even volumes > > > - Nova needs to clone the keys when doing create image. > > > - The nova team thinks that we have found a bug that needs to be fixed. > We just need to open a bug. > > > - ACTION Cinder to open a bug against Nova. 
(can you also ping lyarwood > when it's open?) This has been created below: Possible data loss from createImage action https://bugs.launchpad.net/nova/+bug/1852106 As discussed in the bug, the actual use case of booting an instance from an encrypted image that itself was created from an encrypted volume has never worked. I'm also confused by the use of cinder_ specific image properties here, but this may be because this use case was never intended to be supported? Anyway, I've ended up looking at the ephemeral encryption support in Nova's Libvirt driver for most of the day and honestly it needs to be deprecated and removed as it's never going to be able to handle anything like this. The current implementation uses a single per-instance key for all disks, while use cases like this and the encrypted image spec below require per-disk keys: Spec for the Nova part of Image Encryption https://review.opendev.org/#/c/608696/ -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From amotoki at gmail.com Tue Nov 12 15:29:17 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 13 Nov 2019 00:29:17 +0900 Subject: [all][doc] Patches to add --keep-going to sphinx-build (and patches proposed to many many repositories) Message-ID: Hi, As you may notice, we see a lot of patches which try to add --keep-going to sphinx-build. [0] I have suggestions and questions. 1) First, when reviewing them, keep the following in mind. * --keep-going is added even when the -W option is not used in the sphinx-build command line. -W is recommended in the PTI [1], so ensure -W is present. * Some of them ignore cases where "python setup.py build_sphinx" is still used. It is a good chance to clean them up and use "sphinx-build" consistently. 2) Why do we need to remove the build directory for releasenotes? Some of them propose to add "rm -rf releasenotes/build" (for example [2]), and I cannot understand why this needs to be added. Do we really want to call "rm -rf "? I know it is needed in some repositories for various reasons, but generally speaking it makes the documentation build longer and is unnecessary. I tried to get the reason from the authors in reviews, but they just say it is simple and can be added at the same time. Thus, I would like to ask about it more broadly on the list. 3) What is the recommended way to get consensus on this kind of patch, which affects many, many repositories? It is not productive to ask questions in individual reviews. Some patches are approved quickly while questions pop up in other patches. It makes it difficult to discuss the real needs and keep many repositories consistent. I am not against all of these changes, but I would like to see more organized way.
Thanks, Akihiro [0] https://review.opendev.org/#/q/message:keeping [1] https://governance.openstack.org/tc/reference/project-testing-interface.html#documentation [2] https://review.opendev.org/#/c/690956/2/tox.ini From sbauza at redhat.com Tue Nov 12 15:38:38 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 16:38:38 +0100 Subject: [nova][ptg] Resurrecting NUMA topology in Placement In-Reply-To: References: Message-ID: On Tue, Nov 12, 2019 at 2:30 PM Sean Mooney wrote: > On Tue, 2019-11-12 at 13:26 +0000, Sean Mooney wrote: > > On Tue, 2019-11-12 at 11:29 +0100, Sylvain Bauza wrote: > > > We discussed about a long known story > > > https://review.openstack.org/#/c/552924/ > > > > > > The whole agreement during the PTG was to keep things simple with baby > > > steps : > > > - only supporting a few NUMA queries and defer others as unsupported > (still > > > supported by legacy NUMATopologyFilter) > > > - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB > > > resource classes and not handle PCI or GPU devices (ie. level-1 tree, > no > > > children under the NUMA RPs) > > > > > > Agreement was also there for saying that functional tests should be > enough > > > since PlacementFixture already works perfectly. > > > > we can now do numa testing in the gate so we can also add tempest > testing this cycle. > > artom has recently gotten whitebox to run (more work to do > https://review.opendev.org/#/c/691062/) > > and i do want to get my multi numa nfv testing job > https://review.opendev.org/#/c/679656/ at least > > in experimental and perodic pipelines. i would like it to be in check > eventually but baby steps. > > > > i dont think these should be a blocker for any of this work but i think > we shoudl take advantage > > of them to contiue to improve the numa testing in gate. > by gate i ment ci/ > Yup, I understood and we also discussed this possibility at the PTG. To be clear, that would be nice to get Tempest tests on a specific job that'd verify this, but this shouldn't be a blocker. > > > > > > > -S > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 15:44:10 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 16:44:10 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: On Mon, Nov 11, 2019 at 4:05 PM Dan Smith wrote: > > Sharding with and/or within cells will help to some degree (and we are > > actively looking into this as you probably know), but I think that > > should not stop us from checking if there are algorithmic improvements > > (e.g. when collecting the data), or if moving to a different locking > > granularity or even parallelising the update are feasible additional > > improvements. > > All of that code was designed around one node per compute host. In the > ironic case it was expanded (hacked) to support N where N is not > huge. Giving it a huge number, and using a driver where nodes go into > maintenance/cleaning for long periods of time is asking for trouble. > > Given there is only one case where N can legitimately be greater than > one, I'm really hesitant to back a proposal to redesign it for large > values of N. > > Perhaps we as a team just need to document what sane, tested, and > expected-to-work values for N are? 
> > What we discussed at the PTG was the fact that we only have one global semaphore for this module but we have N ResourceTracker python objects (where N is the number of Ironic nodes per compute service). As per CERN, it looks this semaphore blocks when updating periodically so we basically said it could only be a bugfix given we could create N semaphores instead. That said, as it could have some problems, we want to make sure we can test the change not only by the gate but also directly by CERN. Another discussion was about having more than one thread for the compute service (ie. N threads) but my opinion was that we should first look at the above before discussing about any other way. -S --Dan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue Nov 12 15:49:06 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 12 Nov 2019 09:49:06 -0600 Subject: [all][doc] Patches to add --keep-going to sphinx-build (and patches proposed to many many repositories) In-Reply-To: References: Message-ID: <744149ba-bb9f-dd95-e8a6-5330e81d444c@gmail.com> On 11/12/2019 9:29 AM, Akihiro Motoki wrote: > I am not against all of these changes, but I would like to see more > organized way. > > Thought? This looks like the basic "find a maybe not so controversial change and spam it all over every repo to pad contribution stats" pattern to me. Marginally better than fixing typos or URLs which is the usual thing being proposed in these types of changes. -- Thanks, Matt From mihalis68 at gmail.com Tue Nov 12 15:55:22 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 12 Nov 2019 10:55:22 -0500 Subject: [ops] next ops meetup : London 7,8 Jan 2020 - approved! Message-ID: https://twitter.com/osopsmeetup/status/1194281816468938752?s=20 -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From moreira.belmiro.email.lists at gmail.com Tue Nov 12 16:06:17 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 12 Nov 2019 17:06:17 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: Hi, using several cells for the Ironic deployment would be great however it doesn't work with the current architecture. The nova ironic driver gets all the nodes available in Ironic. This means that if we have several cells all of them will report the same nodes! The other possibility is to have a dedicated Ironic instance per cell, but in this case it will be very hard to manage a large deployment. What we are trying is to shard the ironic nodes between several nova-computes. nova/ironic deployment supports several nova-computes and it will be great if the RT nodes cycle is sharded between them. But anyway, this will also require speeding up the big lock. It would be great if a compute node can handle more than 500 nodes. Considering our use case: 15k/500 = 30 compute nodes. Belmiro CERN On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann wrote: > On 11/11/2019 7:03 AM, Chris Dent wrote: > > Or using > > separate processes? For the ironic and vsphere contexts, increased > > CPU usage by the nova-compute process does not impact on the > > workload resources, so parallization is likely a good option. 
> > I don't know how much it would help - someone would have to actually > test it out and get metrics - but one easy win might just be using a > thread or process executor pool here [1] so that N compute nodes could > be processed through the update_available_resource periodic task > concurrently, maybe $ncpu or some factor thereof. By default make it > serialized for backward compatibility and non-ironic deployments. Making > that too highly concurrent could have negative impacts on other things > running on that host, like the neutron agent, or potentially storming > conductor/rabbit with a ton of DB requests from that compute. > > That doesn't help with the scenario that the big > COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while > spawning, moving, or deleting an instance that also needs access to the > big lock to update the resource tracker, but baby steps if any steps in > this area of the code would be my recommendation. > > [1] > https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Tue Nov 12 16:12:49 2019 From: dms at danplanet.com (Dan Smith) Date: Tue, 12 Nov 2019 08:12:49 -0800 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: (Belmiro Moreira's message of "Tue, 12 Nov 2019 17:06:17 +0100") References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: > Hi, using several cells for the Ironic deployment would be great > however it doesn't work with the current architecture. The nova > ironic driver gets all the nodes available in Ironic. This means that > if we have several cells all of them will report the same nodes! The > other possibility is to have a dedicated Ironic instance per cell, but > in this case it will be very hard to manage a large deployment. That's a problem for more reasons than just your scale. However, doesn't this solve that problem? https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html --Dan From moreira.belmiro.email.lists at gmail.com Tue Nov 12 16:27:31 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 12 Nov 2019 17:27:31 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: Dan Smith just point me the conductor groups that were added in Stein. https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html This is an interesting way to partition the deployment much better than the multiple nova-computes setup. Thanks, Belmiro CERN On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi, > using several cells for the Ironic deployment would be great however it > doesn't work with the current architecture. > The nova ironic driver gets all the nodes available in Ironic. This means > that if we have several cells all of them will report the same nodes! > The other possibility is to have a dedicated Ironic instance per cell, but > in this case it will be very hard to manage a large deployment. > > What we are trying is to shard the ironic nodes between several > nova-computes. 
> nova/ironic deployment supports several nova-computes and it will be great > if the RT nodes cycle is sharded between them. > > But anyway, this will also require speeding up the big lock. > It would be great if a compute node can handle more than 500 nodes. > Considering our use case: 15k/500 = 30 compute nodes. > > Belmiro > CERN > > > > On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann > wrote: > >> On 11/11/2019 7:03 AM, Chris Dent wrote: >> > Or using >> > separate processes? For the ironic and vsphere contexts, increased >> > CPU usage by the nova-compute process does not impact on the >> > workload resources, so parallization is likely a good option. >> >> I don't know how much it would help - someone would have to actually >> test it out and get metrics - but one easy win might just be using a >> thread or process executor pool here [1] so that N compute nodes could >> be processed through the update_available_resource periodic task >> concurrently, maybe $ncpu or some factor thereof. By default make it >> serialized for backward compatibility and non-ironic deployments. Making >> that too highly concurrent could have negative impacts on other things >> running on that host, like the neutron agent, or potentially storming >> conductor/rabbit with a ton of DB requests from that compute. >> >> That doesn't help with the scenario that the big >> COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while >> spawning, moving, or deleting an instance that also needs access to the >> big lock to update the resource tracker, but baby steps if any steps in >> this area of the code would be my recommendation. >> >> [1] >> >> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 >> >> -- >> >> Thanks, >> >> Matt >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jimrollenhagen.com Tue Nov 12 16:44:47 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Tue, 12 Nov 2019 11:44:47 -0500 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: On Tue, Nov 12, 2019 at 11:38 AM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Dan Smith just point me the conductor groups that were added in Stein. > > https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html > This is an interesting way to partition the deployment much better than > the multiple nova-computes setup. > Just a note, they aren't mutually exclusive. You can run multiple nova-computes to manage a single conductor group, whether for HA or because you're using groups for some other construct (cells, racks, halls, network zones, etc) which you want to shard further. // jim > Thanks, > Belmiro > CERN > > On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira < > moreira.belmiro.email.lists at gmail.com> wrote: > >> Hi, >> using several cells for the Ironic deployment would be great however it >> doesn't work with the current architecture. >> The nova ironic driver gets all the nodes available in Ironic. This means >> that if we have several cells all of them will report the same nodes! >> The other possibility is to have a dedicated Ironic instance per cell, >> but in this case it will be very hard to manage a large deployment. >> >> What we are trying is to shard the ironic nodes between several >> nova-computes. 
>> nova/ironic deployment supports several nova-computes and it will be >> great if the RT nodes cycle is sharded between them. >> >> But anyway, this will also require speeding up the big lock. >> It would be great if a compute node can handle more than 500 nodes. >> Considering our use case: 15k/500 = 30 compute nodes. >> >> Belmiro >> CERN >> >> >> >> On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann >> wrote: >> >>> On 11/11/2019 7:03 AM, Chris Dent wrote: >>> > Or using >>> > separate processes? For the ironic and vsphere contexts, increased >>> > CPU usage by the nova-compute process does not impact on the >>> > workload resources, so parallization is likely a good option. >>> >>> I don't know how much it would help - someone would have to actually >>> test it out and get metrics - but one easy win might just be using a >>> thread or process executor pool here [1] so that N compute nodes could >>> be processed through the update_available_resource periodic task >>> concurrently, maybe $ncpu or some factor thereof. By default make it >>> serialized for backward compatibility and non-ironic deployments. Making >>> that too highly concurrent could have negative impacts on other things >>> running on that host, like the neutron agent, or potentially storming >>> conductor/rabbit with a ton of DB requests from that compute. >>> >>> That doesn't help with the scenario that the big >>> COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while >>> spawning, moving, or deleting an instance that also needs access to the >>> big lock to update the resource tracker, but baby steps if any steps in >>> this area of the code would be my recommendation. >>> >>> [1] >>> >>> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 >>> >>> -- >>> >>> Thanks, >>> >>> Matt >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From zbitter at redhat.com Tue Nov 12 16:47:06 2019 From: zbitter at redhat.com (Zane Bitter) Date: Tue, 12 Nov 2019 11:47:06 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: On 12/11/19 9:12 am, Corey Bryant wrote: > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter > wrote: > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > Hello TC members, > > > > Python 3.8 is available in Ubuntu Bionic now and while I > understand it's > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > least enable non-voting py38 unit tests. This email is seeking > approval > > and direction from the TC to move forward with enabling > non-voting py38 > > tests. > > I was a bit fuzzy on this myself, so I looked it up and this is what > the > TC decided when we passed the resolution: > > > If the new Zuul template contains test jobs that were not in the > previous one, the goal champion(s) may choose to update the previous > template to add a non-voting check job (or jobs) to match the gating > jobs in the new template. This means that all repositories that have > not yet converted to the template for the upcoming release will see > a non-voting preview of the new job(s) that will be added once they > update. If this option is chosen, the non-voting job should be > limited to the master branch so that it does not run on the > preceding release’s stable branch. > > > Thanks for digging that up and explaining. I recall that wording and it > makes a lot more sense now that we have a scenario in front of us. 
> > (from > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > ) > > So to follow that process we would need to define the python versions > for V, then appoint a goal champion, and after that it would be at the > champion's discretion to add a non-voting job on master in Ussuri. I > happened to be sitting next to Sean when I saw this thread, and after > discussing it with him I think he would OK with having a non-voting job > on every commit, since it's what we have documented. Previous > discussions established that the overhead of adding one Python unit > test > job to every project was pretty inconsequential (we'll offset it by > dropping 2.7 jobs anyway). > > I submitted a draft governance patch defining the Python versions for V > (https://review.opendev.org/693743). Unfortunately we can't merge it > yet > because we don't have a release name for V (Sean is working on that: > https://review.opendev.org/693266). It's gazing in the crystal ball a > > > Thanks very much for getting that going. > > little bit, but even if for some reason Ubuntu 20.04 is not released > before the V cycle starts, it's inevitable that we will be selecting > Python 3.8 because it meets the first criterion ("The latest released > version of Python 3 that is available in any distribution we can > feasibly use for testing") - 3.8 is released and it's available in > Ubuntu 18.04, which is the distro we use for testing anyway. > > So, in my opinion, if you're volunteering to be the goal champion then > there's no need for any further approval by the TC ;) > > > Sure, I can champion that. Thanks! > Just to be clear, would that be Ussuri and V > python3-updates champion, similar to the following? > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html Yes, for V it will be similar to that but s/train/v.../ only simpler because you already did the hard bits :) The goal champion for that is the one who gets to decide on adding the non-voting py38 job in Ussuri. For U the proposed goal is https://review.opendev.org/691178 - so it will both update the Zuul template from train->ussuri and drop the py27 job (the former is a prerequisite for the latter because of reasons - see https://review.opendev.org/688997). That one is a little more complicated because we also should drop Python 2 functional tests before we drop the py27 unit tests, and because things have to happen in a certain order (services before libraries). OTOH we're only dropping stuff in this release and not adding new voting jobs that could break. Currently gmann has listed himself as the champion for that, but I know he's looking for help (we can have multiple champions for a goal). Somebody somewhere already has an action item to ask you about it :) > Granted it's easier now that we mostly just have to switch the job > template to the new release. > > I guess to make that official we should commit the python3 update Goal > for the V cycle now... or at least as soon as we have a release name. > > > How far off do you think we are from having a V name? If just a few > weeks then I'm fine waiting but if over a month I'm more concerned. Sean's patch has the naming poll closing on 2019-12-16, and we have to wait for legal approval from the OSF after that. (Ideally we'd have started sooner, but we were entertaining proposals to change the process and there was kind of an assumption that we wouldn't be using the existing one again.) My take is that we shouldn't get too bureaucratic here. 
The criteria are well-defined so the outcome is not in doubt. There's no reason to delay until the patch is formally merged. We operate by lazy consensus, so if any TC members object they can reply to this thread. I'll flag it in IRC so people know about it. If there's no objections in the next week or say then the openstack-zuul-jobs team would be entitled to take that as approval. cheers, Zane. > This is happening a little earlier than I think we anticipated but, > given that there's no question what is going to happen in V, I don't > think we'd be doing anybody any favours by delaying the process > unnecessarily. > > > I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be > in the picture for Ussuri or V. > > > > For some further background: The next release of Ubuntu, Focal > (20.04) > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > default in the Focal release, so I'm hopeful that non-voting unit > tests > > will help close some of the gap. > > > > I have a review here for the zuul project template enablement for > ussuri: > > https://review.opendev.org/#/c/693401 > > > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > No, I don't think this changes anything for Ussuri. It's preparation > for V. > > > Ok. Appreciate all the input and help. > > Thanks, > Corey From dpeacock at redhat.com Tue Nov 12 17:21:03 2019 From: dpeacock at redhat.com (David Peacock) Date: Tue, 12 Nov 2019 12:21:03 -0500 Subject: [heat] Addressing the large patch backlog. Message-ID: Hi all, For those interested in the workings of the Heat project, I'd like to kick off a call to action. At the time of writing there are approximately 200 open patches against the core Heat project repo alone, not counting the other Heat repos. Recently I started going through and triaging the patches I'd consider "historical" with an arbitrary cut off for this definition of August 1st of this year. There are 148 patches which meet this definition, dating all the way back to 2015. I have gone through them all and placed them into a spreadsheet [0] which I'd invite you all to check. Provided is a link to the patch in question, initial upload date, last meaningful update date, primary author, and a high level summary of the patch. Additionally I've broken the patches down into three recommended states based on a high level first pass. *Abandon* 34 patches are candidates to be abandoned; they usually are of debatable utility, have significant outstanding concerns, or have no followup from the original developer in a very long time. In many cases, all of these conditions. *Without good reason or explanation from the original developer, these patches may ultimately be cleared out.* *Rebase + Merge* 38 patches are with a high level look in reasonably good shape, perform a stated goal, and may be trivial to core review and ultimately rebase and merge. *If you're the original developer or otherwise interested in these patches and wish to see them through the merge process, please rebase the patch.* *Research* 76 patches are sufficiently complex that they'll need a much closer look. Some of these patches are in a seemingly "finished" state, some are a way off. Some have unanswered concerns from core review and have been left dangling. 
*If you're the original developer or otherwise interested in working these patches through to completion, please do get involved.* When I started this little mission I wasn't quite sure what to expect. What I have found is that as much as there was anticipated cruft to clear out, there is a great deal of very good work lurking here, waiting to see the light of day, and it would be so good to see this work realised. :-) If you have anything to say, feel free to write back on list, and if you'd like to coordinate with me any efforts with these patches I can be found by email or on Freenode in the #heat channel; I'm dpeacock. Based on feedback of this idea, and indeed on each individual patch, I hope we can get this backlog under control, and harvest some of this excellent code! Thank you, David Peacock [0] https://ethercalc.openstack.org/b3qtqyhkg9g1 Please be mindful of accidental edits. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moreira.belmiro.email.lists at gmail.com Tue Nov 12 17:35:49 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 12 Nov 2019 18:35:49 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: Great! Thanks Jim. I will later report our experience with conductor groups. Belmiro CERN On Tue, Nov 12, 2019 at 5:58 PM Jim Rollenhagen wrote: > > > On Tue, Nov 12, 2019 at 11:38 AM Belmiro Moreira < > moreira.belmiro.email.lists at gmail.com> wrote: > >> Dan Smith just point me the conductor groups that were added in Stein. >> >> https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html >> This is an interesting way to partition the deployment much better than >> the multiple nova-computes setup. >> > > Just a note, they aren't mutually exclusive. You can run multiple > nova-computes to manage a single conductor group, whether for HA or because > you're using groups for some other construct (cells, racks, halls, network > zones, etc) which you want to shard further. > > // jim > > >> Thanks, >> Belmiro >> CERN >> >> On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira < >> moreira.belmiro.email.lists at gmail.com> wrote: >> >>> Hi, >>> using several cells for the Ironic deployment would be great however it >>> doesn't work with the current architecture. >>> The nova ironic driver gets all the nodes available in Ironic. This >>> means that if we have several cells all of them will report the same nodes! >>> The other possibility is to have a dedicated Ironic instance per cell, >>> but in this case it will be very hard to manage a large deployment. >>> >>> What we are trying is to shard the ironic nodes between several >>> nova-computes. >>> nova/ironic deployment supports several nova-computes and it will be >>> great if the RT nodes cycle is sharded between them. >>> >>> But anyway, this will also require speeding up the big lock. >>> It would be great if a compute node can handle more than 500 nodes. >>> Considering our use case: 15k/500 = 30 compute nodes. >>> >>> Belmiro >>> CERN >>> >>> >>> >>> On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann >>> wrote: >>> >>>> On 11/11/2019 7:03 AM, Chris Dent wrote: >>>> > Or using >>>> > separate processes? 
For the ironic and vsphere contexts, increased >>>> > CPU usage by the nova-compute process does not impact on the >>>> > workload resources, so parallization is likely a good option. >>>> >>>> I don't know how much it would help - someone would have to actually >>>> test it out and get metrics - but one easy win might just be using a >>>> thread or process executor pool here [1] so that N compute nodes could >>>> be processed through the update_available_resource periodic task >>>> concurrently, maybe $ncpu or some factor thereof. By default make it >>>> serialized for backward compatibility and non-ironic deployments. >>>> Making >>>> that too highly concurrent could have negative impacts on other things >>>> running on that host, like the neutron agent, or potentially storming >>>> conductor/rabbit with a ton of DB requests from that compute. >>>> >>>> That doesn't help with the scenario that the big >>>> COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while >>>> spawning, moving, or deleting an instance that also needs access to the >>>> big lock to update the resource tracker, but baby steps if any steps in >>>> this area of the code would be my recommendation. >>>> >>>> [1] >>>> >>>> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 >>>> >>>> -- >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Tue Nov 12 17:46:30 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 12 Nov 2019 12:46:30 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: On Tue, Nov 12, 2019 at 11:47 AM Zane Bitter wrote: > On 12/11/19 9:12 am, Corey Bryant wrote: > > > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter > > wrote: > > > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > > Hello TC members, > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I > > understand it's > > > too late to enable voting py38 unit tests for ussuri, I'd like to > at > > > least enable non-voting py38 unit tests. This email is seeking > > approval > > > and direction from the TC to move forward with enabling > > non-voting py38 > > > tests. > > > > I was a bit fuzzy on this myself, so I looked it up and this is what > > the > > TC decided when we passed the resolution: > > > > > If the new Zuul template contains test jobs that were not in the > > previous one, the goal champion(s) may choose to update the previous > > template to add a non-voting check job (or jobs) to match the gating > > jobs in the new template. This means that all repositories that have > > not yet converted to the template for the upcoming release will see > > a non-voting preview of the new job(s) that will be added once they > > update. If this option is chosen, the non-voting job should be > > limited to the master branch so that it does not run on the > > preceding release’s stable branch. > > > > > > Thanks for digging that up and explaining. I recall that wording and it > > makes a lot more sense now that we have a scenario in front of us. > > > > (from > > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > > > ) > > > > So to follow that process we would need to define the python versions > > for V, then appoint a goal champion, and after that it would be at > the > > champion's discretion to add a non-voting job on master in Ussuri. 
I > > happened to be sitting next to Sean when I saw this thread, and after > > discussing it with him I think he would OK with having a non-voting > job > > on every commit, since it's what we have documented. Previous > > discussions established that the overhead of adding one Python unit > > test > > job to every project was pretty inconsequential (we'll offset it by > > dropping 2.7 jobs anyway). > > > > I submitted a draft governance patch defining the Python versions > for V > > (https://review.opendev.org/693743). Unfortunately we can't merge it > > yet > > because we don't have a release name for V (Sean is working on that: > > https://review.opendev.org/693266). It's gazing in the crystal ball > a > > > > > > Thanks very much for getting that going. > > > > little bit, but even if for some reason Ubuntu 20.04 is not released > > before the V cycle starts, it's inevitable that we will be selecting > > Python 3.8 because it meets the first criterion ("The latest released > > version of Python 3 that is available in any distribution we can > > feasibly use for testing") - 3.8 is released and it's available in > > Ubuntu 18.04, which is the distro we use for testing anyway. > > > > So, in my opinion, if you're volunteering to be the goal champion > then > > there's no need for any further approval by the TC ;) > > > > > > Sure, I can champion that. > > Thanks! > > > Just to be clear, would that be Ussuri and V > > python3-updates champion, similar to the following? > > > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > > Yes, for V it will be similar to that but s/train/v.../ only simpler > because you already did the hard bits :) The goal champion for that is > the one who gets to decide on adding the non-voting py38 job in Ussuri. > > Alright I'll definitely sign up to be champion for V. For U the proposed goal is https://review.opendev.org/691178 - so it > will both update the Zuul template from train->ussuri and drop the py27 > job (the former is a prerequisite for the latter because of reasons - > see https://review.opendev.org/688997). That one is a little more > complicated because we also should drop Python 2 functional tests before > we drop the py27 unit tests, and because things have to happen in a > certain order (services before libraries). OTOH we're only dropping > stuff in this release and not adding new voting jobs that could break. > Currently gmann has listed himself as the champion for that, but I know > he's looking for help (we can have multiple champions for a goal). > Somebody somewhere already has an action item to ask you about it :) > > For Ussuri, I'll get in touch with gmann and see where we can help. > > Granted it's easier now that we mostly just have to switch the job > > template to the new release. > > > > I guess to make that official we should commit the python3 update > Goal > > for the V cycle now... or at least as soon as we have a release name. > > > > > > How far off do you think we are from having a V name? If just a few > > weeks then I'm fine waiting but if over a month I'm more concerned. > > Sean's patch has the naming poll closing on 2019-12-16, and we have to > wait for legal approval from the OSF after that. (Ideally we'd have > started sooner, but we were entertaining proposals to change the process > and there was kind of an assumption that we wouldn't be using the > existing one again.) > > My take is that we shouldn't get too bureaucratic here. 
The criteria are > well-defined so the outcome is not in doubt. There's no reason to delay > until the patch is formally merged. We operate by lazy consensus, so if > any TC members object they can reply to this thread. I'll flag it in IRC > so people know about it. If there's no objections in the next week or > say then the openstack-zuul-jobs team would be entitled to take that as > approval. > That works for me. I'll check back in a week if nothing else comes up. Thanks, Corey > cheers, > Zane. > > > This is happening a little earlier than I think we anticipated but, > > given that there's no question what is going to happen in V, I don't > > think we'd be doing anybody any favours by delaying the process > > unnecessarily. > > > > > > I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be > > in the picture for Ussuri or V. > > > > > > > For some further background: The next release of Ubuntu, Focal > > (20.04) > > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > > default in the Focal release, so I'm hopeful that non-voting unit > > tests > > > will help close some of the gap. > > > > > > I have a review here for the zuul project template enablement for > > ussuri: > > > https://review.opendev.org/#/c/693401 > > > > > > Also should this be updated considering py38 would be non-voting? > > > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > > > No, I don't think this changes anything for Ussuri. It's preparation > > for V. > > > > > > Ok. Appreciate all the input and help. > > > > Thanks, > > Corey > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shambhu.mcp at hotmail.com Mon Nov 11 05:37:27 2019 From: shambhu.mcp at hotmail.com (SHAMBHU KUMAR) Date: Mon, 11 Nov 2019 05:37:27 +0000 Subject: deployment failed Message-ID: Dear Sir/mam i'm facing the issue while deployment of triple o no valid host found error code 500 Can you please eloborate in this matter because i'm stuck heere Your support will be highly appriciated.. From zhaowx at jxresearch.com Tue Nov 12 04:00:06 2019 From: zhaowx at jxresearch.com (zhaowx at jxresearch.com) Date: Tue, 12 Nov 2019 12:00:06 +0800 Subject: How to set password Message-ID: <201911121200054491741@jxresearch.com>+E418130A170D0D4F hello: When I use `trovestack build-image`, how to set password for the image? thanks zhaowx at jxresearch.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Tue Nov 12 17:13:39 2019 From: whayutin at redhat.com (Wesley Hayutin) Date: Tue, 12 Nov 2019 10:13:39 -0700 Subject: [tripleo][ptg] Message-ID: Greetings, Thanks for all those Red Hatters who attended the OpenStack Summit and PTG in Shanghai! A special thanks to those who presented their topics and discussed work items with the folks in attendance. As the current PTL for TripleO I will do my best here to summarize those conversations and items others should be made aware of. Over the course of Thursday and Friday roughly 7-10 folks discussed the identified topics [1], at the following times with my raw notes attached [2]. My apologies if I did not accurately represent your topic here, please feel free to correct me. Thursday, Giulio Fidente: Edge ( SDS storage ) Guilio walked us through some background work with regards to support for storage in remote sites / edge deployments. Working through support for Cinder was straight forward enough with no real collaboration required. 
Support for ceph copy on write for nova guests was also added with the glance image added to remote sites. Where Guilio needed input was with regards to having change the ctrl plane config for glance for each remote site [3]. This ctrl plane update would force operators to put the cloud in maintenance mode for a stack update. It was determined this could not be avoided at this time. It was noted that the TripleO simplification project and rework puppet-apply, please help us achieve that by reviewing the following two topics [4][5]. Thanks Giulio! Thursday, Guilio Fidente: Virtual IP / Storage Guilio walked us through some challenges with hosting a shared file system on remote/edge sites using manilla. The idea was to use Ganesha translation with CephFS. The proposal was that Ganesha and pacemaker would be managed in the ctrl plane but there was a question with regards to the virtual ip on edge sites. This was an interesting conversation that ended up with a suggestion from Kevin Carter to use a host-only local route on the edge to properly route the ip. This seemed to everyone to be a very clever solution to the problem :) Thanks Guilio, Thanks Kevin! Thursday, Martin Schuppert: Nova CellV2 multicell Martin walked the group through the current status and plans for the multicell implementation. Background: Nova multicells are used to help scale a cloud and partition it in such a way to get the messaging queue closer to the compute cell, essentially rabbit, galera, collector, vnc proxy and a number of compute nodes. This architecture is already in use but with only one default cell, pike was the switch to cellv2. The work started in Stein and continued through train using a similar approach as DCN. Some of the specs are that there is one cell per stack that is initially created from an export of the central stack, more ansible is place for the deployment as well. Two different architectures were noted, all cells in one heat stack [6], and one that splits the cell controllers and computes into different heat stacks w/ multiple stacks on the edge sites [7]. The development work for updates is complete and upgrades is still a WIP. Plans for the future included integrating TLS everywhere and enabling storage in the cell ( cinder, ceph, glance). Tony Breeds pointed out this architecture should just work in multiarch but would like the teams help in designing / advice while creating a test environment. Please review the following patches [13] Thanks Martin!! We tried to get more folks to switch to their topics to Thursday but were not able to. On to Friday. Friday, Edge ( DCN ) roadmap: David Paterson This conversation was informally walked through Thursday mainly with Arkady and Guilio and was followed up on Friday with a joint edge session regarding booting edge nodes. Several questions were raised on Thursday regarding the networking and connectivity for edge sites as it relates to provisioning. Validations were discussed as a way to address the minimum requirements for booting edge nodes. David did not end up presenting here, but was available at the joint session. See the “edge booting” section later in the document for details. Friday, Backup and Restore: Carlos Camacho The project started in Newton. Initially the backup consisted of a database dump and files being backed up for a defined set of use cases. In the field it was discovered that customers had many different kinds of deployments and the feature did not work well for all the intended use cases. 
An improved plan included to move to full disk image backups utilizing REAR [8]. Carlos also noted that customers are now trying to use ( or misuse ) this feature to perform baremetal to virt migrations. One of the current issues with the current solution include that it’s not clear how services behave after backup and restore.. E.G. OSD mons. Wes Hayutin noted that we have an opportunity to test the full image backup and restore solution by moving to a more image based internal CI system currently being designed by Jesse Pretorious and others. Thanks Carlos!! Friday, Failure Domains: Kevin Carter Unfortunately Kevin was in high demand across PTG events and was unable to present this topic. This should be discussed in a mid-cycle ( virtual or in person ) and written up as a blueprint. Essentially Kevin is proposing in large deployments to allow some number or percentage of nodes to fail a deployment while not failing the entire deployment. If a few non-critical nodes fail a large scale deployment TripleO should be better able to handle that, report back and move on. It was pointed out to me there is a related customer bug as well. Thanks Kevin!! Friday, Cross project: Edge Booting: Julia Kreger You can find notes on this session here [9]. I will only summarize the questions proposed on earlier edge ( DCN ) topic. With regards to when does TripleO need to support redfish there was no immediate or extremely urgent requests ( please correct me if I do not have the correct information there). Redfish IMHO did seem to be a nice improvement when compared to IPMI. This was my first introduction to Redfish, and I of course curious what steps we had to take in order to CI it. Luckily after doc diving I found several helpful links that include steps with setting Redfish up with our own OVB tooling ( hooray \0/ ). Links can be found here [10], and it seems like others have done some hard work to make that possible so thank you!! Thank you Julia!! Friday, Further TripleO Ansible Integration: Wes Hayutin The idea here would be allow the TripleO project to govern how TripleO is deployed with ansible as an operator. The TripleO project would ship ansible modules and roles that directly import python-tripleoclient to support ansible to cli parity [12]. Using or modeling a new repo, perhaps called tripleo-operator-ansible would be used to host these modules and roles and include the same requirements and features of tripleo-ansible’s linting, molecule tests, and auto documentation. This could tie in well with an initiative from Dan Macpherson to ship Ansible playbooks as part of our OSP documentation. Julia Kreger noted that we should not ignore the Ansible OpenStackSDK for part of the deployment process which is a very valid point. Most everyone at the PTG agreed this was a good direction moving forward and would help consolidate the public and internal tooling around TripleO’s CLI in ansible. Thanks Dan, Julia!! Friday, TLS-E standalone gate: Ade Lee Ade Lee walked us through a proposal to help test and CI TLS upstream which has been very difficult to date ( I can personally vouch for this ). Using a two node setup upstream with one node as the IPA server and the other a TripleO standalone deployment. 
The keystone team is setting the right example for other projects and teams that find it difficult to keep outside patches from breaking their code: find a way to get something voting and gating upstream, even if it's not installed and deployed in the same exact ways customers may use it. Please help by reviewing the keystone / security team's patches here [14]. Thanks Ade!!

Friday, Octavia tempest plugin support in TripleO-CI: Chandan Kumar

Chandan was off fighting battles with the infra team and other projects. Here are some of his notes:
* Have an RDO third-party standalone job with the full octavia tempest run triggered against octavia patches from Stein onwards (FS062).
* Look into a multinode job running as third party for the Queens and Rocky releases (FS038).
* Add support for the octavia tempest plugin in os_tempest.
We certainly should have a conversation offline regarding these topics. I'll note the TripleO-CI community meeting or the #tripleo meeting on Tuesdays are a good way to continue collaborating here. Thanks Chandan!!

Friday, Replace clouds.yaml with an openrc ansible module: Chandan Kumar

Open question: is this module [15] from the openstack-ansible project something we can reuse in TripleO via tripleo-ansible?

Friday, Zuul jobs and ansible roles to handle rpm packaging: Chandan Kumar

The background and context can be found here:
https://pagure.io/zuul-distro-jobs - a collection of ansible roles to deal with rpm packaging
https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/build-test-packages -> creates rpms from the different projects listed as depends-on in the commit message

Proposal: move this to zuul-jobs, making the rpm packaging more generic for centos/fedora/rhel:
* Move mock and rpmbuild related roles to the zuul-jobs repo
* Add a mention of third party zuul jobs to the main zuul-jobs doc
* Build-set_registry: set up an http server and start the job
* Details are here
Thanks Chandan!! This is indeed a very interesting and powerful proposal. We should definitely continue this conversation with the broader community.

Did you make it all the way down here? Well done!
I should add an easter egg :)

Links:
[1] https://etherpad.openstack.org/p/tripleo-ussuri-topics
[2] https://etherpad.openstack.org/p/tripleo-ptg-ussuri
[3] https://blueprints.launchpad.net/tripleo/+spec/split-controlplane-glance-cache
[4] https://review.opendev.org/#/q/topic:disable/paunch+(status:open+OR+status:merged)
[5] https://review.opendev.org/#/q/topic:deconstruct/container-puppet+(status:merged)
[6] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_basic.html
[7] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_advanced.html
    https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_routed.html
[8] https://access.redhat.com/solutions/2115051
[9] https://etherpad.openstack.org/p/PVG-ECG-PTG
[10] https://github.com/openstack/sushy
     https://docs.openstack.org/sushy/latest/contributor/index.html#contributing
     https://docs.openstack.org/sushy-tools/latest/
     https://docs.openstack.org/sushy-tools/latest/user/dynamic-emulator.html#systems-resource-driver-openstack
[12] https://hackmd.io/caRlGha7SueZxDRcyq9eGA?both
[13] https://review.opendev.org/#/q/topic:cellv2+(status:open+OR+status:merged)
[14] https://review.opendev.org/#/q/status:open+project:openstack/tripleo-heat-templates+branch:master+topic:add_standalone_tls
[15] https://opendev.org/openstack/openstack-ansible-openstack_openrc
[16] https://etherpad.openstack.org/p/PVG-keystone-forum-policy
[17] https://datko.pl/zuul.pdf
[18] https://github.com/openstack/tripleo-heat-templates/blob/master/README.rst#service-testing-matrix
[19] https://github.com/openstack/openstack-virtual-baremetal

Thanks all!!

Wes Hayutin
TripleO-PTL
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Albert.Braden at synopsys.com  Tue Nov 12 19:42:31 2019
From: Albert.Braden at synopsys.com (Albert Braden)
Date: Tue, 12 Nov 2019 19:42:31 +0000
Subject: Scheduler sends VM to HV that lacks resources
Message-ID: 

If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested 16 VCPU."

https://paste.fedoraproject.org/paste/6N3wcDzlbNQgj6hRApHiDQ

I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, and then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable to establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free vcpu 14.00 VCPU < requested 16 VCPU."

https://paste.fedoraproject.org/paste/lGlVpfbB9C19mMzrWQcHCQ

I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard:

enabled_filters = RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter

What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots of compute hosts that are not full:

https://paste.fedoraproject.org/paste/6SX9pQ4V1KnWfQkVnfoHOw

This is the command line I used:

openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From stig.openstack at telfer.org Tue Nov 12 20:05:25 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Tue, 12 Nov 2019 20:05:25 +0000 Subject: [scientific-sig] IRC meeting today - Shanghai roundup and Supercomputing 2019 Message-ID: <94DEBF9B-F1D4-40CC-91FE-8A6207CAC142@telfer.org> Greetings all - We have a Scientific SIG meeting in about an hour’s time (2100 UTC) in channel #openstack-meeting. Everyone is welcome. Agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_November_12th_2019 We are going to cover a trip report from Shanghai and planning for the many activities coming up next week in Supercomputing. Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Nov 12 20:17:33 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 12 Nov 2019 15:17:33 -0500 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint Message-ID: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> I'd like to propose adding Abhishek Kekane to the Glance stable maintenance team. He's been a glance core for a few years now, and we are currently understaffed in glance-stable-maint. Plus, he's the current Glance PTL. cheers, brian From smooney at redhat.com Tue Nov 12 20:21:45 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 20:21:45 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested 16 > VCPU." > > https://paste.fedoraproject.org/paste/6N3wcDzlbNQgj6hRApHiDQ > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, and > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable to > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > vcpu 14.00 VCPU < requested 16 VCPU." > > https://paste.fedoraproject.org/paste/lGlVpfbB9C19mMzrWQcHCQ > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > enabled_filters = > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter, > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots of > compute hosts that are not full: > > https://paste.fedoraproject.org/paste/6SX9pQ4V1KnWfQkVnfoHOw > > This is the command line I used: > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB what version of openstack are you running? if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. 
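As a side note for anyone chasing this kind of issue: when placement is in use, the osc-placement CLI plugin (assuming it is installed; the UUIDs below are placeholders) can show what placement thinks a given compute node provides and consumes, which helps tell a stale inventory apart from a genuine scheduling race, e.g.:

  openstack resource provider list --name <compute-hostname>
  openstack resource provider inventory list <provider-uuid>
  openstack resource provider usage show <provider-uuid>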
if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your configuration or in nova. From rosmaita.fossdev at gmail.com Tue Nov 12 20:25:10 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 12 Nov 2019 15:25:10 -0500 Subject: [stable][glance] Proposal to remove Flavio Percoco from glance-stable-maint Message-ID: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com> I just noticed that Flavio is still a member of glance-stable-maint. Nothing against him personally -- he's an excellent dude -- but he hasn't been working on Glance (or OpenStack) for quite a while now and is no longer a member of glance-core, so he probably shouldn't be on the stable-maint team. (Not that he'd do anything bad, it just makes the glance-stable-maint team look larger than it actually is.) On the off chance he'll see this message, I'd like to thank Flavio for all his hard work in the past keeping Glance stable! cheers, brian From Albert.Braden at synopsys.com Tue Nov 12 20:30:00 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 12 Nov 2019 20:30:00 +0000 Subject: Filter costs / filter order Message-ID: I'm running Rocky and trying to figure out filter order. I'm reading this doc: https://docs.openstack.org/nova/rocky/user/filter-scheduler.html It says: Each filter selects hosts in a different way and has different costs. The order of filter_scheduler.enabled_filters affects scheduling performance. The general suggestion is to filter out invalid hosts as soon as possible to avoid unnecessary costs. We can sort filter_scheduler.enabled_filters items by their costs in reverse order. For example, ComputeFilter is better before any resource calculating filters like RamFilter, CoreFilter. Is there a document that specifies filter costs, or ranks filters by cost? Is there a well-known process for determining the optimal filter order? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Nov 12 20:35:18 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 12 Nov 2019 15:35:18 -0500 Subject: [stable][cinder] Proposal to remove John Griffith from cinder-stable-maint Message-ID: <17610e99-5629-31c8-84ac-430ad06b2b62@gmail.com> John Griffith has taken on other commitments and stepped down as a cinder-core recently, so it doesn't make sense for him to continue on the cinder-stable-maint list. I'd like to acknowledge his role as "The Father of Cinder", though, and express my thanks on behalf of the Cinder team for all his past work on the project. cheers, brian From Tim.Bell at cern.ch Tue Nov 12 20:38:57 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Tue, 12 Nov 2019 20:38:57 +0000 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> Message-ID: <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. Tim > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > Hi Neutron team, > > First if all thank to all of You for great and very productive week during the > PTG in Shanghai. > Below is summary of our discussions from whole 3 days. 
> If I forgot about something, please respond to the email and update missing > informations. But if You want to have follow up discussion about one of the > topics from this summary, please start a new thread to keep this one only as > high level summary of the PTG. > > ... > Starting the process of removing ML2/Linuxbridge > ================================================ > > Currently in Neutron tree we have 4 drivers: > * Linuxbridge, > * Openvswitch, > * macvtap, > * sriov. > SR-IOV driver is out of discussion here as this driver is > addressing slightly different use case than other out drivers. > > We started discussion about above topic because we don't want to end up with too > many drivers in-tree and we also had some discussions (and we have spec for that > already) about include networking-ovn as in-tree driver. > So with networking-ovn in-tree we would have already 4 drivers which can be used > on any hardware: linuxbridge, ovs, macvtap and ovn. > Conclusions from the discussion are: > * each driver requires proper testing in the gate, so we need to add many new > jobs to our check/gate queue, > * currently linuxbridge driver don't have a lot of development and feature > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > (e.g. dvr, trunk ports), > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > this one could be also considered as candidate to deprecation, > * we need to have process of deprecating some drivers and time horizon for such > actions should be at least 2 cycles. > * we will not remove any driver completly but rather we will move it to be in > stadium process first so it still can be maintained by people who are > interested in it. > > Actions to do after this discussion: > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > Linuxbridge users currently) to ask about their feedback about this, > * if there are any other companies using LB driver, Nate Johnston is willing to > help conctating them, please reach to him in such case. > * we may ratify marking linuxbridge as deprecated in the team meeting during > Ussuri cycle if nothing surprising pops in. > From Albert.Braden at synopsys.com Tue Nov 12 20:47:38 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 12 Nov 2019 20:47:38 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: We are running placement under apache: https://paste.fedoraproject.org/paste/mZviLVe5xONPsXfLqdxI6A The placement error logs show a lot of GETs but no errors: https://paste.fedoraproject.org/paste/xDVGaXEdoQ5Z3wHv17Lezg We are planning to use NUMA but haven't started yet. It's probably a config error. Where should I be looking? This is our nova config on the controllers: https://paste.fedoraproject.org/paste/kNe1eRimk4ifrAuuN790bg -----Original Message----- From: Sean Mooney Sent: Tuesday, November 12, 2019 12:22 PM To: Albert Braden ; openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested 16 > VCPU." 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6N3wcDzlbNQgj6hRApHiDQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=buklMe5R5iK--nSTPE8_2kdSLjTRHLCbk0XatjhiCnY&e= > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, and > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable to > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > vcpu 14.00 VCPU < requested 16 VCPU." > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_lGlVpfbB9C19mMzrWQcHCQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=PxLwkpEiTHvHxuPTPo0Pt5IHhe79vfnQqLgLLb7JQ8Y&e= > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > enabled_filters = > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter, > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots of > compute hosts that are not full: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6SX9pQ4V1KnWfQkVnfoHOw&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=Yl9s2ZJ47GPXSyPzh6Hf0gyoxbqKGD9J9I2eSE0V8TA&e= > > This is the command line I used: > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB what version of openstack are you running? if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your configuration or in nova. From aschultz at redhat.com Tue Nov 12 20:53:18 2019 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 12 Nov 2019 13:53:18 -0700 Subject: [tripleo] Re: deployment failed In-Reply-To: References: Message-ID: On Tue, Nov 12, 2019 at 11:43 AM SHAMBHU KUMAR wrote: > Dear Sir/mam > > i'm facing the issue while deployment of triple o > > > no valid host found error code 500 > > > Can you please eloborate in this matter because i'm stuck heere > > > Your support will be highly appriciated.. > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/troubleshooting/index.html This error traditionally means that the hardware was unable to be provisioned. Check the nova/ironic logs for additional information. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mriedemos at gmail.com Tue Nov 12 21:13:56 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 12 Nov 2019 15:13:56 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> On 11/12/2019 2:47 PM, Albert Braden wrote: > It's probably a config error. Where should I be looking? This is our nova config on the controllers: > > https://paste.fedoraproject.org/paste/kNe1eRimk4ifrAuuN790bg If your deployment is pike or newer (I'm guessing rocky because your other email says rocky), then you don't need these filters: RetryFilter - alternate hosts bp in queens release makes this moot CoreFilter - placement filters on VCPU RamFilter - placement filters on MEMORY_MB -- Thanks, Matt From mriedemos at gmail.com Tue Nov 12 21:15:13 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 12 Nov 2019 15:15:13 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> Message-ID: On 11/12/2019 3:13 PM, Matt Riedemann wrote: > If your deployment is pike or newer (I'm guessing rocky because your > other email says rocky), then you don't need these filters: > > RetryFilter - alternate hosts bp in queens release makes this moot > CoreFilter - placement filters on VCPU > RamFilter - placement filters on MEMORY_MB Sorry, I should have said: If your deployment is pike or newer then you don't need the CoreFilter or RamFilter. If your deployment is queens or newer then you don't need the RetryFilter. -- Thanks, Matt From smooney at redhat.com Tue Nov 12 21:22:49 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 21:22:49 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: am what version of openstack have you deployed. i did not see that in your email. is it ocata or newer http://lists.openstack.org/pipermail/openstack-dev/2018-January/126283.html i see you have the CoreFilter and RamFilter filters enabled. form octa on they shoudl be disabled as we claim those in placement but it should not break anything on older releases. we have removed them in train after we removed the caching scheduler. On Tue, 2019-11-12 at 20:47 +0000, Albert Braden wrote: > We are running placement under apache: > > https://paste.fedoraproject.org/paste/mZviLVe5xONPsXfLqdxI6A > > The placement error logs show a lot of GETs but no errors: > > https://paste.fedoraproject.org/paste/xDVGaXEdoQ5Z3wHv17Lezg > > We are planning to use NUMA but haven't started yet. It's probably a config error. Where should I be looking? This is > our nova config on the controllers: > > https://paste.fedoraproject.org/paste/kNe1eRimk4ifrAuuN790bg > > -----Original Message----- > From: Sean Mooney > Sent: Tuesday, November 12, 2019 12:22 PM > To: Albert Braden ; openstack-discuss at lists.openstack.org > Subject: Re: Scheduler sends VM to HV that lacks resources > > On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested > > 16 > > VCPU." 
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6N3wcDzlbNQgj6hRApHiDQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=buklMe5R5iK--nSTPE8_2kdSLjTRHLCbk0XatjhiCnY&e= > > > > > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, > > and > > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable > > to > > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > > vcpu 14.00 VCPU < requested 16 VCPU." > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_lGlVpfbB9C19mMzrWQcHCQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=PxLwkpEiTHvHxuPTPo0Pt5IHhe79vfnQqLgLLb7JQ8Y&e= > > > > > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > > > enabled_filters = > > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilte > > r, > > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots > > of > > compute hosts that are not full: > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6SX9pQ4V1KnWfQkVnfoHOw&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=Yl9s2ZJ47GPXSyPzh6Hf0gyoxbqKGD9J9I2eSE0V8TA&e= > > > > > > This is the command line I used: > > > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 > > alberttestB > > what version of openstack are you running? > if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on > the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. > > if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your > enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your > configuration or in nova. > From Albert.Braden at synopsys.com Tue Nov 12 21:25:25 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 12 Nov 2019 21:25:25 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: We're on Rocky -----Original Message----- From: Sean Mooney Sent: Tuesday, November 12, 2019 1:23 PM To: Albert Braden ; openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources am what version of openstack have you deployed. i did not see that in your email. is it ocata or newer https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipermail_openstack-2Ddev_2018-2DJanuary_126283.html&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=jdZtsoEhWjn-EV3ffMxUc8E5Xum3xXbpR-0gpGp2Y14&e= i see you have the CoreFilter and RamFilter filters enabled. 
form octa on they shoudl be disabled as we claim those in placement but it should not break anything on older releases. we have removed them in train after we removed the caching scheduler. On Tue, 2019-11-12 at 20:47 +0000, Albert Braden wrote: > We are running placement under apache: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_mZviLVe5xONPsXfLqdxI6A&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=-7cWLHHrr0qduVnO6FYrDXp3b3QSIBgC3M3CABtQup8&e= > > The placement error logs show a lot of GETs but no errors: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_xDVGaXEdoQ5Z3wHv17Lezg&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=tQuMny6EiubEruJIJyN1zj2GSUBGBzqD3SW06H8ZIe8&e= > > We are planning to use NUMA but haven't started yet. It's probably a config error. Where should I be looking? This is > our nova config on the controllers: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_kNe1eRimk4ifrAuuN790bg&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=r1qv0CcWP5-3CXkQsiNgoe3pxGqKGkqymdjTLsJ9dYI&e= > > -----Original Message----- > From: Sean Mooney > Sent: Tuesday, November 12, 2019 12:22 PM > To: Albert Braden ; openstack-discuss at lists.openstack.org > Subject: Re: Scheduler sends VM to HV that lacks resources > > On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested > > 16 > > VCPU." > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6N3wcDzlbNQgj6hRApHiDQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=buklMe5R5iK--nSTPE8_2kdSLjTRHLCbk0XatjhiCnY&e= > > > > > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, > > and > > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable > > to > > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > > vcpu 14.00 VCPU < requested 16 VCPU." > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_lGlVpfbB9C19mMzrWQcHCQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=PxLwkpEiTHvHxuPTPo0Pt5IHhe79vfnQqLgLLb7JQ8Y&e= > > > > > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > > > enabled_filters = > > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilte > > r, > > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? 
We have lots > > of > > compute hosts that are not full: > > > > https://paste.fedoraproject.org/paste/6SX9pQ4V1KnWfQkVnfoHOw > > > > > > This is the command line I used: > > > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 > > alberttestB > > what version of openstack are you running? > if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on > the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. > > if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your > enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your > configuration or in nova. >

From colleen at gazlene.net  Tue Nov 12 21:42:18 2019
From: colleen at gazlene.net (Colleen Murphy)
Date: Tue, 12 Nov 2019 13:42:18 -0800
Subject: [keystone] Shanghai forum/PTG recap
Message-ID: <9e8f0a58-b20d-49e2-87a6-5afa7c73423d@www.fastmail.com>

While the keystone team didn't itself meet at last week's PTG, I posted a recap of the event from a keystone perspective here: http://www.gazlene.net/shanghai-forum-ptg.html

Hope it's a useful summary for those who couldn't attend in-person.

Colleen

From smooney at redhat.com  Tue Nov 12 21:46:28 2019
From: smooney at redhat.com (Sean Mooney)
Date: Tue, 12 Nov 2019 21:46:28 +0000
Subject: Filter costs / filter order
In-Reply-To: 
References: 
Message-ID: <24b8fe814dd497bb6e39a255fefcea24a44bb518.camel@redhat.com>

On Tue, 2019-11-12 at 20:30 +0000, Albert Braden wrote:
> I'm running Rocky and trying to figure out filter order. I'm reading this doc:
> https://docs.openstack.org/nova/rocky/user/filter-scheduler.html
>
> It says:
>
> Each filter selects hosts in a different way and has different costs. The order of filter_scheduler.enabled_filters
> affects scheduling performance. The general suggestion is to filter out invalid hosts as soon as possible to avoid
> unnecessary costs. We can sort filter_scheduler.enabled_filters items by their costs in reverse order. For example,
> ComputeFilter is better before any resource calculating filters like RamFilter, CoreFilter.
>
> Is there a document that specifies filter costs, or ranks filters by cost? Is there a well-known process for
> determining the optimal filter order?

I'm not aware of a specific document that covers it, but this will vary based on deployment. As a general guideline you should order your filters by which ones eliminate the most hosts, so the AvailabilityZoneFilter should generally be first. In older releases the RetryFilter should go first. The NUMA topology filter and PCI passthrough filter are kind of expensive, so they are better to have near the end (an example ordering along these lines is sketched below).
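For example, taking the filter list from the original question and applying that guideline (purely a sketch; which filters you actually enable should match your deployment, and on Rocky the Retry/Core/Ram filters can simply be dropped since placement already covers VCPU and RAM, as noted in the other thread):

  enabled_filters = AvailabilityZoneFilter,ComputeFilter,SameHostFilter,DifferentHostFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,ImagePropertiesFilter,ComputeCapabilitiesFilter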
So I would start with the Aggregate* filters first, followed by "cheap" filters that don't have any complex boolean logic (SameHostFilter, DifferentHostFilter, IoOpsFilter, NumInstancesFilter; there are a few others), then the more complex filters like the NUMA topology filter, PCI passthrough filter, ComputeCapabilitiesFilter and JsonFilter. Effectively what you want to do is maximise the information gain at each filtering step while minimising the cost (reducing the possible hosts with as few cpu cycles as possible). It's also important to only enable the filters that matter to your deployment, but if we had a perfect costing for each filter then you could follow the ID3 algorithm to get an optimal layout.
https://en.wikipedia.org/wiki/ID3_algorithm

I have wanted to experiment with tracing the boot requests on a large public cloud and modelling this for some time, but I always end up finding other things to tinker with instead. I think even without that data to work with you could do some interesting things with code complexity metrics as a proxy to try and auto-sort them.

Perhaps some of the operators can share what they do. I know CERN, pre placement, used to map tenants to cells as their first filtering step, which significantly helped them with scale. But if the goal is speed then you need to have each step give you the maximum information gain for the minimum additional cost. That is why the aggregate filters and multi-host filters like the affinity filters tend to be better at the start of the list, and very detailed filters like the NUMA topology filter tend to be better at the end.

From mriedemos at gmail.com  Tue Nov 12 22:09:27 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Tue, 12 Nov 2019 16:09:27 -0600
Subject: [ptg][nova][cinder] x-p meeting minutes
In-Reply-To: 
References: 
Message-ID: 

On 11/12/2019 8:07 AM, Sylvain Bauza wrote:
> We aren't sure that Nova will allow a detach of a boot volume.

This was never completed:

https://specs.openstack.org/openstack/nova-specs/specs/train/approved/detach-boot-volume.html

-- 

Thanks,

Matt

From mriedemos at gmail.com  Tue Nov 12 22:11:41 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Tue, 12 Nov 2019 16:11:41 -0600
Subject: [stable][glance] Proposal to remove Flavio Percoco from glance-stable-maint
In-Reply-To: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com>
References: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com>
Message-ID: <0191ee00-d6fb-80ba-2b74-db76e6219360@gmail.com>

On 11/12/2019 2:25 PM, Brian Rosmaita wrote:
> I just noticed that Flavio is still a member of glance-stable-maint.
> Nothing against him personally -- he's an excellent dude -- but he
> hasn't been working on Glance (or OpenStack) for quite a while now and
> is no longer a member of glance-core, so he probably shouldn't be on the
> stable-maint team.  (Not that he'd do anything bad, it just makes the
> glance-stable-maint team look larger than it actually is.)

Done.

-- 

Thanks,

Matt

From mriedemos at gmail.com  Tue Nov 12 22:11:53 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Tue, 12 Nov 2019 16:11:53 -0600
Subject: [stable][cinder] Proposal to remove John Griffith from cinder-stable-maint
In-Reply-To: <17610e99-5629-31c8-84ac-430ad06b2b62@gmail.com>
References: <17610e99-5629-31c8-84ac-430ad06b2b62@gmail.com>
Message-ID: 

On 11/12/2019 2:35 PM, Brian Rosmaita wrote:
> John Griffith has taken on other commitments and stepped down as a
> cinder-core recently, so it doesn't make sense for him to continue on
> the cinder-stable-maint list.

Done.
-- Thanks, Matt From openstack at nemebean.com Tue Nov 12 23:03:26 2019 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 12 Nov 2019 17:03:26 -0600 Subject: [qa] required rabbitMQ materials In-Reply-To: References: <7620140da67f4b38bfbf0e88ab212874@inspur.com> Message-ID: On 10/31/19 11:03 AM, Brin Zhang(张百林) wrote: > Hi all > Can anyone provide me with some materials about RabbitMQ? like its implementation mechanisms, scenarios, etc. > Thanks anyway. There's some documentation about rabbitmq in the oslo.messaging docs: https://docs.openstack.org/oslo.messaging/train/admin/rabbit.html I don't know if that's exactly what you're looking for, but hopefully it will get you started and you can ask specific questions as a followup. > > Brin Zhang > From jasonanderson at uchicago.edu Tue Nov 12 23:17:00 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Tue, 12 Nov 2019 23:17:00 +0000 Subject: [blazar] Why/how does Blazar use Keystone trusts? Message-ID: Hi Blazar contributors, We hit an issue today involving trusts in Blazar, where a host couldn't be deleted due to some issue authenticating against the trust associated with the host. We still haven't resolved this issue, but it felt odd to me: why is a trust even involved here? I have often wondered what the reason is for using trusts in Blazar, as I can't think of anything Blazar is doing that could not be done by the Blazar system user (and in fact, many operations are done via this user... via another trust.) There are also issues where a user leaves a project before their leases have ended; in this case Blazar has difficulty cleaning up because it tries to resurrect a trust that is not tied to a valid user/project relationship. Does anybody have context on to how trusts are used in Blazar and if they are still necessary? Does it make sense to remove this functionality? Thank you, -- Jason Anderson Chameleon DevOps Lead Consortium for Advanced Science and Engineering, The University of Chicago Mathematics & Computer Science Division, Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at bakeyournoodle.com Tue Nov 12 23:29:27 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Wed, 13 Nov 2019 10:29:27 +1100 Subject: [nova][ptg] PCI refactoring needs and a strawman proposal inside In-Reply-To: <35cff7ea13a85bde43a4626c84b6bc130eb67110.camel@redhat.com> References: <35cff7ea13a85bde43a4626c84b6bc130eb67110.camel@redhat.com> Message-ID: <20191112232927.GB22972@thor.bakeyournoodle.com> On Tue, Nov 12, 2019 at 01:29:32PM +0000, Sean Mooney wrote: > On Tue, 2019-11-12 at 11:51 +0100, Sylvain Bauza wrote: > i have a list of things i want to enhance related to pci/sriov so i would be interested > in this topic too. it might be worth consiering a SIG on this topic if it will be > cross project. /me too I have a bunch of work to do with device pass-through that may, or may not be on the PCI bus so I'd like to ensure we don't make my use case impossible ;P Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From tony at bakeyournoodle.com Tue Nov 12 23:42:33 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Wed, 13 Nov 2019 10:42:33 +1100 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? 
In-Reply-To: 
References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com>
Message-ID: <20191112234233.GC22972@thor.bakeyournoodle.com>

On Tue, Nov 05, 2019 at 08:51:13AM -0800, Dan Smith wrote:
> > The question I'm posing is if people would like to see those options
> > backported to stein and if so, would the stable team be OK with it?
> > I'd say this falls into a gray area where these are things that are
> > optional, not used by default, and are operational tooling so less
> > risk to backport, but it's not zero risk. It's also worth noting that
> > when I wrote those patches I did so with the intent that people could
> > backport them at least internally.
>
> Backporting features to operator tooling that helps them recover from
> bugs or other failures without doing database surgery seems like a good
> thing. Hard to argue that the risk outweighs the benefit, IMHO.

FWIW, I agree with this. #makeitso

Yours Tony.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: 

From soulxu at gmail.com  Wed Nov 13 03:42:02 2019
From: soulxu at gmail.com (Alex Xu)
Date: Wed, 13 Nov 2019 11:42:02 +0800
Subject: [nova] track error migrations and orphans in Resource Tracker
In-Reply-To: 
References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com>
Message-ID: 

Sean Mooney wrote on Tue, Nov 12, 2019 at 9:27 PM:

> On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote:
> > Hi Nova experts,
> >
> > "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in
> > update_available_resources in RT at the moment. That is some orphans or error migrations are using cpus/memory/disk
> > etc, but we don't take these usage into consideration. And instance.resources is introduced from Train used to contain
> > specific resources, we also track assigned specific resources in RT based on tracked migrations and instances. So this
> > bug will also affect the specific resources tracking.
> >
> > I draft an doc to clarify this bug and possible solutions:
> > https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT
> > Looking forward to suggestions from you. Thanks in advance.
> >
> there are patche up to allow cleaning up orpahn instances
> https://review.opendev.org/#/c/627765/
> https://review.opendev.org/#/c/648912/
> if we can get those merged that woudl adress at least some of the proablem

Yes, and we can separate the issue into two parts: one part is tracking, the other part is cleanup. Yongli's patch will help with cleanup.

> > Best Regards,
> > Luyao
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From james.denton at rackspace.com  Wed Nov 13 04:31:20 2019
From: james.denton at rackspace.com (James Denton)
Date: Wed, 13 Nov 2019 04:31:20 +0000
Subject: [ptg][neutron] Ussuri PTG summary
In-Reply-To: <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch>
References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch>
Message-ID: <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com>

Appreciate the summary as well.

For what it's worth, the ML2/LinuxBridge combo has been a very stable setup for us since its inception, and I'd hate to see it deprecated and removed for the sake of removing something. Last I checked, trunk ports were supported with the ML2/LinuxBridge driver.
And while of course DVR is not a supported feature, a good number of our ML2/LXB environments forgo Neutron routers altogether in favor of putting VMs on the provider network. It has shown to be as performant as vanilla OVS, and a simpler model to implement and support as an operator. Just my two cents. Thanks, James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com On 11/12/19, 3:41 PM, "Tim Bell" wrote: CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. Tim > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > Hi Neutron team, > > First if all thank to all of You for great and very productive week during the > PTG in Shanghai. > Below is summary of our discussions from whole 3 days. > If I forgot about something, please respond to the email and update missing > informations. But if You want to have follow up discussion about one of the > topics from this summary, please start a new thread to keep this one only as > high level summary of the PTG. > > ... > Starting the process of removing ML2/Linuxbridge > ================================================ > > Currently in Neutron tree we have 4 drivers: > * Linuxbridge, > * Openvswitch, > * macvtap, > * sriov. > SR-IOV driver is out of discussion here as this driver is > addressing slightly different use case than other out drivers. > > We started discussion about above topic because we don't want to end up with too > many drivers in-tree and we also had some discussions (and we have spec for that > already) about include networking-ovn as in-tree driver. > So with networking-ovn in-tree we would have already 4 drivers which can be used > on any hardware: linuxbridge, ovs, macvtap and ovn. > Conclusions from the discussion are: > * each driver requires proper testing in the gate, so we need to add many new > jobs to our check/gate queue, > * currently linuxbridge driver don't have a lot of development and feature > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > (e.g. dvr, trunk ports), > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > this one could be also considered as candidate to deprecation, > * we need to have process of deprecating some drivers and time horizon for such > actions should be at least 2 cycles. > * we will not remove any driver completly but rather we will move it to be in > stadium process first so it still can be maintained by people who are > interested in it. > > Actions to do after this discussion: > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > Linuxbridge users currently) to ask about their feedback about this, > * if there are any other companies using LB driver, Nate Johnston is willing to > help conctating them, please reach to him in such case. > * we may ratify marking linuxbridge as deprecated in the team meeting during > Ussuri cycle if nothing surprising pops in. > From akekane at redhat.com Wed Nov 13 05:57:31 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Wed, 13 Nov 2019 11:27:31 +0530 Subject: [ptg][glance] PTG summary Message-ID: Hi All, I attended OpenInfra summit and PTG at Shanghai last week. It was an interesting event with lots of discussion happening around different OpenStack projects. 
I was mostly associated with Glance and cross-project work related to Glance. There were other topics around Edge, UI and QA. During the summit, Erno and I gave a Glance project update where we discussed what we achieved in the Train cycle and what we are going to do in the Ussuri cycle.

As the multiple stores feature was stabilized during Train, in Ussuri the main focus of Glance is on enhancing the /v2/import API to import a single image into multiple stores and to copy existing images to multiple stores, avoiding the manual effort required by operators to copy images across stores. A new delete API will also be added to delete an image from a single store, and the cinder driver of glance_store needs refactoring so that it can use multiple backends configured by Cinder. Efforts will also continue on cluster awareness of the Glance API during this cycle. Apart from this edge-related work, the Glance team will also work on removing the deprecated registry and related functional tests, removing the sheepdog driver from glance_store, adding an s3 driver with multiple stores support in glance_store, and some urgent bug fixes.

Cross-project work:

In this PTG we had discussions with Nova and Cinder regarding the adoption of the multiple store feature of Glance. As per the discussion we have finalized the design, and the Glance team will work together with Nova and Cinder towards their adoption of the multiple store support feature stabilized in the Train cycle.

Support for Glance multiple stores in Cinder: As per the discussion, the volume type will be used to decide which store the image will be uploaded to on an upload-to-image operation. Cinder will also send the base image id to Glance as a header, which Glance will use to upload the image created from the volume to all those stores in which the base image is present.

Nova snapshots to dedicated store: The agreement is that Nova will send the base image id to Glance as a header, which Glance will use to upload the instance snapshot to all those stores in which the base image is present.

Talk with QA team: Glance also talked with the QA team about adding new tempest coverage for the features added in the last couple of cycles. The Glance team will work with tempest to add the new tempest tests below.
1. New import workflow (image conversion, inject metadata etc.) - depends on https://review.opendev.org/#/c/545483/ devstack patch
2. Hide old images
3. Multiple stores: https://review.opendev.org/#/c/689104/ in devstack
3.1 Devstack patch + zuul job to set up multiple stores; the job will run on glance and run glance api and scenario tests
4. Delete barbican secrets from glance images
4.1 add the tests in barbican-tempest-plugin
4.2 run as part of the barbican gate using their job
4.3 run those tests with a new job (multi stores) on the glance gate; do not run the barbican job on glance

Below is the Ussuri cycle planning and deadlines for Glance.

Ussuri milestone planning:

Ussuri U1 - December 09-13:
1. Import image in multiple stores (Specs + Implementation)
2. Copy existing image in multiple stores (Specs + Implementation)
3. S3 driver for glance
4. remove sheepdog driver from glance_store
5. Fix subunit parser error
6. Modify existing nova and cinder specs

Ussuri U2 - February 10-14
1. Cluster awareness of glance API nodes
2. remove registry code
3. Delete image from single store
4. Nova and Cinder upload snapshot/volume to glance
5. image-import.conf parsing issue with uwsgi

Ussuri U3 - April 06-10
1. Multiple cinder store support in glance_store (specs + implementation)
2. Creating image from volume using ceph (slow uploading issue)
3. Image encryption
4.
Tempest work Glance PTG planning etherpad: https://etherpad.openstack.org/p/Glance-Ussuri-PTG-planning Let me know if you guys need more details on this. Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Nov 13 08:25:31 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 13 Nov 2019 09:25:31 +0100 Subject: [blazar] Why/how does Blazar use Keystone trusts? In-Reply-To: References: Message-ID: Let me tell you a story. Long time ago, in a far far away galaxy, some people wanting to have a way to reserve some compute nodes in OpenStack created a new project that was named "Climate". Those folks weren't really knowing Keystone but they saw some problem : when the reservation was beginning, the token was expired. For that specific reason, they tried to see how to fix it and then saw Keystone trusts. They then said "heh, nice" and they started to use it. After 5 years, nobody really thought whether trusts should still be needed. Maybe the new Blazar team should look at service tokens, rather. Anyway, just my 2cts. -Sylvain On Wed, Nov 13, 2019 at 12:26 AM Jason Anderson wrote: > Hi Blazar contributors, > > We hit an issue today involving trusts in Blazar, where a host couldn't be > deleted due to some issue authenticating against the trust associated with > the host. We still haven't resolved this issue, but it felt odd to me: why > is a trust even involved here? > > I have often wondered what the reason is for using trusts in Blazar, as I > can't think of anything Blazar is doing that could not be done by the > Blazar system user (and in fact, many operations are done via this user... > via another trust.) There are also issues where a user leaves a project > before their leases have ended; in this case Blazar has difficulty cleaning > up because it tries to resurrect a trust that is not tied to a valid > user/project relationship. > > Does anybody have context on to how trusts are used in Blazar and if they > are still necessary? Does it make sense to remove this functionality? > > Thank you, > > -- > Jason Anderson > > Chameleon DevOps Lead > *Consortium for Advanced Science and Engineering, The University of > Chicago* > *Mathematics & Computer Science Division, Argonne National Laboratory* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Nov 13 09:32:12 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 13 Nov 2019 10:32:12 +0100 Subject: [stable][glance] Proposal to remove Flavio Percoco from glance-stable-maint In-Reply-To: <0191ee00-d6fb-80ba-2b74-db76e6219360@gmail.com> References: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com> <0191ee00-d6fb-80ba-2b74-db76e6219360@gmail.com> Message-ID: <22e5ebb5-b077-f503-3062-bf38a8347614@openstack.org> Matt Riedemann wrote: > On 11/12/2019 2:25 PM, Brian Rosmaita wrote: >> I just noticed that Flavio is still a member of glance-stable-maint. >> Nothing against him personally -- he's an excellent dude -- but he >> hasn't been working on Glance (or OpenStack) for quite a while now and >> is no longer a member of glance-core, so he probably shouldn't be on >> the stable-maint team.  (Not that he'd do anything bad, it just makes >> the glance-stable-maint team look larger than it actually is.) > > Done. Might be a good idea to remove him from stable-maint-core as well (https://review.opendev.org/#/admin/groups/530,members) otherwise this is a noop. 
-- Thierry Carrez (ttx) From dharmendra.kushwaha at gmail.com Wed Nov 13 10:15:22 2019 From: dharmendra.kushwaha at gmail.com (Dharmendra Kushwaha) Date: Wed, 13 Nov 2019 15:45:22 +0530 Subject: [tc][horizon][all] Horizon plugins maintenance In-Reply-To: References: Message-ID: Hi, As discussed in PTG, I had added horizon-core into tacker-horizon-core team. Thanks for your support. Thanks & Regards Dharmendra Kushwaha On Wed, Oct 23, 2019 at 6:20 PM Ivan Kolodyazhny wrote: > Hi team, > > As you may know, we've got a pretty big list of Horizon Plugins [1]. > Unfortunately, not all of them are in active development due to the lack of > resources in projects teams. > > As a Horizon team, we understand all the reasons, and we're doing our best > to help other teams to maintain plugins. > > That's why we're proposing our help to maintain horizon plugins. We raised > this topic during the last Horizon weekly meeting [2] and we'll have some > discussion during the PTG [3] too. > > There are a lot of Horizon changes which affect plugins and horizon team > is ready to help: > - new Django versions > - dependencies updates > - Horizon API changes > - etc. > > To get faster fixes in, it would be good to have +2 permissions for the > horizon-core team for each plugin. > > We helped Heat team during the last cycle adding horizon-core to the > heat-dashboard-core team. Also, we've got +2 on other plugins via global > project config [4] and via Gerrit configuration for > (neutron-*aas-dashboard, tuskar-ui). > > Vitrage PTL agreed to do the same for vitrage-dashboard during the last > meeting [5]. > > > Of course, it's up to each project to maintain horizon plugins and it's > responsibilities but I would like to raise this topic to the TC too. I > really sure, that it will speed up some critical fixes for Horizon plugins > and makes users and operators experience better. > > > [1] https://docs.openstack.org/horizon/latest/install/plugin-registry.html > [2] > http://eavesdrop.openstack.org/meetings/horizon/2019/horizon.2019-10-16-15.02.log.html#l-128 > [3] https://etherpad.openstack.org/p/horizon-u-ptg > [4] > http://codesearch.openstack.org/?q=horizon-core&i=nope&files=&repos=openstack/project-config > [5] > http://eavesdrop.openstack.org/meetings/vitrage/2019/vitrage.2019-10-23-08.03.log.html#l-21 > > Regards, > Ivan Kolodyazhny, > http://blog.e0ne.info/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Wed Nov 13 11:10:28 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Wed, 13 Nov 2019 12:10:28 +0100 Subject: [blazar] Why/how does Blazar use Keystone trusts? In-Reply-To: References: Message-ID: Hi Jason, As you point out, reliance on trusts causes problems when users are disabled or deleted from Keystone. In the past it even prevented non-admin users from starting leases, see [1] for context. I believe there are some operations that could still benefit from the use of trusts (or another mechanism to execute actions on behalf of users), such as snapshot in the before_end event. It's possible that with the current code, snapshot end up being owned by the blazar service user. I don't think I've ever used this feature… For management of hosts specifically, I don't see why trusts should be needed. I have a WIP patch to remove their use [2] which should fix your issue. IIRC it just needs unit tests fixes, maybe some from Chameleon could help to finish it? 
[1] https://bugs.launchpad.net/blazar/+bug/1663204 [2] https://review.opendev.org/#/c/641103/ On Wed, 13 Nov 2019 at 09:39, Sylvain Bauza wrote: > > Let me tell you a story. > Long time ago, in a far far away galaxy, some people wanting to have a way to reserve some compute nodes in OpenStack created a new project that was named "Climate". > Those folks weren't really knowing Keystone but they saw some problem : when the reservation was beginning, the token was expired. > > For that specific reason, they tried to see how to fix it and then saw Keystone trusts. They then said "heh, nice" and they started to use it. > After 5 years, nobody really thought whether trusts should still be needed. Maybe the new Blazar team should look at service tokens, rather. > > Anyway, just my 2cts. > > -Sylvain > > On Wed, Nov 13, 2019 at 12:26 AM Jason Anderson wrote: >> >> Hi Blazar contributors, >> >> We hit an issue today involving trusts in Blazar, where a host couldn't be deleted due to some issue authenticating against the trust associated with the host. We still haven't resolved this issue, but it felt odd to me: why is a trust even involved here? >> >> I have often wondered what the reason is for using trusts in Blazar, as I can't think of anything Blazar is doing that could not be done by the Blazar system user (and in fact, many operations are done via this user... via another trust.) There are also issues where a user leaves a project before their leases have ended; in this case Blazar has difficulty cleaning up because it tries to resurrect a trust that is not tied to a valid user/project relationship. >> >> Does anybody have context on to how trusts are used in Blazar and if they are still necessary? Does it make sense to remove this functionality? >> >> Thank you, >> >> -- >> Jason Anderson >> >> Chameleon DevOps Lead >> Consortium for Advanced Science and Engineering, The University of Chicago >> Mathematics & Computer Science Division, Argonne National Laboratory From thierry at openstack.org Wed Nov 13 11:18:47 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 13 Nov 2019 12:18:47 +0100 Subject: [sig] Forming a Large scale SIG Message-ID: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Hi everyone, In Shanghai we held a forum session to gauge interest in a new SIG to specifically address cluster scaling issues. In the past we had several groups ("Large deployments", "Performance", LCOO...) but those efforts were arguably a bit too wide and those groups are now abandoned. My main goal here is to get large users directly involved in a domain where their expertise can best translate into improvements in the software. It's easy for such a group to go nowhere while trying to boil the ocean. To maximize its chances of success and make it sustainable, the group should have a narrow focus, and reasonable objectives. My personal idea for the group focus was to specifically address scaling issues within a single cluster: basically identify and address issues that prevent scaling a single cluster (or cell) past a number of nodes. By sharing analysis and experience, the group could identify common pain points that, once solved, would help raising that number. There was a lot of interest in that session[1], and it predictably exploded in lots of different directions, including some that are definitely past a single cluster (like making Neutron better support cells). I think it's fine: my initial proposal was more of a strawman. 
Active members of the group should really define what they collectively want to work on. And the SIG name should be picked to match that. I'd like to help getting that group off the ground and to a place where it can fly by itself, without needing external coordination. The first step would be to identify interested members and discuss group scope and objectives. Given the nature of the group (with interested members in Japan, Europe, Australia and the US) it will be hard to come up with a synchronous meeting time that will work for everyone, so let's try to hold that discussion over email. So to kick this off: if you are interested in that group, please reply to this email, introduce yourself and tell us what you would like the group scope and objectives to be, and what you can contribute to the group. Thanks! [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG -- Thierry Carrez (ttx) From jean-philippe at evrard.me Wed Nov 13 11:19:23 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Wed, 13 Nov 2019 12:19:23 +0100 Subject: [kuryr] [tc] kuryr project mission In-Reply-To: <94950c5e942e22a4ea1599a4c814eb554d4f2a9b.camel@redhat.com> References: <94950c5e942e22a4ea1599a4c814eb554d4f2a9b.camel@redhat.com> Message-ID: <47d278b0338b1ca297aaef190df06a7bbb92831b.camel@evrard.me> On Tue, 2019-10-29 at 09:52 +0100, Michał Dulko wrote: > (snipped) I'd like to propose rephrasing Kuryr mission > statement from: > > > Bridge between container framework networking and storage models > > to OpenStack networking and storage abstractions. > > to > > > Bridge between container framework networking models > > to OpenStack networking abstractions. > > effectively getting storage out of project scope. > > I am looking forward to see the change in governance :) Regards, JP From skaplons at redhat.com Wed Nov 13 11:20:04 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 13 Nov 2019 12:20:04 +0100 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com> Message-ID: <20191113112004.ynsuamxrl7pa7hiq@skaplons-mac> Hi, On Wed, Nov 13, 2019 at 04:31:20AM +0000, James Denton wrote: > Appreciate the summary as well. > > For what it's worth, the ML2/LinuxBridge combo has been a very stable setup for us since its inception, and I'd hate to see it deprecated and removed for the sake of removing something. Last I checked, trunk ports were supported with the ML2/LinuxBridge driver. And while of course DVR is not a supported feature, a good number of our ML2/LXB environments forgo Neutron routers altogether in favor of putting VMs on the provider network. It has shown to be as performant as vanilla OVS, and a simpler model to implement and support as an operator. You're right. Trunk ports are ofcourse supported by LB agent. But e.g. some of QoS rules aren't supported by this backend. > > Just my two cents. > > Thanks, > > James Denton > Network Engineer > Rackspace Private Cloud > james.denton at rackspace.com > > On 11/12/19, 3:41 PM, "Tim Bell" wrote: > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. 
> > CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. > > Tim > > > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > > > Hi Neutron team, > > > > First if all thank to all of You for great and very productive week during the > > PTG in Shanghai. > > Below is summary of our discussions from whole 3 days. > > If I forgot about something, please respond to the email and update missing > > informations. But if You want to have follow up discussion about one of the > > topics from this summary, please start a new thread to keep this one only as > > high level summary of the PTG. > > > > ... > > > Starting the process of removing ML2/Linuxbridge > > ================================================ > > > > Currently in Neutron tree we have 4 drivers: > > * Linuxbridge, > > * Openvswitch, > > * macvtap, > > * sriov. > > SR-IOV driver is out of discussion here as this driver is > > addressing slightly different use case than other out drivers. > > > > We started discussion about above topic because we don't want to end up with too > > many drivers in-tree and we also had some discussions (and we have spec for that > > already) about include networking-ovn as in-tree driver. > > So with networking-ovn in-tree we would have already 4 drivers which can be used > > on any hardware: linuxbridge, ovs, macvtap and ovn. > > Conclusions from the discussion are: > > * each driver requires proper testing in the gate, so we need to add many new > > jobs to our check/gate queue, > > * currently linuxbridge driver don't have a lot of development and feature > > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > > (e.g. dvr, trunk ports), > > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > > this one could be also considered as candidate to deprecation, > > * we need to have process of deprecating some drivers and time horizon for such > > actions should be at least 2 cycles. > > * we will not remove any driver completly but rather we will move it to be in > > stadium process first so it still can be maintained by people who are > > interested in it. > > > > Actions to do after this discussion: > > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > > Linuxbridge users currently) to ask about their feedback about this, > > * if there are any other companies using LB driver, Nate Johnston is willing to > > help conctating them, please reach to him in such case. > > * we may ratify marking linuxbridge as deprecated in the team meeting during > > Ussuri cycle if nothing surprising pops in. > > > > > -- Slawek Kaplonski Senior software engineer Red Hat From amotoki at gmail.com Wed Nov 13 13:16:30 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 13 Nov 2019 22:16:30 +0900 Subject: [ptg][PTL] Auto-generated etherpad links ! In-Reply-To: <86f9ea36-5c38-ef64-aa7c-dd5849143c5d@openstack.org> References: <86f9ea36-5c38-ef64-aa7c-dd5849143c5d@openstack.org> Message-ID: Hi, I created the wiki page for Ussuri PTG etherpad [1] based on the latest snapshot of the auto-generated etherpad links [2]. The auto-generated page to collect etherpad links is really useful, but it will be gone when the next PTG comes, so I believe the wiki page is still useful for memory. PTLs, please update the links of your projects if they are not up-to-date. 
Thanks, Akihiro Motoki (amotoki) [1] https://wiki.openstack.org/wiki/PTG/Ussuri/Etherpads [2] http://ptg.openstack.org/etherpads.html On Fri, Oct 11, 2019 at 12:45 AM Thierry Carrez wrote: > > Hi everyone, > > The PTGbot grew a new feature over the summer. It now dynamically > generates the list of PTG track etherpads. You can find that list at: > > http://ptg.openstack.org/etherpads.html > > If you haven't created your etherpad already, just follow the link there > to create your etherpad. > > If you have created your track etherpad already under a different name, > you can overload the automatically-generated name using the PTGbot. Just > join the #openstack-ptg channel and (as a Freenode authenticated user) > send the following command: > > #TRACKNAME etherpad > > Example: > #keystone etherpad https://etherpad.openstack.org/p/awesome-keystone-pad > > That will update the link on that page automatically. > > Hoping to see you in Shanghai! > > -- > Thierry Carrez (ttx) > From amotoki at gmail.com Wed Nov 13 13:46:23 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 13 Nov 2019 22:46:23 +0900 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <20191113112004.ynsuamxrl7pa7hiq@skaplons-mac> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com> <20191113112004.ynsuamxrl7pa7hiq@skaplons-mac> Message-ID: Hi, After the neutron PTG session on the future of ML2/LinuxBridge, I discussed this topic in the ops-meetup PTG room (L.43- in [1]). - A lot of needs for Linux Bridge driver was expressed in the room. - LB is for simple network and many ops need it to keep deployment simple including a provider network without L3 feature. - The stats on Linux Bridge usage were shared as well. LB still has a large user base. 40% use Linux Bridge driver according to a survey in Wed's ops(?) session and the user survey last Oct shows 33% use Linux Bridge driver (63% use OVS based) [2]. This discussion does not mean the deprecation of the linux bridge driver. In my understanding, the main motivation is how the neutron team can keep the reference implementations simple. One example is that the features supported in the linux bridge driver are behind those in the OVS driver and some developers think this is the lack of the interest for the linux bridge driver, but this may show that most/not small number of linux bridge users just want simple features. That's my understanding in the PTG. Hope it helps the discussion :) Thanks, Akihiro [1] https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup [2] Deployment Decisions in https://www.openstack.org/analytics On Wed, Nov 13, 2019 at 8:21 PM Slawek Kaplonski wrote: > > Hi, > > On Wed, Nov 13, 2019 at 04:31:20AM +0000, James Denton wrote: > > Appreciate the summary as well. > > > > For what it's worth, the ML2/LinuxBridge combo has been a very stable setup for us since its inception, and I'd hate to see it deprecated and removed for the sake of removing something. Last I checked, trunk ports were supported with the ML2/LinuxBridge driver. And while of course DVR is not a supported feature, a good number of our ML2/LXB environments forgo Neutron routers altogether in favor of putting VMs on the provider network. It has shown to be as performant as vanilla OVS, and a simpler model to implement and support as an operator. > > You're right. Trunk ports are ofcourse supported by LB agent. But e.g. some of > QoS rules aren't supported by this backend. 
> > > > > Just my two cents. > > > > Thanks, > > > > James Denton > > Network Engineer > > Rackspace Private Cloud > > james.denton at rackspace.com > > > > On 11/12/19, 3:41 PM, "Tim Bell" wrote: > > > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > > > > Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. > > > > CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. > > > > Tim > > > > > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > > > > > Hi Neutron team, > > > > > > First if all thank to all of You for great and very productive week during the > > > PTG in Shanghai. > > > Below is summary of our discussions from whole 3 days. > > > If I forgot about something, please respond to the email and update missing > > > informations. But if You want to have follow up discussion about one of the > > > topics from this summary, please start a new thread to keep this one only as > > > high level summary of the PTG. > > > > > > ... > > > > > Starting the process of removing ML2/Linuxbridge > > > ================================================ > > > > > > Currently in Neutron tree we have 4 drivers: > > > * Linuxbridge, > > > * Openvswitch, > > > * macvtap, > > > * sriov. > > > SR-IOV driver is out of discussion here as this driver is > > > addressing slightly different use case than other out drivers. > > > > > > We started discussion about above topic because we don't want to end up with too > > > many drivers in-tree and we also had some discussions (and we have spec for that > > > already) about include networking-ovn as in-tree driver. > > > So with networking-ovn in-tree we would have already 4 drivers which can be used > > > on any hardware: linuxbridge, ovs, macvtap and ovn. > > > Conclusions from the discussion are: > > > * each driver requires proper testing in the gate, so we need to add many new > > > jobs to our check/gate queue, > > > * currently linuxbridge driver don't have a lot of development and feature > > > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > > > (e.g. dvr, trunk ports), > > > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > > > this one could be also considered as candidate to deprecation, > > > * we need to have process of deprecating some drivers and time horizon for such > > > actions should be at least 2 cycles. > > > * we will not remove any driver completly but rather we will move it to be in > > > stadium process first so it still can be maintained by people who are > > > interested in it. > > > > > > Actions to do after this discussion: > > > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > > > Linuxbridge users currently) to ask about their feedback about this, > > > * if there are any other companies using LB driver, Nate Johnston is willing to > > > help conctating them, please reach to him in such case. > > > * we may ratify marking linuxbridge as deprecated in the team meeting during > > > Ussuri cycle if nothing surprising pops in. 
> > > > > > > > > > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From mriedemos at gmail.com Wed Nov 13 13:58:25 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 07:58:25 -0600 Subject: [stable][glance] Proposal to remove Flavio Percoco from glance-stable-maint In-Reply-To: <22e5ebb5-b077-f503-3062-bf38a8347614@openstack.org> References: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com> <0191ee00-d6fb-80ba-2b74-db76e6219360@gmail.com> <22e5ebb5-b077-f503-3062-bf38a8347614@openstack.org> Message-ID: On 11/13/2019 3:32 AM, Thierry Carrez wrote: > Might be a good idea to remove him from stable-maint-core as well > (https://review.opendev.org/#/admin/groups/530,members) otherwise this > is a noop. Good point. Done. Alan and Chuck should probably come off that list as well. -- Thanks, Matt From balazs.gibizer at est.tech Wed Nov 13 14:21:52 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Wed, 13 Nov 2019 14:21:52 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <4ffbcc4c-c043-5d7a-7f7a-d78de9fc75d7@gmail.com> References: <1573200108.23158.4@est.tech> <4ffbcc4c-c043-5d7a-7f7a-d78de9fc75d7@gmail.com> Message-ID: <1573654907.26082.2@est.tech> On Fri, Nov 8, 2019 at 08:03, Matt Riedemann wrote: > On 11/8/2019 2:01 AM, Balázs Gibizer wrote: >> * deployer needs to create the sharing disk RP and report inventory / >> traits on it >> * deployer needs to define the placement aggregate and add the >> sharing >> disk RP into it >> * when compute restarts and sees that 'using_shared_disk_provider' = >> True in the config, it adds the its compute RP to the aggregate >> defined >> in 'sharing_disk_aggregate' Then if it sees that the root RP still >> has >> DISK_GB inventory then trigger a reshape > > Does the compute host also get added to a nova host aggregate which > mirrors the resource provider aggregate in placmeent or do we only > need the placement resource provider sharing DISK_GB aggregate? As far as I see we only need the placement aggregate to make this work. > > -- > > Thanks, > > Matt > From balazs.gibizer at est.tech Wed Nov 13 14:22:17 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Wed, 13 Nov 2019 14:22:17 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> Message-ID: <1573654932.26082.3@est.tech> On Fri, Nov 8, 2019 at 08:05, Matt Riedemann wrote: > On 11/8/2019 2:01 AM, Balázs Gibizer wrote: >> * when compute restarts and sees that 'using_shared_disk_provider' = >> True in the config, it adds the its compute RP to the aggregate >> defined >> in 'sharing_disk_aggregate' Then if it sees that the root RP still >> has >> DISK_GB inventory then trigger a reshape > > Conversely, if the deployer decides to use local disk for the host > again, what are the steps? > > 1. Change using_shared_disk_provider=False > 2. Restart/SIGHUP compute service > 3. Compute removes itself from the aggregate > 4. Compute reshapes to add DISK_GB inventory on the root compute node > resource provider and moves DISK_GB allocations from the sharing > provider back to the root compute node provider. > > Correct? Seems correct to me. 
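For readers who have not set up a sharing provider before, below is a rough sketch of the deployer-side steps quoted above (create the sharing disk RP, give it DISK_GB inventory and the MISC_SHARES_VIA_AGGREGATE trait, and put it into the placement aggregate), done with plain placement REST calls. It is only an illustration: the provider name, inventory size, credentials and aggregate UUID are made up, and the 'using_shared_disk_provider' / 'sharing_disk_aggregate' options discussed in this thread are still only proposed.

import uuid

from keystoneauth1 import adapter
from keystoneauth1.identity import v3
from keystoneauth1 import session

auth = v3.Password(auth_url='http://keystone:5000/v3',
                   username='admin', password='secret',
                   user_domain_name='Default',
                   project_name='admin',
                   project_domain_name='Default')
placement = adapter.Adapter(session=session.Session(auth=auth),
                            service_type='placement',
                            default_microversion='1.22')

# 1. Create the sharing disk resource provider.
rp = placement.post('/resource_providers',
                    json={'name': 'shared-nfs-disk'}).json()
rp_uuid = rp['uuid']
gen = rp['generation']

# 2. Report the shared DISK_GB inventory on it.
inv = placement.put('/resource_providers/%s/inventories/DISK_GB' % rp_uuid,
                    json={'total': 10000,
                          'resource_provider_generation': gen}).json()
gen = inv['resource_provider_generation']

# 3. Mark the provider as sharing its inventory with the aggregate.
traits = placement.put('/resource_providers/%s/traits' % rp_uuid,
                       json={'traits': ['MISC_SHARES_VIA_AGGREGATE'],
                             'resource_provider_generation': gen}).json()
gen = traits['resource_provider_generation']

# 4. Put the provider into the aggregate the compute nodes would join.
sharing_aggregate = str(uuid.uuid4())
placement.put('/resource_providers/%s/aggregates' % rp_uuid,
              json={'aggregates': [sharing_aggregate],
                    'resource_provider_generation': gen})

Under the proposal the compute node's own provider would then be added to the same aggregate by nova itself when the new options are set.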
gibi > > -- > > Thanks, > > Matt > From sbauza at redhat.com Wed Nov 13 14:34:16 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 13 Nov 2019 15:34:16 +0100 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573654932.26082.3@est.tech> References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> Message-ID: On Wed, Nov 13, 2019 at 3:32 PM Balázs Gibizer wrote: > > > On Fri, Nov 8, 2019 at 08:05, Matt Riedemann > wrote: > > On 11/8/2019 2:01 AM, Balázs Gibizer wrote: > >> * when compute restarts and sees that 'using_shared_disk_provider' = > >> True in the config, it adds the its compute RP to the aggregate > >> defined > >> in 'sharing_disk_aggregate' Then if it sees that the root RP still > >> has > >> DISK_GB inventory then trigger a reshape > > > > Conversely, if the deployer decides to use local disk for the host > > again, what are the steps? > > > > 1. Change using_shared_disk_provider=False > > 2. Restart/SIGHUP compute service > > 3. Compute removes itself from the aggregate > > 4. Compute reshapes to add DISK_GB inventory on the root compute node > > resource provider and moves DISK_GB allocations from the sharing > > provider back to the root compute node provider. > > > > Correct? > > Seems correct to me. > > gibi > > Me too. To be clear, I don't think operators would modify the above but if so, they would need reshapes. > > > > -- > > > > Thanks, > > > > Matt > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Nov 13 14:41:08 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 08:41:08 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> Message-ID: On 11/13/2019 8:34 AM, Sylvain Bauza wrote: > Me too. To be clear, I don't think operators would modify the above but > if so, they would need reshapes. Maybe not, but this is the kind of detail that should be in the spec and functional tests to make sure it's solid since this is a big architectural change in nova. -- Thanks, Matt From rosmaita.fossdev at gmail.com Wed Nov 13 14:54:42 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 09:54:42 -0500 Subject: [cinder] meeting time reminder Message-ID: <3906b712-f327-81d1-ed24-2962026ce69c@gmail.com> For people who were on Daylight Savings Time but now are not, just a reminder that this week's Cinder meeting at 16:00 UTC may be an hour earlier for you. The meeting-time-change-poll link will be in a separate email. If the time is changed, it will be effective for the first meeting in December (4 December 2019). That's because we'll be having the Virtual PTG during the last week of November (that poll will be out shortly). cheers, brian From rosmaita.fossdev at gmail.com Wed Nov 13 14:57:58 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 09:57:58 -0500 Subject: [cinder] meeting time change poll Message-ID: The poll to help decide whether we move the time of the weekly Cinder meeting is now available: https://forms.gle/kA2JGzoxegy2KRDB6 The poll closes on 20 November at 23:59 UTC. If the time is changed, it will be effective for the first meeting in December (4 December 2019). 
cheers, brian From mriedemos at gmail.com Wed Nov 13 15:19:05 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 09:19:05 -0600 Subject: [nova][ironic] nova docs bug for ironic looking for an owner Message-ID: <453e2ccb-ef4f-0e5b-aa15-cacf0ca104e8@gmail.com> While discussing some tribal knowledge about how ironic is the black sheep of nova compute drivers I realized that we (nova) have no docs about the ironic driver like we do for other drivers, so we don't mention anything about the weird cardinality rules around compute service : node : instance and host vs nodename things, how to configure the service for HA mode, how to configure baremetal flavors with custom resource classes, how to partition for conductor groups, how to deal with scaling issues, missing features (migrate), etc. I've opened a bug in case someone wants to get started on some of that information: https://bugs.launchpad.net/nova/+bug/1852446 -- Thanks, Matt From rosmaita.fossdev at gmail.com Wed Nov 13 15:42:25 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 10:42:25 -0500 Subject: [cinder] virtual PTG datetime selection poll Message-ID: At the Shanghai PTG last week, the Cinder team decided to hold a Virtual PTG the last week of November. The format will be 2 two-hour sessions, ideally held on consecutive days. Since most people have the 16:00 UTC Cinder meeting already on their calendars, I suggest that we skip the meeting on 27 November and instead use that time for Virtual PTG. So, basically, I'd like to meet: Wednesday for sure, and either Tuesday or Thursday (with Monday or Friday as possibilities if Tuesday or Thursday are impossible for too many people). Let me know what your preferences are on this poll: https://forms.gle/rKDJpSZvAxbnBESp7 The poll closes at 23:39 on *Tuesday* 19 November 2019. thanks, brian From openstack at nemebean.com Wed Nov 13 16:12:38 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 13 Nov 2019 10:12:38 -0600 Subject: [oslo] Adding Michael Johnson as Taskflow core Message-ID: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Hi, After discussion with the Oslo team, we (and he) have agreed to add Michael as a Taskflow core. He's done more work on the project than anyone else still active in Oslo and also works on a project that consumes it so he likely understands it better than anyone else at this point. Welcome Michael and thanks for your contributions! -Ben From openstack at nemebean.com Wed Nov 13 16:22:44 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 13 Nov 2019 10:22:44 -0600 Subject: [tripleo] Adding Alex Schultz as OVB core Message-ID: <7562aee5-1ea2-2d8f-ebb5-9fa02d9dc354@nemebean.com> Hi, After a discussion with Wes in Shanghai about how to make me less of a SPOF for OVB, one of the outcomes was that we should try to grow the OVB core team. Alex has been reviewing a lot of the patches to OVB lately and obviously has a good handle on how all of this stuff fits together, so I've added him to the OVB core team. Thanks and congratulations(?) Alex! :-) -Ben From jasonanderson at uchicago.edu Wed Nov 13 16:42:25 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 13 Nov 2019 16:42:25 +0000 Subject: [blazar] Why/how does Blazar use Keystone trusts? In-Reply-To: References: Message-ID: <73e756a3-029c-679b-b94a-5bca069c6799@uchicago.edu> Thank you both for the information. Some comments inline. 
On 11/13/19 5:10 AM, Pierre Riteau wrote: > Hi Jason, > > As you point out, reliance on trusts causes problems when users are > disabled or deleted from Keystone. In the past it even prevented > non-admin users from starting leases, see [1] for context. > > I believe there are some operations that could still benefit from the > use of trusts (or another mechanism to execute actions on behalf of > users), such as snapshot in the before_end event. It's possible that > with the current code, snapshot end up being owned by the blazar > service user. I don't think I've ever used this feature… That's a good point, I also haven't used that functionality. Though, I can't think of many cases where the system user couldn't just override the image owner on create. > For management of hosts specifically, I don't see why trusts should be > needed. I have a WIP patch to remove their use [2] which should fix > your issue. IIRC it just needs unit tests fixes, maybe some from > Chameleon could help to finish it? > > [1] https://bugs.launchpad.net/blazar/+bug/1663204 > [2] https://review.opendev.org/#/c/641103/ I did see this original patch. Yes, perhaps we can pick it up and see what to do with it. It does call out that, if trusts are removed, the notification payload would change. This likely is not used in practice, perhaps others on this list can chime in if that is not the case. > > On Wed, 13 Nov 2019 at 09:39, Sylvain Bauza wrote: >> Let me tell you a story. >> Long time ago, in a far far away galaxy, some people wanting to have a way to reserve some compute nodes in OpenStack created a new project that was named "Climate". >> Those folks weren't really knowing Keystone but they saw some problem : when the reservation was beginning, the token was expired. >> >> For that specific reason, they tried to see how to fix it and then saw Keystone trusts. They then said "heh, nice" and they started to use it. >> After 5 years, nobody really thought whether trusts should still be needed. Maybe the new Blazar team should look at service tokens, rather. >> >> Anyway, just my 2cts. >> >> -Sylvain Thanks for the historical context! Good to know that there aren't any technical blockers from considering simplifying this. >> >> On Wed, Nov 13, 2019 at 12:26 AM Jason Anderson wrote: >>> Hi Blazar contributors, >>> >>> We hit an issue today involving trusts in Blazar, where a host couldn't be deleted due to some issue authenticating against the trust associated with the host. We still haven't resolved this issue, but it felt odd to me: why is a trust even involved here? >>> >>> I have often wondered what the reason is for using trusts in Blazar, as I can't think of anything Blazar is doing that could not be done by the Blazar system user (and in fact, many operations are done via this user... via another trust.) There are also issues where a user leaves a project before their leases have ended; in this case Blazar has difficulty cleaning up because it tries to resurrect a trust that is not tied to a valid user/project relationship. >>> >>> Does anybody have context on to how trusts are used in Blazar and if they are still necessary? Does it make sense to remove this functionality? 
>>> >>> Thank you, >>> >>> -- >>> Jason Anderson >>> >>> Chameleon DevOps Lead >>> Consortium for Advanced Science and Engineering, The University of Chicago >>> Mathematics & Computer Science Division, Argonne National Laboratory Cheers, /Jason From moguimar at redhat.com Wed Nov 13 16:42:36 2019 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Wed, 13 Nov 2019 17:42:36 +0100 Subject: [oslo] Adding Michael Johnson as Taskflow core In-Reply-To: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> References: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Message-ID: Welcome Michael! On Wed, Nov 13, 2019 at 5:13 PM Ben Nemec wrote: > Hi, > > After discussion with the Oslo team, we (and he) have agreed to add > Michael as a Taskflow core. He's done more work on the project than > anyone else still active in Oslo and also works on a project that > consumes it so he likely understands it better than anyone else at this > point. > > Welcome Michael and thanks for your contributions! > > -Ben > > -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Nov 13 16:51:59 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 10:51:59 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event Message-ID: tl;dr: What do people think about storing and showing the *type* of exception that is recorded with a failed instance action event (like a fault) to the owner of the server who may not be an admin? Details: As noted here [1] and recreated here [2] the instance action event details that a non-admin owner of a server sees do not contain any useful information about what caused the failure of the action. Here is an example of a failed resize from that paste (this is what the non-admin owner of the server would see): $ openstack --os-compute-api-version 2.51 server event show vm2 req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events { "events": [ { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "event": "cold_migrate", "result": "Error" }, { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "event": "conductor_migrate_server", "result": "Error" } ] } Super useful, right? In this case scheduling failed for the resize so the instance is not in ERROR status which means the user cannot see a fault message with the NoValidHost error either. 
The admin can see the traceback in the failed action event list: $ openstack --os-compute-api-version 2.51 server event show 3ef043ea-e2d7-4565-a401-5c758e149f23 req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events { "events": [ { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "traceback": " File \"/opt/stack/nova/nova/conductor/manager.py\", line 301, in migrate_server\n host_list)\n File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in _cold_migrate\n raise exception.NoValidHost(reason=msg)\n", "event": "cold_migrate", "result": "Error" }, { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "traceback": " File \"/opt/stack/nova/nova/compute/utils.py\", line 1411, in decorated_function\n return function(self, context, *args, **kwargs)\n File \"/opt/stack/nova/nova/conductor/manager.py\", line 301, in migrate_server\n host_list)\n File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in _cold_migrate\n raise exception.NoValidHost(reason=msg)\n", "event": "conductor_migrate_server", "result": "Error" } ] } So when the admin gets the support ticket they can at least tell that scheduling failed and then dig into why. My idea is to store the exception *type* with the action event, similar to the recorded instance fault message for non-NovaExceptions [3] which will show to the non-admin owner of the server if the server status is ERROR or DELETED [4]. We should record the exc_val to get a prettier message like "No valid host was found." but that could leak details in the error message that we don't want non-admins to see [5]. With what I'm thinking, the non-admin owner of the server could see something like this for a failed event: { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "event": "cold_migrate", "result": "Error", "details": "NoValidHost" } That's pretty simple, doesn't leak details, and at least indicates to the user that maybe they can retry the resize with another flavor or something. It's just an example. This would require a microversion so before writing a spec I wanted to get general feelings about this in the mailing list. I accept that it might not really be worth the effort so that's good feedback if it's how you feel (I'll only cry a little). [1] https://review.opendev.org/#/c/693937/2/nova/objects/instance_action.py [2] http://paste.openstack.org/show/786054/ [3] https://github.com/openstack/nova/blob/20.0.0/nova/compute/utils.py#L101 [4] https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/views/servers.py#L564 [5] https://bugs.launchpad.net/nova/+bug/1851587 -- Thanks, Matt From mriedemos at gmail.com Wed Nov 13 16:55:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 10:55:20 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: On 11/13/2019 10:51 AM, Matt Riedemann wrote: > We should record the exc_val to get a prettier message like "No valid > host was found." but that could leak details in the error message that > we don't want non-admins to see [5]. Typo above, should have been "We *could* record...". 
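To make the proposal a bit more concrete, here is a purely hypothetical sketch of the "record only the exception class name" idea; the helper name and the NoValidHost stand-in below are made up for illustration, this is not the actual nova change. The point is just that the class name is safe to expose to the server owner while the message and traceback stay admin-only.

def event_error_details(exc_val):
    """Return the user-visible 'details' for a failed action event."""
    # Only the exception type name, e.g. "NoValidHost"; the message and
    # traceback would still be recorded for admins only.
    return exc_val.__class__.__name__


class NoValidHost(Exception):
    """Stand-in for nova.exception.NoValidHost in this sketch."""


try:
    raise NoValidHost('No valid host was found. There are not enough '
                      'hosts available.')
except Exception as exc:
    assert event_error_details(exc) == 'NoValidHost'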
-- Thanks, Matt From openstack at fried.cc Wed Nov 13 17:17:25 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 13 Nov 2019 11:17:25 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: Unless it's likely to be something other than NoValidHost a significant percentage of the time, IMO it... On 11/13/19 10:51 AM, Matt Riedemann wrote: > might not really be worth the effort efried . From sylvain.bauza at gmail.com Wed Nov 13 17:35:42 2019 From: sylvain.bauza at gmail.com (Sylvain Bauza) Date: Wed, 13 Nov 2019 18:35:42 +0100 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: Le mer. 13 nov. 2019 à 18:27, Eric Fried a écrit : > Unless it's likely to be something other than NoValidHost a significant > percentage of the time, IMO it... > > On 11/13/19 10:51 AM, Matt Riedemann wrote: > > might not really be worth the effort > > efried > . > > FWIW, os-instance-actions is super useful for some ops, at least my customers :-) Having the exact same answer from this API than a nova show would be very nice honestly. So, yeah, please +1 to the spec and add me for a review :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Nov 13 17:41:43 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 11:41:43 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> On 11/13/2019 11:17 AM, Eric Fried wrote: > Unless it's likely to be something other than NoValidHost a significant > percentage of the time, IMO it... Well just taking resize, it could be one of many things: https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L366 - oops you tried resizing which would screw up your group affinity policy https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L4490 - (for an admin, cold migrate) oops you tried cold migrating a vcenter vm or you have allow_resize_to_same_host=True and the scheduler picks the same host (silly scheduler, see bug 1748697) https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L113 - oops you lost a resource claims race, try again https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report.py#L1898 - oops you lost a race with allocation consumer generation conflicts, try again -- Thanks, Matt From juliaashleykreger at gmail.com Wed Nov 13 17:48:53 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 13 Nov 2019 09:48:53 -0800 Subject: [ironic] Anyone using the process metrics collection functionality in ironic? Message-ID: A question from the PTG was raised if anyone was using the existing statsd metrics publishing support in ironic/ironic-python-agent to publish internal performance times? There seems to be some interest in expanding this metric data publishing capability so Prometheus can also be used, but before we really even think of heading down that path, we wanted to understand if there were present users of that feature. 
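For anyone unsure what the existing support looks like: ironic and ironic-python-agent wrap internal operations with the ironic-lib metrics decorators, and the resulting timings are sent to statsd when that backend is enabled via the [metrics] / [metrics_statsd] config options (the default backend is a no-op). A rough illustration of the pattern, with placeholder class/method names rather than real ironic code:

from ironic_lib import metrics_utils

METRICS = metrics_utils.get_metrics_logger(__name__)


class ExampleDeployStep(object):
    """Placeholder class, only to show the decorator usage."""

    @METRICS.timer('ExampleDeployStep.write_image')
    def write_image(self, task):
        # The elapsed time of each call is emitted as a timer metric when
        # a statsd backend is configured; with the default noop backend
        # the decorator costs essentially nothing.
        pass

Expanding this would presumably mean adding another backend (e.g. something Prometheus can scrape) alongside 'noop' and 'statsd' rather than touching these call sites.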
Thanks, -Julia From stig.openstack at telfer.org Wed Nov 13 17:54:48 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 13 Nov 2019 17:54:48 +0000 Subject: [sig] Forming a Large scale SIG In-Reply-To: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Message-ID: <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Hi Thierry & all - Thanks for your mail. I’m interested in joining this SIG. Among others, I’m interested in participating in discussions around these common problems: - golden signals for scaling bottlenecks (and what to do about them) - using Ansible at scale - strategies for simplifying OpenStack functionality in order to scale Cheers, Stig > On 13 Nov 2019, at 11:18, Thierry Carrez wrote: > > Hi everyone, > > In Shanghai we held a forum session to gauge interest in a new SIG to specifically address cluster scaling issues. In the past we had several groups ("Large deployments", "Performance", LCOO...) but those efforts were arguably a bit too wide and those groups are now abandoned. > > My main goal here is to get large users directly involved in a domain where their expertise can best translate into improvements in the software. It's easy for such a group to go nowhere while trying to boil the ocean. To maximize its chances of success and make it sustainable, the group should have a narrow focus, and reasonable objectives. > > My personal idea for the group focus was to specifically address scaling issues within a single cluster: basically identify and address issues that prevent scaling a single cluster (or cell) past a number of nodes. By sharing analysis and experience, the group could identify common pain points that, once solved, would help raising that number. > > There was a lot of interest in that session[1], and it predictably exploded in lots of different directions, including some that are definitely past a single cluster (like making Neutron better support cells). I think it's fine: my initial proposal was more of a strawman. Active members of the group should really define what they collectively want to work on. And the SIG name should be picked to match that. > > I'd like to help getting that group off the ground and to a place where it can fly by itself, without needing external coordination. The first step would be to identify interested members and discuss group scope and objectives. Given the nature of the group (with interested members in Japan, Europe, Australia and the US) it will be hard to come up with a synchronous meeting time that will work for everyone, so let's try to hold that discussion over email. > > So to kick this off: if you are interested in that group, please reply to this email, introduce yourself and tell us what you would like the group scope and objectives to be, and what you can contribute to the group. > > Thanks! > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > -- > Thierry Carrez (ttx) > From rosmaita.fossdev at gmail.com Wed Nov 13 17:58:07 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 12:58:07 -0500 Subject: [cinder] meeting time change poll In-Reply-To: References: Message-ID: <74017827-bb03-24aa-59b0-8cd812e34a8f@gmail.com> On 11/13/19 9:57 AM, Brian Rosmaita wrote: > The poll to help decide whether we move the time of the weekly Cinder > meeting is now available: > > https://forms.gle/kA2JGzoxegy2KRDB6 The target of that link, a google form, may not be available in China. 
This one is probably not blocked: https://rosmaita.wufoo.com/forms/cinder-ussuri-meeting-time-poll/ If you already voted, please do NOT use the wufoo poll to vote again. I will collate the results from the two polls. > > The poll closes on 20 November at 23:59 UTC. > > If the time is changed, it will be effective for the first meeting in > December (4 December 2019). > > > cheers, > brian > From rosmaita.fossdev at gmail.com Wed Nov 13 17:59:09 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 12:59:09 -0500 Subject: [cinder] virtual PTG datetime selection poll In-Reply-To: References: Message-ID: On 11/13/19 10:42 AM, Brian Rosmaita wrote: > At the Shanghai PTG last week, the Cinder team decided to hold a Virtual > PTG the last week of November. > > The format will be 2 two-hour sessions, ideally held on consecutive days. > > Since most people have the 16:00 UTC Cinder meeting already on their > calendars, I suggest that we skip the meeting on 27 November and instead > use that time for Virtual PTG. > > So, basically, I'd like to meet: Wednesday for sure, and either Tuesday > or Thursday (with Monday or Friday as possibilities if Tuesday or > Thursday are impossible for too many people). > > Let me know what your preferences are on this poll: > > https://forms.gle/rKDJpSZvAxbnBESp7 The target of that link, a google form, may not be available in China. This one is probably not blocked: https://rosmaita.wufoo.com/forms/cinder-ussuri-virtual-ptg/ If you already voted, please do NOT use the wufoo poll to vote again. I will collate the results from the two polls. > > The poll closes at 23:39 on *Tuesday* 19 November 2019. > > > thanks, > brian From openstack at nemebean.com Wed Nov 13 18:08:27 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 13 Nov 2019 12:08:27 -0600 Subject: [oslo] Virtual PTG Planning Message-ID: Hi Osloers, Given that a lot of the team was not in Shanghai and we had a few topics proposed that didn't make sense to discuss as a result, I would like to try doing a virtual PTG the way a number of the other teams are. I've added a section to the PTG etherpad[0] with some proposed details, but in general I'm thinking we meet on Jitsi (it's open source) around the time of the Oslo meeting. It's possible we might be able to get through everything in the regularly scheduled hour, but if possible I'd like to keep the following hour (1600-1700 UTC) open as well. If everyone's available we could do it next week (the 18th) or possibly the following week (the 25th), although that runs into Thanksgiving week in the US so people might be out. I've created a Doodle poll[1] with selections for the next three weeks so please respond there if you can make it any of those days. If none of them work well we can discuss alternative options. Thanks. 
-Ben 0: https://etherpad.openstack.org/p/oslo-shanghai-topics 1: https://doodle.com/poll/8bqiv865ucyt8499 From smooney at redhat.com Wed Nov 13 18:12:51 2019 From: smooney at redhat.com (Sean Mooney) Date: Wed, 13 Nov 2019 18:12:51 +0000 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> Message-ID: <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> On Tue, 2019-11-12 at 14:53 +0100, Slawek Kaplonski wrote: > Stateless security groups > ========================= > > Old RFE [21] was approved for neutron-fwaas project but we all agreed that this > should be now implemented for security groups in core Neutron. > People from Nuage are interested in work on this in upstream. > We should probably also explore how easy/hard it will be to implement it in > networking-ovn backend. for what its worth we implemented this 4 years ago and it was breifly used in production trial deployment in a telco deployment but i dont think it ever went to full production as they went wtih sriov instead https://review.opendev.org/#/c/264131/ as part of this RFE https://bugs.launchpad.net/neutron/+bug/1531205 which was closed as wont fix https://bugs.launchpad.net/neutron/+bug/1531205/comments/14 as it was view that this was not the correct long term direction for the community. this is the summit presentation for austin for anyone that does not rememebr this effort https://www.openstack.org/videos/summits/austin-2016/tired-of-iptables-based-security-groups-heres-how-to-gain-tremendous-speed-with-open-vswitch-instead im not sure how the new proposal differeres form our previous proposal for the same feautre but the main pushback we got was that the securtiy group api is assumed to be stateful and that is why this was rejected. form our mesurments at the time we expected the stateless approch to scale better then contrack driver so it woudl be nice to see a stateless approch avialable. i never got around to deleteing our implemenation form networking-ovs-dpdk https://opendev.org/x/networking-ovs-dpdk/src/branch/master/networking_ovs_dpdk/agent/ovs_dpdk_firewall.py but i has not been tested our updated really for the last 2 years but it could be used as a basis of this effort if nuage does not have a poc already. From mriedemos at gmail.com Wed Nov 13 18:53:27 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 12:53:27 -0600 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint In-Reply-To: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> References: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> Message-ID: <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> On 11/12/2019 2:17 PM, Brian Rosmaita wrote: > we are currently understaffed in glance-stable-maint.  Plus, he's the > current Glance PTL. glance-stable-maint is understaffed yes. I ran a reviewstats report on glance stable branch reviews over the last 180 days: http://paste.openstack.org/show/786058/ Abhishek has only done 3 stable branch reviews in 6 months which is pretty low but to be fair maybe there aren't that many open reviews on stable branches for glance and the other existing glance-stable-maint cores don't have a lot more reviews either, so maybe that's just par for the course. As for being core on master or being PTL, as you probably know, that doesn't really mean much when it comes to stable branch reviews, which is more about the stable branch guidelines. 
Nova has a few stable branch cores that aren't core on master because they adhere to the guidelines and do a lot of stable branch reviews. Anyway, I'm OK trusting Abhishek here and adding him to the glance-stable-maint team. Things are such these days that beggars can't really be choosers. -- Thanks, Matt From gmann at ghanshyammann.com Wed Nov 13 19:01:18 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 14 Nov 2019 03:01:18 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> ---- On Tue, 12 Nov 2019 22:12:29 +0800 Corey Bryant wrote ---- > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > On 7/11/19 2:11 pm, Corey Bryant wrote: > > Hello TC members, > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > least enable non-voting py38 unit tests. This email is seeking approval > > and direction from the TC to move forward with enabling non-voting py38 > > tests. > > I was a bit fuzzy on this myself, so I looked it up and this is what the > TC decided when we passed the resolution: > > > If the new Zuul template contains test jobs that were not in the previous one, the goal champion(s) may choose to update the previous template to add a non-voting check job (or jobs) to match the gating jobs in the new template. This means that all repositories that have not yet converted to the template for the upcoming release will see a non-voting preview of the new job(s) that will be added once they update. If this option is chosen, the non-voting job should be limited to the master branch so that it does not run on the preceding release’s stable branch. > > > Thanks for digging that up and explaining. I recall that wording and it makes a lot more sense now that we have a scenario in front of us. > > (from > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > ) > > So to follow that process we would need to define the python versions > for V, then appoint a goal champion, and after that it would be at the > champion's discretion to add a non-voting job on master in Ussuri. I > happened to be sitting next to Sean when I saw this thread, and after > discussing it with him I think he would OK with having a non-voting job > on every commit, since it's what we have documented. Previous > discussions established that the overhead of adding one Python unit test > job to every project was pretty inconsequential (we'll offset it by > dropping 2.7 jobs anyway). > > I submitted a draft governance patch defining the Python versions for V > (https://review.opendev.org/693743). Unfortunately we can't merge it yet > because we don't have a release name for V (Sean is working on that: > https://review.opendev.org/693266). It's gazing in the crystal ball a > > Thanks very much for getting that going. > little bit, but even if for some reason Ubuntu 20.04 is not released > before the V cycle starts, it's inevitable that we will be selecting > Python 3.8 because it meets the first criterion ("The latest released > version of Python 3 that is available in any distribution we can > feasibly use for testing") - 3.8 is released and it's available in > Ubuntu 18.04, which is the distro we use for testing anyway. 
> > So, in my opinion, if you're volunteering to be the goal champion then > there's no need for any further approval by the TC ;) > > > Sure, I can champion that. Just to be clear, would that be Ussuri and V python3-updates champion, similar to the following? > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > Granted it's easier now that we mostly just have to switch the job template to the new release. > I guess to make that official we should commit the python3 update Goal > for the V cycle now... or at least as soon as we have a release name. > > How far off do you think we are from having a V name? If just a few weeks then I'm fine waiting but if over a month I'm more concerned. > > This is happening a little earlier than I think we anticipated but, > given that there's no question what is going to happen in V, I don't > think we'd be doing anybody any favours by delaying the process > unnecessarily. ++ on not delaying the process. That is the main point of the goal process schedule also. To be clear, are we going to add the py3.8 n-v job as part of v cycle template (openstack-python3-v*-jobs) ? I hope yes, as it will enable us to make the one-time change on the project's side. Once we are in V cycle then template can be updated to make it a voting job. If not as part of the template (adding n-v job explicitly in Ussuri cycle and then add the V template once V cycle starts. ) then it will be two changes per project which I would like to avoid. -gmann > I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be in the picture for Ussuri or V. > > > > For some further background: The next release of Ubuntu, Focal (20.04) > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > default in the Focal release, so I'm hopeful that non-voting unit tests > > will help close some of the gap. > > > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > No, I don't think this changes anything for Ussuri. It's preparation for V. > > > Ok. Appreciate all the input and help. > Thanks,Corey > From Albert.Braden at synopsys.com Wed Nov 13 19:30:15 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 13 Nov 2019 19:30:15 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> Message-ID: Removing these 3 obsolete filters appears to have fixed the problem. Thank you for your advice! -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 12, 2019 1:14 PM To: openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources On 11/12/2019 2:47 PM, Albert Braden wrote: > It's probably a config error. Where should I be looking? 
This is our nova config on the controllers: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_kNe1eRimk4ifrAuuN790bg&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=TZI4wT8_y-RAnwbbXaWBhdvAhhcbY1qymxKLRVpPt2U&s=3aQNqwtEMfOC7U_QUTqNqXiZv4yJy6ceB4kCuZKuL0o&e= If your deployment is pike or newer (I'm guessing rocky because your other email says rocky), then you don't need these filters: RetryFilter - alternate hosts bp in queens release makes this moot CoreFilter - placement filters on VCPU RamFilter - placement filters on MEMORY_MB -- Thanks, Matt From juliaashleykreger at gmail.com Wed Nov 13 19:35:23 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 13 Nov 2019 11:35:23 -0800 Subject: [ironic][ptg] Summary of discussions/happenings related to ironic Message-ID: Overall, There was quite a bit of interest in Ironic. We had great attendance for the Project Update, Rico Lin’s Heat/Ironic integration presentation, demonstration of dhcp-less virtual media boot, and the forum discussion on snapshot support for bare metal machines, and more! We also learned there are some very large bare metal clouds in China, even larger than the clouds we typically talk about when we discuss scale issues. As such, I think it would behoove the ironic community and OpenStack in general to be mindful of hyper-scale. These are not clouds with 100s of compute nodes, but with baremetal clouds containing thousands to tens of thousands of physical bare metal machines. So in no particular order, below is an overview of the sessions, discussions, and commentary with additional status where applicable. My apologies now since this is over 4,000 words in length. Project Update =========== The project update was fairly quick. I’ll try and record a video of it sometime this week or next and post it online. Essentially Ironic’s code addition/deletion levels are relatively stable cycle to cycle. Our developer and Ironic operator commit contribution levels have increased in Train over Stein, while the overall pool of contributors has continued to decline cycle after cycle, although not dramatically. I think the takeaway from this is that as ironic has become more and more stable, and that the problems being solved in many cases are operator specific needs or wants, or bug fixes in cases that are only raised in particular environment configurations. The only real question that came out of the project update was, if my memory is correct, was “What does Metal^3 mean for Ironic”, and “Who is driving forward Metal^3?” The answers are fairly straight forward, more ironic users and more use cases from Metal^3 driving ironic to deploy machines. As for who is driving it forward, it is largely being driven forward by Red Hat along with interested communities and hardware vendors. Quick, Solid, and Automatic OpenStack Bare-Metal Orchestration ================================================== Rico Lin, the Heat PTL, proposed this talk promoting the possibility of using ironic naively to deploy bare metal nodes. Specifically where configuration pass-through can’t be made generic or somehow articulated through the compute API. Cases where they may be is where someone wishes to utilize something like our “ramdisk” deploy_interface which does not deploy an image to the actual physical disk. 
The only real question that I seem to remember coming up was why someone might want or need to do this, which again becomes more of a question of doing things that are not quite "compute" API-ish. The patches are available in gerrit[10].

Operator Feedback Session
=====================

The operator feedback[0] session was not as well populated, with maybe ~20-25 people present. Overall the feeling of the room was that "everything works"; however, there is a need and desire for information and additional capabilities:

* Detailed driver support matrix
* Reduce the deployment times further
* Disk key rotation is an ask from operators for drives that claim smart erase support but end up doing a drive wipe instead. In essence, to reduce the overall time spent cleaning.
* Software RAID is needed at deploy time.
* IPA needs improved error handling. This may be a case where some of the communication flow changes that had been previously discussed could help, in that we could actively try and keep track of the agent a little more. Additional discussion will definitely be required.
* There does still seem to be some interest in graphical console support. A contributor has been revising patches, but I think it would really help for a vendor to become involved here and support accessing their graphical interface through such a method.
* Information and an information sharing location is needed. I've reached out to the Foundation staff regarding the Bare Metal Logo Program to see if we can find a common place that we can build/foster moving forward. In this topic, the one major pain point began being stressed: issues with the resource tracker at 3,500 bare metal nodes. Privately another operator reached out with the same issue at the scale of tens of thousands of bare metal nodes. As such, this became a topic during the PTG which gained further discussion. I'll cover that later.

Ironic – Snapshots?
===============

As a result of some public discussion of adding snapshot capability, I proposed a forum session to discuss the topic[1] such that requirements can be identified and the discussion can continue over the next cycle. I didn't expect the number of attendees present to swell compared to the operator feedback session. The discussion of requirements went back and forth to ultimately define "what is a snapshot" in this case, and "what should Ironic do?"

There was quite a bit of interaction in this session and the consensus seemed to be the following:

* Don't make it reliant on nova, as standalone users may want/need to use it.
* This could be a very powerful feature, as an operator could ``adopt`` a machine into ironic and then ``snapshot`` it to capture the disk contents.
* Block level only, and we can't forget about capturing/storing a content checksum.
* Capture the machine's contents with the same expectation as we would have for a VM, and upload this to someplace.

In order to make this happen in a fashion which will scale, the ironic team will likely need to leverage application credentials.

Ironically reeling in large bare metal deployment without PXE
==============================================

This was a talk submitted by Ilya Etingof, who unfortunately was unable to make it to the summit. Special thanks goes to both Ilya and Richard Pioso for working together to make this demonstration happen.
The idea was to demonstrate where the ironic team sees the future of deployment of machines on the edge using virtual media, and how vendors would likely interact with that in some cases, as slightly different mechanics may be required even if the BMCs all speak Redfish, which is the case for a Dell iDRAC BMC.

The idea[2] ultimately is that the conductor would inject the configuration information into the ISO image that is attached via virtual media, negating the need for DHCP. We have videos posted that allow those interested to see what this functionality looks like with neutron[3] and without neutron[4].

While the large audience was impressed, it seemed to be a general surprise that Ironic already had virtual media support in some of the drivers. This talk spurred quite a bit of conversation and hallway track style discussion after the presentation concluded, which is always an excellent sign.

Project Teams Gathering
===================

The ironic community PTG attendance was nothing short of excellent. Thank you everyone who attended! At one point we had fifteen people, and a chair had to be pulled up to our table for a 16th person to join us. At that point, we may have captured another table and created confusion.

We did things a little differently this time around. Given some of the unknowns, we did not create a strict schedule around the topics. We simply went through and prioritized topics and tried to discuss them each as thoroughly as possible until we had reached the conclusion or a consensus on the topic.

Topics, and a few words on each topic we discussed, are in the notes section on the PTG etherpad[5].

On-boarding
-----------------

We had three contributors who attended a fairly brief on-boarding overview of Ironic. Two of them were more developer focused, whereas the third had more of an operator focus, looking to leverage ironic and see how they can contribute back to the community.

BareMetal SIG - Next Steps
-------------------------------------

Arne Wiebalck and I both provided an update, including current conversations, where we see the SIG, the Logo Program, the white paper, and what the SIG should do beyond the whitepaper.

To start with the Logo Program, it largely seems that somewhere along the way a message or document got lost, and that largely impacted the Logo Program -> SIG feedback mechanism. I'm working with the OpenStack Foundation to fix that and get communication going again. Largely what spurred that was that some vendors expressed interest in joining, and wanted additional information.

As for the white paper, contributions are welcome and progress is being made again.

From a next steps standpoint, the question was raised of how we build up an improved operator point of contact. There was some consensus that we as a community should try to encourage at least one contributor to attend the operations mid-cycles. This allows for a somewhat shorter feedback loop with a different audience.

We also discussed knowledge sharing, or how to improve it. Included with this is how we share best practices. I've got the question out to folks at the foundation as to whether there is a better way as part of the Logo Program, or if we should just use the Wiki. I think this will be an open discussion topic in the coming weeks.

The final question that came up as part of the SIG is how to show activity.
I reached out to Amy on the UC regarding this, and it seems the process is largely just to reach out to the current leaders of the SIG, so it is critical that we keep that up to date moving forward.

Sensor Data/Metrics
---------------------------

The barrier between tenant level information and operator level information is difficult with this topic.

The consensus among the group was that the capability to collect some level of OOB sensor data should be present on all drivers, but there is also a recognition that this comes at a cost and possible performance impact. Mainly this performance impact question was raised with Redfish, because this data is scattered around the API such that multiple API calls are required, and actively inquiring upon some data points may even cause some interruption.

The middle ground in the discussion came down to adding a capability of somehow saying "collect power status and temp every minute, fan speeds every five minutes, drive/cpu health data maybe every 30 minutes". I would be remiss if I didn't note that there was joking about how this would in essence be a re-implementation of cron. What this would end up looking like, we don't know, but it would provide operators the data resolution necessary for the failure risk/impact. The analogy used was that "If the temperature sensor has risen to an alarm level, whether from an AC failure or a thermal hot spot forming based upon load in the data center, checking the sensor too often is just not going to result in a human investigating that on the data center floor any faster."

Mainly I believe this discussion stresses that the information is for the operator of the bare metal and not to provide insight into a tenant monitoring system; those activities should largely be done within the operating system.

One question among the group was whether anyone was using the metrics framework built into ironic already for metrics of ironic itself, to see if we can re-use it. Well, it uses a plugin interface! In any event, I've sent a post to the openstack-discuss mailing list seeking usage information.


Node Retirement
-----------------------

This is a returning discussion from the last PTG, and in discussing the topic we figured out where the discussion became derailed previously. In essence, the desire was to mix this with the concept of being able to take a node "out of service". Except taking a node out of service is an immediate, state related flag, whereas retiring might happen as soon as the current tenant vacates the machine… possibly in three to six months.

In other words, one is "do something or nothing now", and the other is "do something later when a particular state boundary is crossed". Trying to make one solution for both doesn't exactly work.

Unanimous consensus among those present was that in order to provide node retirement functionality, the logic should be similar to maintenance/maintenance reason: a top level field in the node object that would allow API queries for nodes slated for retirement, which helps solve an operator workflow conundrum, "How do I know what is slated for retirement but not yet vacated?"

Going back to the "out of service" discussion, we reached consensus that this was in essence a "user declarable failed state", and as such it should be handled only in the state machine as it is in the present, not as a future action.
Should we implement out of service, we'll need to check the nova.virt.ironic code and related virt code to properly handle nodes dropping from `ACTIVE` state, which could also be problematic and may need to be API version guarded to prevent machines from accidentally entering `ERROR` state if they are not automatically recovered in nova.

Multi-tenancy
------------------

Lots of interest existed around making the API somewhat of a multi-tenant aware interaction, and the exact interactions and uses involved there are not exactly clear. What IS clear is that providing such functionality will allow operators to remove complication in their resource classes and tenant specific flavors, which are presently being used to enable tenant specific hardware pools. The added benefit of providing some level of ironic API access for normally non-admin users is that it would allow those tenants to have a clear understanding of their used resources and available resources by directly asking ironic, whereas presently they don't have a good way to collect or understand that for bare metal, short of asking the cloud operator. Initial work has been posted for this to gerrit[6].

In terms of how tenants' resources would be shared, there was consensus that the community should stress that new special use tenants should be created for collaborative efforts.

There was some discussion regarding explicitly dropping fields for non-privileged users that can see the nodes, such as driver_info and possibly even driver_internal_info. Definitely a topic that requires more discussion, but that would solve operator reporting and use headaches.

Manual Cleaning Out-Of-Band
----------------------------------------

The point was raised that we unconditionally start the agent ramdisk to perform manual cleaning. Except we should support a way for purely out of band cleaning operations to be executed on their own, so the bare metal node doesn't need to be booted to a ramdisk.

The consensus seemed to be that we should consider a decorator, or a change to an existing decorator, that allows the conductor to hold off actually powering the node on for ramdisk boot unless or until a step is reached that is not purely out of band.

In essence, fixing this allows a "fix_bmc" out of band clean step to be executed first without trying to modify BMC settings, which would presently fail.

Scale issues
-----------------

A number of scaling issues exist in how nova and ironic interact, specifically with the resource tracker and how inventory is updated from ironic and loaded into nova. Largely this issue revolves around the concept in nova that each ``nova-compute`` is a hypervisor. And while one can run multiple ``nova-compute`` processes to serve as the connection to ironic, the underlying lock in Nova is at the level of the compute node, not the node level. This means that as thousands of records are downloaded, synced, and copied into the resource tracker, the compute process is essentially blocked from other actions while this serialized job runs.

In a typical VM case, you may only have at most a couple hundred VMs on a hypervisor, whereas with bare metal, we're potentially servicing thousands of physical machines.

It should be noted that there are several large scale operators that indicated during the PTG that this was their pain point. Some of the contributors from CERN sat down with us and the nova team to try and hammer out a solution to this issue. A summary of that cross project session can be found at line 212 in the PTG etherpad[0].
But there is another pain point that contributes to this performance issue, and that is the speed at which records are returned by our API. We've had some operators voice some frustration with this before, and we should at least be mindful of this and hopefully see if we can improve record retrieval performance. In addition to this, if we supported some form of bulk "GET" of nodes, it could be leveraged instead of a GET on each node one at a time, which is presently what occurs in the nova-compute process.

Boot Mode Config
------------------------

Previously, when scheduling occurred with flavors and the filters were appropriately set, if a machine was declared as supporting only one boot mode, requests would only ever land on that node. Now with traits, this is a bit different and unfortunately optional, without logic to really guard how the setting is applied for an instance.

So in this case, if filters are such that a request for a Legacy boot instance lands on a UEFI only machine, we'll still try to deploy it. In reality, we really should try and fail fast.

Ideally the solution here is that we consult with the BMC through some sort of get_supported_boot_modes method, and if we determine a mismatch between what the settings are and what the requested instance needs, based on the data we have, we fail the deploy.

This ultimately may require work in the nova.virt.ironic driver code to identify the cause of the failure as an invalid configuration and report that back; however, it may not be fatal on another machine.

Security of /heartbeat and /lookup endpoints
-----------------------------------------------------------

We had a discussion of adding some additional layers of security mechanics around the /heartbeat and /lookup endpoints in ironic's REST API. These limited endpoints are documented as being unauthenticated, so naturally some issues can arise from these, and we want to minimize the vectors by which an attacker who has gained access to a cleaning/provisioning/rescue network could possibly impersonate a running ironic-python-agent. Conversely, the ironic-python-agent runs in a similar fashion, intended to run on secure trusted networks which are only accessible to the ironic-conductor. As such, we also want to add some validation that the API request is from the same Ironic deployment that IPA is heart-beating to.

The solution to this is to introduce a limited lifetime token that is unique per node per deployment. It would be stored in RAM on the agent, and in the node.driver_internal_info so it is available to the conductor. It would be provided only once, either out of band OR via the first "lookup" of a node, and then only become accessible again during known reboot steps.

Conceptually the introduction of tokens was well supported in the discussions and there were zero objections to doing so. Some initial patches[7][8] are under development to move this forward.

An additional item is to add IP address filtering capabilities to both endpoints such that we only process the heartbeat/lookup request if we know it came from the correct IP address. An operator has written this feature downstream and consensus was unanimous at the PTG that we should accept this feature upstream. We should expect a patch for this functionality to be posted soon.
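To make the token flow a little more concrete, here is a very rough sketch of what the conductor side could look like; the function and field names are hypothetical and not taken from the patches under review.

    # Hypothetical sketch of the per-node agent token idea described above;
    # names are illustrative, see the patches in review for the real design.
    import hmac
    import os

    def issue_agent_token(node):
        """Create a one-time token and stash it where the conductor can see it."""
        token = os.urandom(32).hex()
        info = dict(node.driver_internal_info)
        info['agent_secret_token'] = token
        node.driver_internal_info = info
        node.save()
        return token  # handed to the agent once, e.g. at first lookup

    def validate_agent_token(node, presented_token):
        """Reject heartbeats that do not carry the expected token."""
        expected = node.driver_internal_info.get('agent_secret_token')
        return bool(expected) and hmac.compare_digest(expected, presented_token)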
Persistent Agents
------------------------

The use case behind persistent agents is "I want to kexec my way to the agent ramdisk, or the next operating system." and "I want to have up to date inspection data." We've already somewhat solved the latter, but the former is a harder problem requiring the previously mentioned endpoint security enhancements to be in place first. There is some interest from CERN and some other large scale operators.

In other words, we should expect more of this from a bare metal fleet operations point of view for some environments as we move forward.

"Managing hardware the Ironic way"
-------------------------------------------------

The question that spurred this discussion was "How do I provide a way for my hardware manager to know what it might need to do by default?" Except those defaults may differ between racks that serve different purposes. "Rack 1, node0" may need a port set to Fibre Channel mode, whereas "Rack 2, node1" may require it to be Ethernet.

This quickly also reaches the discussion of "What if I need different firmware versions by default?"

This topic quickly evolved from there, and the idea that surfaced was that we introduce a new field on the node object for the storage of such data. Something like ``node.default_config``, where it would be a dictionary sort of like what a user provides for cleaning steps or deploy steps, providing argument values which are iterated through when in automated cleaning mode to allow operators to fill in configuration requirement gaps for hardware managers.

Interestingly enough, even today we just had someone ask a similar question in IRC.

This should ultimately be usable to assert desired/default firmware from an administrative point of view. Adrianc (Mellanox) is going to reach out to bdobb (DMTF) regarding the redfish PLDM firmware update interface to see where this may go from here.

Edge computing working group session
----------------------------------------------------

The edge working group largely became a session to update everyone on where Ironic was going and where we see things going in terms of managing bare metal at the edge/far-edge. This included some in-depth questions about dhcp-less deployment and related mechanics, as well as HTTPBoot'ing machines.

Supporting HTTPBoot does definitely seem to be of interest to a number of people, although, at least after sharing my context, only five or six people in attendance really seemed interested in ironic prioritizing such functionality. The primary blocker, for those who are unaware, is the lack of pre-built UEFI images for us to do integration testing of IPv4 HTTPBoot. Functionally ironic already supports IPv6 HTTPBoot via DHCPv6 as part of our IPv6 support with PXE/iPXE; however, we also don't have an integration test job for this code path for the same reason: pre-built UEFI firmware images lack the built-in support.

More minor PTG topics
-------------------------------

* Smartnics - A desire to attach virtual ports on ironic baremetal nodes with smartnics was raised. It seems that we don't need to try and create a port entry in ironic; we only need to track/signal and remove the "vif" attachment to the node in general, as there is no physical MAC required for that virtual port in ironic. The constraint that at least one MAC address would be required to identify the machine is understood. If anyone sees an issue with this, please raise this to adrianc.
* Metal^3 - Within the group attending the PTG, there was not much interest in Metal^3 or using CRDs to manage bare metal resources with ironic hidden behind the CRD. One factor related to this is the desire to define more data to be passed through to ironic which is not presently supported in the CRD definition.

Stable Backports with Ironic's release model
==================================

I was pulled into a discussion with the TC and the Stable team regarding frustrations that have been expressed within the ironic team about stable back-porting of fixes, mainly for drivers. There is consensus that it is okay for us as the ironic team to backport drivery things when needed to support vendors, as long as they are not breaking overall behavior contracts. This quickly leads us to needing to also modify constraints for drivery things as well. Constraints changes will continue to be evaluated on a case by case basis, but the general consensus is there is full support to "do the right thing" for ironic's users, vendors, and community.

The key is making sure we are on the same page and agreeing to what that right thing is. This is where asynchronous communication can get us into trouble, and I would highly encourage trying to start higher bandwidth discussion when these cases arise in the future. The key takeaway that we should likely keep in mind is that policy is there for good reasons, but policy is not and cannot be a crutch to prevent the right thing from being done.

Additional items worth noting - Q1 Gatherings
===================================

There will be an operations mid-cycle at Bloomberg in London, January 7th-8th, 2020[9]. It would be good if at least one ironic contributor could attend, as the operators group tends to be closer to the physical bare metal, and it is a good chance to build mutual context between developers and operations people actually using our software.

Additionally, we want to gauge the interest of having an ironic mid-cycle in central Europe in Q1 of 2020. We need to identify the number of contributors who would be interested in and able to attend, since the next PTG will be in June. Please email me off-list if you're interested in attending and I'll make a note of it, as we're still having initial discussions.

And now I've reached a buffer under-run on words. If there are any questions, just reply to the list.
-Julia Links: [0]: https://etherpad.openstack.org/p/PVG-ironic-operator-feedback [1]: https://etherpad.openstack.org/p/PVG-ironic-snapshot-support [2]: https://review.opendev.org/#/c/672780/ [3]: https://tinyurl.com/vwts36l [4]: https://tinyurl.com/st6azrw [5]: https://etherpad.openstack.org/p/PVG-Ironic-Planning [6]: https://review.opendev.org/#/c/689551/ [7]: https://review.opendev.org/692609 [8]: https://review.opendev.org/692614 [9]: https://etherpad.openstack.org/p/ops-meetup-1st-2020 [10]: https://review.opendev.org/#/q/topic:story/2006403+(status:open+OR+status:merged) From corey.bryant at canonical.com Wed Nov 13 19:43:29 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Wed, 13 Nov 2019 14:43:29 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> Message-ID: On Wed, Nov 13, 2019 at 2:01 PM Ghanshyam Mann wrote: > ---- On Tue, 12 Nov 2019 22:12:29 +0800 Corey Bryant < > corey.bryant at canonical.com> wrote ---- > > > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > > Hello TC members, > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand > it's > > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > > least enable non-voting py38 unit tests. This email is seeking > approval > > > and direction from the TC to move forward with enabling non-voting > py38 > > > tests. > > > > I was a bit fuzzy on this myself, so I looked it up and this is what > the > > TC decided when we passed the resolution: > > > > > If the new Zuul template contains test jobs that were not in the > previous one, the goal champion(s) may choose to update the previous > template to add a non-voting check job (or jobs) to match the gating jobs > in the new template. This means that all repositories that have not yet > converted to the template for the upcoming release will see a non-voting > preview of the new job(s) that will be added once they update. If this > option is chosen, the non-voting job should be limited to the master branch > so that it does not run on the preceding release’s stable branch. > > > > > > Thanks for digging that up and explaining. I recall that wording and it > makes a lot more sense now that we have a scenario in front of us. > > > > (from > > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > ) > > > > So to follow that process we would need to define the python versions > > for V, then appoint a goal champion, and after that it would be at the > > champion's discretion to add a non-voting job on master in Ussuri. I > > happened to be sitting next to Sean when I saw this thread, and after > > discussing it with him I think he would OK with having a non-voting job > > on every commit, since it's what we have documented. Previous > > discussions established that the overhead of adding one Python unit > test > > job to every project was pretty inconsequential (we'll offset it by > > dropping 2.7 jobs anyway). > > > > I submitted a draft governance patch defining the Python versions for V > > (https://review.opendev.org/693743). Unfortunately we can't merge it > yet > > because we don't have a release name for V (Sean is working on that: > > https://review.opendev.org/693266). 
It's gazing in the crystal ball a > > > > Thanks very much for getting that going. > > little bit, but even if for some reason Ubuntu 20.04 is not released > > before the V cycle starts, it's inevitable that we will be selecting > > Python 3.8 because it meets the first criterion ("The latest released > > version of Python 3 that is available in any distribution we can > > feasibly use for testing") - 3.8 is released and it's available in > > Ubuntu 18.04, which is the distro we use for testing anyway. > > > > So, in my opinion, if you're volunteering to be the goal champion then > > there's no need for any further approval by the TC ;) > > > > > > Sure, I can champion that. Just to be clear, would that be Ussuri and V > python3-updates champion, similar to the following? > > > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > > Granted it's easier now that we mostly just have to switch the job > template to the new release. > > I guess to make that official we should commit the python3 update Goal > > for the V cycle now... or at least as soon as we have a release name. > > > > How far off do you think we are from having a V name? If just a few > weeks then I'm fine waiting but if over a month I'm more concerned. > > > > This is happening a little earlier than I think we anticipated but, > > given that there's no question what is going to happen in V, I don't > > think we'd be doing anybody any favours by delaying the process > > unnecessarily. > > ++ on not delaying the process. That is the main point of the goal process > schedule also. > To be clear, are we going to add the py3.8 n-v job as part of v cycle > template (openstack-python3-v*-jobs) ? I hope yes, as > it will enable us to make the one-time change on the project's side. Once > we are in V cycle then template can be updated to make it a voting job. > > If not as part of the template (adding n-v job explicitly in Ussuri cycle > and then add the V template once V cycle starts. ) then it will be two > changes per project which I would like to avoid. > > -gmann > > My plan is to create V templates soon which will include voting py38. And ussuri templates will have non-voting py38: https://review.opendev.org/#/c/693401/ I was thinking we couldn't add V templates to projects until after their stable/ussuri branches are created, which would mean one update per project per release. Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Nov 13 19:43:56 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 13 Nov 2019 11:43:56 -0800 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: On 11/12/19 05:18, Sean Mooney wrote: > On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote: >> Hi Nova experts, >> >> "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in >> update_available_resources in RT at the moment. That is some orphans or error migrations are using cpus/memory/disk >> etc, but we don't take these usage into consideration. And instance.resources is introduced from Train used to contain >> specific resources, we also track assigned specific resources in RT based on tracked migrations and instances. So this >> bug will also affect the specific resources tracking. 
>>
>> I drafted a doc to clarify this bug and possible solutions:
>> https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT
>> Looking forward to suggestions from you. Thanks in advance.
>>
> there are patches up to allow cleaning up orphan instances
> https://review.opendev.org/#/c/627765/
> https://review.opendev.org/#/c/648912/
> if we can get those merged that would address at least some of the problem

I just wanted to mention: I have reviewed the cleanup patches ^ multiple times and I'm having a hard time getting past the fact that any way you slice it (AFAICT), the cleanup code will have a window where a valid guest could be destroyed erroneously (not an orphan). This is because the "get instance list by host" can miss instances that are mid-migration, because of how/where we update the instance.host field.

Maybe this ^ could be acceptable (?) if we put a big fat warning on the config option help for 'reap_unknown'. But I was unsure of the answers about what recovery looks like in case a guest is erroneously destroyed for an instance that is in the middle of migrating. In the case of resize or cold migrate, a hard reboot would fix it AFAIK. What about for a live migration? If recovery is possible in every case, those would also need to be documented in the config option help for 'reap_unknown'.

The patch has lots of complexities to think about and I'm left wondering if the pitfalls are better or worse than the current state. It would help if others joined in the review with their thoughts about it.

-melanie

From mriedemos at gmail.com Wed Nov 13 19:51:33 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Wed, 13 Nov 2019 13:51:33 -0600
Subject: Scheduler sends VM to HV that lacks resources
In-Reply-To: 
References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com>
Message-ID: 

On 11/13/2019 1:30 PM, Albert Braden wrote:
> Removing these 3 obsolete filters appears to have fixed the problem. Thank you for your advice!

Awesome, I'm glad it worked.

-- 

Thanks,

Matt

From mriedemos at gmail.com Wed Nov 13 19:53:27 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Wed, 13 Nov 2019 13:53:27 -0600
Subject: [nova] track error migrations and orphans in Resource Tracker
In-Reply-To: 
References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com>
Message-ID: 

On 11/13/2019 1:43 PM, melanie witt wrote:
> This is because the "get instance list by host" can miss instances that
> are mid-migration, because of how/where we update the instance.host field.

Why not just filter out any instances that have a non-None task_state? Or barring that, filter out any instances that have an in-progress migration (there is a method that the ResourceTracker uses to get those kinds of migrations occurring either as incoming to or outgoing from the host).
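As a rough illustration of that filtering, something along these lines (the helper and variable names below are made up, not the actual ResourceTracker code):

    # Illustrative sketch only: skip anything that is mid-operation before
    # treating an unknown guest as an orphan candidate.
    def pick_orphan_candidates(guest_uuids, instances_by_uuid, in_progress_migrations):
        migrating = {m.instance_uuid for m in in_progress_migrations}
        candidates = []
        for uuid in guest_uuids:
            if uuid in migrating:
                continue  # incoming/outgoing migration; leave the guest alone
            inst = instances_by_uuid.get(uuid)
            if inst is not None and inst.task_state is not None:
                continue  # some operation is in flight on this instance
            if inst is None:
                candidates.append(uuid)  # unknown to the DB for this host
        return candidates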
-- Thanks, Matt From gmann at ghanshyammann.com Wed Nov 13 20:02:35 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 14 Nov 2019 04:02:35 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> Message-ID: <16e665c4c72.12a62ab24207568.4557683352710362215@ghanshyammann.com> ---- On Thu, 14 Nov 2019 03:43:29 +0800 Corey Bryant wrote ---- > > > On Wed, Nov 13, 2019 at 2:01 PM Ghanshyam Mann wrote: > ---- On Tue, 12 Nov 2019 22:12:29 +0800 Corey Bryant wrote ---- > > > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > > Hello TC members, > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > > least enable non-voting py38 unit tests. This email is seeking approval > > > and direction from the TC to move forward with enabling non-voting py38 > > > tests. > > > > I was a bit fuzzy on this myself, so I looked it up and this is what the > > TC decided when we passed the resolution: > > > > > If the new Zuul template contains test jobs that were not in the previous one, the goal champion(s) may choose to update the previous template to add a non-voting check job (or jobs) to match the gating jobs in the new template. This means that all repositories that have not yet converted to the template for the upcoming release will see a non-voting preview of the new job(s) that will be added once they update. If this option is chosen, the non-voting job should be limited to the master branch so that it does not run on the preceding release’s stable branch. > > > > > > Thanks for digging that up and explaining. I recall that wording and it makes a lot more sense now that we have a scenario in front of us. > > > > (from > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > ) > > > > So to follow that process we would need to define the python versions > > for V, then appoint a goal champion, and after that it would be at the > > champion's discretion to add a non-voting job on master in Ussuri. I > > happened to be sitting next to Sean when I saw this thread, and after > > discussing it with him I think he would OK with having a non-voting job > > on every commit, since it's what we have documented. Previous > > discussions established that the overhead of adding one Python unit test > > job to every project was pretty inconsequential (we'll offset it by > > dropping 2.7 jobs anyway). > > > > I submitted a draft governance patch defining the Python versions for V > > (https://review.opendev.org/693743). Unfortunately we can't merge it yet > > because we don't have a release name for V (Sean is working on that: > > https://review.opendev.org/693266). It's gazing in the crystal ball a > > > > Thanks very much for getting that going. > > little bit, but even if for some reason Ubuntu 20.04 is not released > > before the V cycle starts, it's inevitable that we will be selecting > > Python 3.8 because it meets the first criterion ("The latest released > > version of Python 3 that is available in any distribution we can > > feasibly use for testing") - 3.8 is released and it's available in > > Ubuntu 18.04, which is the distro we use for testing anyway. 
> > > > So, in my opinion, if you're volunteering to be the goal champion then > > there's no need for any further approval by the TC ;) > > > > > > Sure, I can champion that. Just to be clear, would that be Ussuri and V python3-updates champion, similar to the following? > > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > > Granted it's easier now that we mostly just have to switch the job template to the new release. > > I guess to make that official we should commit the python3 update Goal > > for the V cycle now... or at least as soon as we have a release name. > > > > How far off do you think we are from having a V name? If just a few weeks then I'm fine waiting but if over a month I'm more concerned. > > > > This is happening a little earlier than I think we anticipated but, > > given that there's no question what is going to happen in V, I don't > > think we'd be doing anybody any favours by delaying the process > > unnecessarily. > > ++ on not delaying the process. That is the main point of the goal process schedule also. > To be clear, are we going to add the py3.8 n-v job as part of v cycle template (openstack-python3-v*-jobs) ? I hope yes, as > it will enable us to make the one-time change on the project's side. Once we are in V cycle then template can be updated to make it a voting job. > > If not as part of the template (adding n-v job explicitly in Ussuri cycle and then add the V template once V cycle starts. ) then it will be two > changes per project which I would like to avoid. I saw the review now and that too works well and matches the TC resolution. Once we have V cycle testing runtime (zane patch) reflecting the 3.8 as required version then we are good to merge that. -gmann > > -gmann > > > My plan is to create V templates soon which will include voting py38. And ussuri templates will have non-voting py38: https://review.opendev.org/#/c/693401/ > I was thinking we couldn't add V templates to projects until after their stable/ussuri branches are created, which would mean one update per project per release. > > Thanks, > Corey > From cboylan at sapwetik.org Wed Nov 13 20:34:49 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 13 Nov 2019 12:34:49 -0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On Fri, Nov 8, 2019, at 6:09 AM, Corey Bryant wrote: > > > On Thu, Nov 7, 2019 at 5:56 PM Sean McGinnis wrote: > > My non-TC take on this... > > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > > Just to be clear I'm only talking about unit tests right now which are > generally light on resource requirements. However it would be great to > also have py38 function test enablement and periodic would make sense > for function tests at this point. For unit tests though it seems the > benefit of knowing whether your patch regresses unit tests for the > latest python version far outweighs the resources required, so I don't > see much benefit in adding periodic unit test jobs. 
> Wanted to point out that we've begun to expose resource consumption in nodepool to graphite. You can find per project and per tenant resource usage under stats.zuul.nodepool.resources at https://graphite.opendev.org. Unfortunately, I don't think we have per job resource tracking there yet, but previous measurements from log files do agree that unittest consumption is relatively low. It is large multinode integration jobs that run for extended periods of time that have the greatest impact on our resource utilization. Clark From openstack at fried.cc Wed Nov 13 20:38:37 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 13 Nov 2019 14:38:37 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> Message-ID: <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Okay, are we going to have a document that maps exception classes to these explanations and recovery actions? Which we then have to maintain as the code changes? Or are they expected to look through code (without a stack trace)? I'm not against the idea, just playing devil's advocate. Sylvain seems to have a use case, so great. As an alternative, have we considered a mechanism whereby we could, in appropriate code paths, provide some text that's expressly intended for the end user to see? Maybe it's a new user_message field on NovaException which, if present, gets percolated up to a new field similar to the one you suggested. efried On 11/13/19 11:41 AM, Matt Riedemann wrote: > On 11/13/2019 11:17 AM, Eric Fried wrote: >> Unless it's likely to be something other than NoValidHost a significant >> percentage of the time, IMO it... > > Well just taking resize, it could be one of many things: > > https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L366 > - oops you tried resizing which would screw up your group affinity policy > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L4490 > - (for an admin, cold migrate) oops you tried cold migrating a vcenter > vm or you have allow_resize_to_same_host=True and the scheduler picks > the same host (silly scheduler, see bug 1748697) > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L113 - > oops you lost a resource claims race, try again > > https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report.py#L1898 > - oops you lost a race with allocation consumer generation conflicts, > try again > From juliaashleykreger at gmail.com Wed Nov 13 20:40:41 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 13 Nov 2019 12:40:41 -0800 Subject: [ironic][ptg] Summary of discussions/happenings related to ironic In-Reply-To: References: Message-ID: A minor revision, we have new links for the videos as it seems there was an access permission issue. Links replaced below. On Wed, Nov 13, 2019 at 11:35 AM Julia Kreger wrote: > > Overall, There was quite a bit of interest in Ironic. We had great > attendance for the Project Update, Rico Lin’s Heat/Ironic integration > presentation, demonstration of dhcp-less virtual media boot, and the > forum discussion on snapshot support for bare metal machines, and > more! We also learned there are some very large bare metal clouds in > China, even larger than the clouds we typically talk about when we > discuss scale issues. 
As such, I think it would behoove the ironic > community and OpenStack in general to be mindful of hyper-scale. These > are not clouds with 100s of compute nodes, but with baremetal clouds > containing thousands to tens of thousands of physical bare metal > machines. > > So in no particular order, below is an overview of the sessions, > discussions, and commentary with additional status where applicable. > > My apologies now since this is over 4,000 words in length. > > Project Update > =========== > > The project update was fairly quick. I’ll try and record a video of it > sometime this week or next and post it online. Essentially Ironic’s > code addition/deletion levels are relatively stable cycle to cycle. > Our developer and Ironic operator commit contribution levels have > increased in Train over Stein, while the overall pool of contributors > has continued to decline cycle after cycle, although not dramatically. > I think the takeaway from this is that as ironic has become more and > more stable, and that the problems being solved in many cases are > operator specific needs or wants, or bug fixes in cases that are only > raised in particular environment configurations. > > The only real question that came out of the project update was, if my > memory is correct, was “What does Metal^3 mean for Ironic”, and “Who > is driving forward Metal^3?” The answers are fairly straight forward, > more ironic users and more use cases from Metal^3 driving ironic to > deploy machines. As for who is driving it forward, it is largely being > driven forward by Red Hat along with interested communities and > hardware vendors. > > Quick, Solid, and Automatic OpenStack Bare-Metal Orchestration > ================================================== > > Rico Lin, the Heat PTL, proposed this talk promoting the possibility > of using ironic naively to deploy bare metal nodes. Specifically where > configuration pass-through can’t be made generic or somehow > articulated through the compute API. Cases where they may be is where > someone wishes to utilize something like our “ramdisk” > deploy_interface which does not deploy an image to the actual physical > disk. The only real question that I seem to remember coming up was the > question why might someone want or need to do this, which again > becomes more of a question of doing things that are not quite > “compute” API-ish. The patches are available in gerrit[10]. > > Operator Feedback Session > ===================== > > The operator feedback[0] session was not as well populated with maybe > ~20-25 people present. Overall the feeling of the room was that > “everything works”, however there is a need and desire for information > and additional capabilities. > > * Detailed driver support matrix > * Reduce the deployment times further > * Disk key rotation is an ask from operators for drives that claim > smart erase support but end up doing a drive wipe instead. In essence, > to reduce the overall time spent cleaning > * Software RAID is needed at deploy time. > * IPA needs improved error handling. - This may be a case where > something of the communication flow changes that had been previously > discussed could help in that we could actively try and keep track of > the agent a little more. Additional discussion will definitely be > required. > * There does still seem to be some interest in graphical console > support. 
A contributor has been revising patches, but I think it would > really help for a vendor to become involved here and support accessing > their graphical interface through such a method. > * Information and an information sharing location is needed. I’ve > reached out to the Foundation staff regarding the Bare Metal Logo > Program to see if we can find a common place that we can build/foster > moving forward. In this topic, the one major pain point began being > stressed, issues with the resource tracker at 3,500 bare metal nodes. > Privately another operator reached out with the same issue in the > scale of tens of thousands of bare metal nodes. As such, this became a > topic during the PTG which gained further discussion. I’ll cover that > later. > > Ironic – Snapshots? > =============== > > As a result of some public discussion of adding snapshot capability, I > proposed a forum session to discuss the topic[1] such that > requirements can be identified and the discussion can continue over > the next cycle. > I didn't expect the number of attendees present to swell from the > operator's feedback session. The discussion of requirements went back > and forth to ultimately define "what is a snapshot" in this case, and > "what should Ironic do?" > > There was quite a bit of interaction in this session and the consensus > seemed to be the following: > * Don’t make it reliant on nova, for standalone users may want/need to use it. > * This could be a very powerful feature as an operator could ``adopt`` > a machine into ironic and then ``snapshot`` it to capture the disk > contents. > * Block level only and we can’t forget about capturing/storing content checksum > * Capture the machine’s contents with the same expectation as we would > have for a VM, and upload this to someplace. > > In order to make this happen in a fashion which will scale, the ironic > team will likely need to leverage the application credentials. > > Ironically reeling in large bare metal deployment without PXE > ============================================== > > This was a talk submitted by Ilya Etingof, who unfortunately was > unable to make it to the summit. Special thanks goes to Both Ilya and > Richard Pioso for working together to make this demonstration happen. > The idea was to demonstrate where the ironic team sees the future of > deployment of machines on the edge using virtual media and how vendors > would likely interact with that in some cases as slightly different > mechanics may be required even if the BMCs all speak Redfish, which is > the case for a Dell iDRAC BMC. > > The idea[2] ultimately being is that the conductor would inject the > configuration information into the virtual media ISO image that is > attached via virtual media negating the need for DHCP. We have videos > posted that allow those interested to see what this functionality > looks like with neutron[3] and without neutron[4]. > > While the large audience was impressed, it seemed to be a general > surprise that Ironic had virtual media support in some of the drivers > previously. This talk spurred quite a bit of conversation and hallway > track style discussion after the presentation concluded which is > always an excellent sign. > > Project Teams Gathering > =================== > > The ironic community PTG attendance was nothing short of excellent. > Thank you everyone who attended! At one point we had fifteen people > and a chair had to be pulled up to our table for a 16th person to join > us. 
At which point, we may have captured another table and created > confusion. > > We did things a little differently this time around. Given some of the > unknowns, we did not create a strict schedule around the topics. We > simply went through and prioritized topics and tried to discuss them > each as thoroughly as possible until we had reached the conclusion or > a consensus on the topic. > > Topics and a few words on each topic we discussed in the notes section > on the PTG etherpad[5]. > > On-boarding > ----------------- > > We had three contributors that attended a fairly brief on-boarding > overview of Ironic. Two of them were more developer focused where as > the third was more of an operator focus looking to leverage ironic and > see how they can contribute back to the community. > > BareMetal SIG - Next Steps > ------------------------------------- > > Arne Wiebalck and I both provided an update including current > conversations where we saw the SIG, the Logo Program, the white paper, > and what should the SIG do beyond the whitepaper. > > To start with the Logo program, it largely seems there that somewhere > along the way a message or document got lost and that largely impacted > the Logo Program -> SIG feedback mechanism. I’m working with the > OpenStack Foundation to fix that and get communication going again. > Largely what spurred that was that some vendors expressed interest in > joining, and wanted additional information. > > As for the white paper, contributions are welcome and progress is > being made again. > > From a next steps standpoint, the question was raised how do we build > up an improved Operator point of contact. There was some consensus > that we as a community should try to encourage at least one > contributor to attend the operations mid-cycles. This allows for a > somewhat shorter feedback look with a different audience. > > We also discussed knowledge sharing, or how to improve it. Included > with this is how do we share best practices. I’ve got the question out > to folks at the foundation if there is a better way as part of the > Logo program, or if we should just use the Wiki. I think this will be > an open discussion topic in the coming weeks. > > The final question that came up as part of the SIG is how to show > activity. I reached out to Amy on the UC regarding this, and it seems > the process is largely just reach out to the current leaders of the > SIG, so it is critical that we keep that up to date moving forward. > > Sensor Data/Metrics > --------------------------- > > The barrier between Tenant level information and Operator level > information is difficult with this topic. > > The consensus among the group was that the capability to collect some > level of OOB sensor data should be present on all drivers, but there > is also a recognition that this comes at a cost and possible > performance impact. Mainly this performance impact question was raised > with Redfish because this data is scattered around the API where > multiple API calls are required, and may even cause some interruption > to actively inquire upon some data points. > > The middle ground in the discussion came to adding a capability of > somehow saying “collect power status, temp every minute, fan speeds > every five minutes, drive/cpu health data maybe every 30 minutes”. I > would be remiss if I didn't note that there was joking about how this > would in essence be re-implementation of Cron. 
What this would end up > looking like, we don’t know, but it would provide operators the data > resolution necessary for the failure risk/impact. The analogy used was > that “If the temperature sensor has risen to an alarm level, either a > AC failure or a thermal hot spot forming based upon load in the data > center, checking the sensor too often is just not going to result in a > human investigating that on the data center floor any faster.” > > Mainly I believe this discussion largely stresses that the information > is for the operator of the bare metal and not to provide insight into > a tenant monitoring system, that those activities should largely be > done with-in the operating system. > > One question among the group was if anyone was using the metrics > framework built into ironic already for metrics of ironic itself, to > see if we can re-use it. Well, it uses a plugin interface! In any > event, I've sent a post to the openstack-discuss mailing list seeking > usage information. > > > Node Retirement > ----------------------- > > This is a returning discussion from the last PTG, and in discussing > the topic we figured out where the discussion became derailed at > previously. In essence, the desire was to mix this with the concept > of being able to take a node “out of service”. Except, taking a node > out of service is an immediate state related flag, where as retiring > might be as soon as the current tenant vacates the machine… possibly > in three to six months. > > In other words, one is “do something or nothing now”, and the other is > “do something later when a particular state boundary is crossed”. > Trying to make one solution for both, doesn’t exactly work. > > Unanimous consensus among those present was that in order to provide > node retirement functionality, that the logic should be similar to > maintenance/maintenance reason. A top level field in the node object > that would allow API queries for nodes slated for retirement, which > helps solve an operator workflow conundrum “How do I know what is > slated for retirement but not yet vacated?” > > Going back to the “out of service” discussion, we reached consensus > that this was in essence a “user declarable failed state”, and as such > that it should be done only in the state machine as it is in the > present, not a future action. Should we implement out of service, > we’ll need to check the nova.virt.ironic code and related virt code to > properly handle nodes dropping from `ACTIVE` state, which could also > be problematic and need to be API version guarded to prevent machines > from accidentally entering `ERROR` state if they are not automatically > recovered in nova. > > Multi-tenancy > ------------------ > > Lots of interest existed around making the API somewhat of a > mutli-tenant aware interaction, and the exact interactions and uses > involved there are not exactly clear. What IS clear is that providing > functionality as such will allow operators to remove complication in > their resource classes and tenant specific flavors which is presently > being used to enable tenant specific hardware pools. The added benefit > of providing some level for normally non-admin users to access the > ironic API is that it would allow those tenants to have a clear > understanding of their used resources and available resources by > directly asking ironic, where as presently, they don’t have a good way > to collect nor understand that short of asking the cloud operator when > it comes to bare metal. 
Initial work has been posted for this to > gerrit[6]. > > In terms of how tenants resources would be shared, there was consensus > that the community should stress that new special use tenants should > be created for collaborative efforts. > > There was some discussion regarding explicitly dropping fields for > non-privileged users that can see the nodes, such as driver_info and > possibly even driver_internal_info. Definitely a topic that requires > more discussion, but that would solve operator reporting and use > headaches. > > Manual Cleaning Out-Of-Band > ---------------------------------------- > > The point was raised that we unconditionally start the agent ramdisk > to perform manual cleaning. Except, we should support a method of out > of band cleaning operators to only be executed so the bare metal node > doesn’t need to be booted to a ramdisk. > > The consensus seemed to be that we should consider a decorator or > existing decorator change that allows the conductor to hold off > actually powering the node on for ramdisk boot unless or until a step > is reached that is not purely out of band. > > In essence, fixing this allows a “fix_bmc” out of band clean step to > be executed first without trying to modify BMC settings, which would > presently fail. > > Scale issues > ----------------- > > A number of scaling issues between how nova and ironic interact, > specifically with the resource tracker and how inventory is updated > from ironic and loaded into nova. Largely this issue revolves around > the concept in nova that each ``nova-compute`` is a hypervisor. And > while one can run multiple ``nova-compute`` processes to serve as the > connection to ironic, the underlying lock in Nova is at the level of > the compute node, not the node level. This means as thousands of > records are downloaded, synced, copied into the resource tracker, the > compute process is essentially blocked from other actions while this > serialized job runs. > > In a typical VM case, you may only have at most a couple hundred VMs > on a hypervisor, where as with bare metal, we’re potentially servicing > thousands of physical machines. > > It should be noted that there are several large scale operators that > indicated during the PTG that this was their pain point. Some of the > contributors from CERN sat down with us and the nova team to try and > hammer out a solution to this issue. A summary of that cross project > session can be found at line 212 in the PTG etherpad[0]. > > But there is another pain point that contributes to this performance > issue and that is the speed at which records are returned by our API. > We’ve had some operators voice some frustration with this before, and > we should at least be mindful of this and hopefully see if we can > improve record retrieval performance. In addition to this, if we > supported some form of bulk “GET” of nodes, it might be able to be > leveraged as opposed to a get on each node one at a time which is > presently what occurs in the nova-compute process. > > Boot Mode Config > ------------------------ > > Previously, when scheduling occurred with flavors and filters were > appropriately set, if a machine was declared as supporting only one > boot mode, requests would only ever land on that node. Now with > Traits, this is a bit different and unfortunately optional without > logic to really guard the setting application for an instance. 
> So in the traits case, if the filters are such that a request for a
> Legacy boot instance lands on a UEFI-only machine, we’ll still try to
> deploy it. In reality, we really should try to fail fast.
>
> Ideally the solution here is that we consult the BMC through some sort
> of get_supported_boot_modes method, and if we determine a mismatch
> between what the settings are and what the requested instance needs,
> based on the data we have, we fail the deploy.
>
> This ultimately may require work in the nova.virt.ironic driver code
> to identify the cause of the failure as being an invalid configuration
> and to report that back, however it may not be fatal on another
> machine.
>
> Security of /heartbeat and /lookup endpoints
> -----------------------------------------------------------
>
> We had a discussion of adding some additional layers of security
> mechanics around the /heartbeat and /lookup endpoints in ironic’s REST
> API. These limited endpoints are documented as being unauthenticated,
> so naturally some issues can arise from them and we want to minimize
> the vectors in which an attacker that has gained access to a
> cleaning/provisioning/rescue network could possibly impersonate a
> running ironic-python-agent. Conversely, the ironic-python-agent runs
> in a similar fashion, intended to run on secure, trusted networks which
> are only accessible to the ironic-conductor. As such, we also want to
> add some validation that the API request is from the same Ironic
> deployment that IPA is heart-beating to.
>
> The solution to this is to introduce a limited-lifetime token that is
> unique per node per deployment. It would be stored in RAM on the agent,
> and in the node.driver_internal_info so it is available to the
> conductor. It would be provided only once, out of band OR via the first
> “lookup” of a node, and then only become accessible again during known
> reboot steps.
>
> Conceptually the introduction of tokens was well supported in the
> discussions and there were zero objections to doing so. Some initial
> patches[7][8] are under development to move this forward.
>
> An additional item is to add IP address filtering capabilities to both
> endpoints such that we only process the heartbeat/lookup request if we
> know it came from the correct IP address. An operator has written this
> feature downstream and consensus was unanimous at the PTG that we
> should accept this feature upstream. We should expect a patch for this
> functionality to be posted soon.
>
> Persistent Agents
> ------------------------
>
> The use case behind persistent agents is “I want to kexec my way to
> the agent ramdisk, or the next operating system.” and “I want to have
> up-to-date inspection data.” We’ve already somewhat solved the latter,
> but the former is a harder problem requiring the previously mentioned
> endpoint security enhancements to be in place first. There is some
> interest from CERN and some other large scale operators.
>
> In other words, we should expect more of this from a bare metal fleet
> operations point of view for some environments as we move forward.
>
> “Managing hardware the Ironic way”
> -------------------------------------------------
>
> The question that spurred this discussion was “How do I provide a way
> for my hardware manager to know what it might need to do by default?”
> Except, those defaults may differ between racks that serve different
> purposes. “Rack 1, node0” may need a port set to FiberChannel mode,
> whereas “Rack2, node1” may require it to be Ethernet.
> This quickly also reaches the discussion of “What if I need different
> firmware versions by default?”
>
> This topic quickly evolved from there and the idea that surfaced was
> that we introduce a new field on the node object for the storage of
> such data. Something like ``node.default_config``, where it would be a
> dictionary, sort of like what a user provides for cleaning steps or
> deploy steps, that provides argument values which are iterated through
> when in automated cleaning mode to allow operators to fill in
> configuration requirement gaps for hardware managers.
>
> Interestingly enough, even today we just had someone ask a similar
> question in IRC.
>
> This should ultimately be usable to assert desired/default firmware
> from an administrative point of view. Adrianc (Mellanox) is going to
> reach out to bdobb (DMTF) regarding the redfish PLDM firmware update
> interface to see where this may go from here.
>
> Edge computing working group session
> ----------------------------------------------------
>
> The edge working group largely became a session to update everyone on
> where Ironic was going and where we see things going in terms of
> managing bare metal at the edge/far-edge. This included some in-depth
> questions about dhcp-less deployment and related mechanics as well as
> HTTPBoot’ing machines.
>
> Supporting HTTPBoot does definitely seem to be of interest to a number
> of people, although at least after sharing my context only five or six
> people in attendance really seemed interested in ironic prioritizing
> such functionality. The primary blocker, for those that are unaware,
> is the availability of pre-built UEFI images for us to do integration
> testing for IPv4 HTTPBoot. Functionally ironic already supports IPv6
> HTTPBoot via DHCPv6 as part of our IPv6 support with PXE/iPXE, however
> we also don’t have an integration test job for this code path for the
> same reason: pre-built UEFI firmware images lack the built-in support.
>
> More minor PTG topics
> -------------------------------
>
> * Smartnics - A desire to attach virtual ports to ironic baremetal
> nodes with smartnics was raised. It seems that we don’t need to try and
> create a port entry in ironic; we only need to track/signal and remove
> the “vif” attachment to the node in general, as there is no physical
> MAC required for that virtual port in ironic. The constraint that at
> least one MAC address would be required to identify the machine is
> understood. If anyone sees an issue with this, please raise this with
> adrianc.
> * Metal^3 - Within the group attending the PTG, there was not much
> interest in Metal^3 or using CRDs to manage bare metal resources with
> ironic hidden behind the CRD. One factor related to this is the desire
> to define more data to be passed through to ironic which is not
> presently supported in the CRD definition.
>
> Stable Backports with Ironic's release model
> ==================================
>
> I was pulled into a discussion with the TC and the Stable team
> regarding frustrations that have been expressed within the ironic
> team regarding stable backporting of fixes, mainly drivers. There is
> consensus that it is okay for us as the ironic team to backport
> drivery things when needed to support vendors as long as they are not
> breaking overall behavior contracts. This quickly leads us to
> needing to also modify constraints for drivery things as well.
> Constraints changes will continue to be evaluated on a case by case > basis, but the general consensus is there is full support to "do the > right thing" for ironic's users, vendors, and community. The key is > making sure we are on the same page and agreeing to what that right > thing is. This is where asynchronous communication can get us into > trouble, and I would highly encourage trying to start higher bandwidth > discussion when these cases arise in the future. The key takeaway that > we should likely keep in mind is policy is there for good reasons, but > policy is not and can not be a crutch to prevent the right thing from > being done. > > Additional items worth noting - Q1 Gatherings > =================================== > > There will be an operations mid-cycle at Bloomberg in London, January > 7th-8th, 2020. It would be good if at least one ironic contributor > could attend as the operators group tends to be closer to the physical > baremetal, and it is a good chance to build mutual context between > developers and operations people actually using our software. > > Additionally, we want to gauge the interest of having an ironic > mid-cycle in central Europe in Q1 of 2020. We need to identify the > number of contributors that would be interested in and able to attend > since the next PTG will be in June. Please email me off-list if your > interested in attending and I'll make a note of it as we're still > having initial discussions. > > > And now I've reached a buffer under-run on words. If there are any > questions, just reply to the list. > > -Julia > > Links: > > [0]: https://etherpad.openstack.org/p/PVG-ironic-operator-feedback > [1]: https://etherpad.openstack.org/p/PVG-ironic-snapshot-support > [2]: https://review.opendev.org/#/c/672780/ [3] https://drive.google.com/file/d/1_PaPM5FvCyM6jkACADwQtDeoJkfuZcAs/view?usp=sharing [4] https://drive.google.com/file/d/1YUFmwblLbJ9uJgW6Rkf6pkW8ouU-PYFK/view?usp=sharing > [5]: https://etherpad.openstack.org/p/PVG-Ironic-Planning > [6]: https://review.opendev.org/#/c/689551/ > [7]: https://review.opendev.org/692609 > [8]: https://review.opendev.org/692614 > [9]: https://etherpad.openstack.org/p/ops-meetup-1st-2020 > [10]: https://review.opendev.org/#/q/topic:story/2006403+(status:open+OR+status:merged) From melwittt at gmail.com Wed Nov 13 21:21:08 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 13 Nov 2019 13:21:08 -0800 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: <5e082c92-b05c-e0d9-a418-ec7120331fe3@gmail.com> On 11/13/19 11:53, Matt Riedemann wrote: > On 11/13/2019 1:43 PM, melanie witt wrote: >> This is because the "get instance list by host" can miss instances >> that are mid-migration, because of how/where we update the >> instance.host field. > > Why not just filter out any instances that have a non-None task_state? > Or barring that, filter out any instances that have an in-progress > migration (there is a method that the ResourceTracker uses to get those > kinds of migrations occurring either as incoming to or outgoing from the > host). Yeah, an earlier version of the patch was trying to do that: https://review.opendev.org/#/c/627765/36/nova/compute/manager.py at 8455 but it was not a complete list of all the possible migrating intermediate states. We didn't know about the method the resource tracker is already using for the same purpose, that we could re-use. 
After some confusion on my part, we removed the task_state checks and
now I see we need to put them back. I'll find the RT method and comment
on the review. Thanks for mentioning that.

-melanie

From Albert.Braden at synopsys.com Wed Nov 13 21:23:05 2019
From: Albert.Braden at synopsys.com (Albert Braden)
Date: Wed, 13 Nov 2019 21:23:05 +0000
Subject: Filter costs / filter order
In-Reply-To: <24b8fe814dd497bb6e39a255fefcea24a44bb518.camel@redhat.com>
References: <24b8fe814dd497bb6e39a255fefcea24a44bb518.camel@redhat.com>
Message-ID:

This is very helpful, thank you! Does anyone have a "filter order"
document that they are willing to share, or documentation on how you
decide filter order?

-----Original Message-----
From: Sean Mooney
Sent: Tuesday, November 12, 2019 1:46 PM
To: Albert Braden ; openstack-discuss at lists.openstack.org
Subject: Re: Filter costs / filter order

On Tue, 2019-11-12 at 20:30 +0000, Albert Braden wrote:
> I'm running Rocky and trying to figure out filter order. I'm reading this doc:
> https://docs.openstack.org/nova/rocky/user/filter-scheduler.html
>
> It says:
>
> Each filter selects hosts in a different way and has different costs.
> The order of filter_scheduler.enabled_filters affects scheduling
> performance. The general suggestion is to filter out invalid hosts as
> soon as possible to avoid unnecessary costs. We can sort
> filter_scheduler.enabled_filters items by their costs in reverse order.
> For example, ComputeFilter is better before any resource calculating
> filters like RamFilter, CoreFilter.
>
> Is there a document that specifies filter costs, or ranks filters by
> cost? Is there a well-known process for determining the optimal filter
> order?

I'm not aware of a specific document that covers it, but this will vary
based on deployment. As a general guideline you should order your
filters by which ones eliminate the most hosts, so the
AvailabilityZoneFilter should generally be first. In older releases the
RetryFilter should go first. The NUMA topology filter and PCI
passthrough filter are kind of expensive, so they are better to have
near the end. So I would start with the Aggregate* filters first,
followed by "cheap" filters that don't have any complex boolean logic,
such as SameHostFilter, DifferentHostFilter, IoOpsFilter and
NumInstancesFilter (there are a few others), then the more complex
filters like the NUMA topology filter, the PCI passthrough filter,
ComputeCapabilitiesFilter and JsonFilter. Effectively what you want to
do is maximise the information gain at each filtering step while
minimising the cost (reducing the possible hosts with as few CPU cycles
as possible). It's also important to only enable the filters that
matter to your deployment, but if we had a perfect costing for each
filter then you could follow the ID3 algorithm to get an optimal
layout: https://en.wikipedia.org/wiki/ID3_algorithm
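As a small, self-contained illustration of that rule of thumb (and not anything shipped in nova), one can sort filters by estimated elimination rate per unit of cost; every number below is an invented placeholder that would have to be measured for a real deployment.

```python
# Toy helper: order scheduler filters by "hosts eliminated per unit cost".
# The estimates are placeholders; gather real numbers from your own cloud.
ESTIMATES = {
    # filter name: (typical fraction of hosts eliminated, relative cost)
    'AvailabilityZoneFilter':            (0.70, 1),
    'AggregateInstanceExtraSpecsFilter': (0.50, 2),
    'ComputeFilter':                     (0.05, 1),
    'IoOpsFilter':                       (0.10, 1),
    'NumInstancesFilter':                (0.10, 1),
    'ComputeCapabilitiesFilter':         (0.20, 4),
    'PciPassthroughFilter':              (0.30, 6),
    'NUMATopologyFilter':                (0.30, 8),
}


def suggested_order(estimates):
    # Highest elimination-per-cost first, mirroring the "maximum information
    # gain for minimum cost" idea described above.
    return sorted(estimates,
                  key=lambda name: estimates[name][0] / estimates[name][1],
                  reverse=True)


print(','.join(suggested_order(ESTIMATES)))
# The output is a candidate value for [filter_scheduler]/enabled_filters,
# to be sanity checked before use.
```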
I have wanted to experiment with tracing the boot requests on a large
public cloud and model this for some time, but I always end up finding
other things to tinker with instead. But I think that even without that
data to work with you could do some interesting things with code
complexity metrics as a proxy to try and auto-sort them.

Perhaps some of the operators can share what they do. I know CERN,
pre-placement, used to map tenants to cells as their first filtering
step, which significantly helped them with scale. But if the goal is
speed then you need each step to give you the maximum information gain
for the minimum additional cost. That is why the aggregate filters and
multi-host filters like the affinity filters tend to be better at the
start of the list, and very detailed filters like the NUMA topology
filter tend to be better at the end.

From mriedemos at gmail.com Wed Nov 13 22:54:10 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Wed, 13 Nov 2019 16:54:10 -0600
Subject: [nova] track error migrations and orphans in Resource Tracker
In-Reply-To: <5e082c92-b05c-e0d9-a418-ec7120331fe3@gmail.com>
References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com>
 <5e082c92-b05c-e0d9-a418-ec7120331fe3@gmail.com>
Message-ID: <870543fc-782a-2249-a7a1-37329388d6a7@gmail.com>

On 11/13/2019 3:21 PM, melanie witt wrote:
> I'll find the RT method and comment on the review.

https://github.com/openstack/nova/blob/1c7a3d59080e5de50615bd2408b10d372ec30861/nova/compute/resource_tracker.py#L935

--

Thanks,

Matt

From openstack at nemebean.com Wed Nov 13 23:56:07 2019
From: openstack at nemebean.com (Ben Nemec)
Date: Wed, 13 Nov 2019 17:56:07 -0600
Subject: [oslo] Adoption of microversion-parse
In-Reply-To: <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org>
References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org>
 <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com>
 <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org>
Message-ID:

On 10/21/19 9:14 AM, Thierry Carrez wrote:
> Thierry Carrez wrote:
>> [...]
>> I'll propose the project addition so you can all vote directly on it :)
>
> https://review.opendev.org/#/c/689754/
>

This has merged, but I still don't have access to the core group for the
library. Is this the point where we need to get infra involved or are
there other steps needed to make this official first?

From cboylan at sapwetik.org Thu Nov 14 00:20:03 2019
From: cboylan at sapwetik.org (Clark Boylan)
Date: Wed, 13 Nov 2019 16:20:03 -0800
Subject: [oslo] Adoption of microversion-parse
In-Reply-To:
References:
Message-ID: <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com>

On Wed, Nov 13, 2019, at 3:56 PM, Ben Nemec wrote:
>
>
> On 10/21/19 9:14 AM, Thierry Carrez wrote:
> > Thierry Carrez wrote:
> >> [...]
> >> I'll propose the project addition so you can all vote directly on it :)
> >
> > https://review.opendev.org/#/c/689754/
> >
>
> This has merged, but I still don't have access to the core group for the
> library. Is this the point where we need to get infra involved or are
> there other steps needed to make this official first?
> > Ideally the existing cores would simply add you as the method of checks and balances here. Any current member can manage the member list as well as a Gerrit admin. Once you've been added by the existing core group you'll be able to add any others (like oslo-core). You can find the existing group members here: https://review.opendev.org/#/admin/groups/1345,members If for some reason this voluntary hand over doesn't work then the infra team's gerrit admins can get involved, but the ideal is that existing core members would do it themselves to ack the handover. Clark From cdent+os at anticdent.org Thu Nov 14 00:29:49 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 14 Nov 2019 00:29:49 +0000 (GMT) Subject: [oslo] Adoption of microversion-parse In-Reply-To: <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com> References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org> <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com> Message-ID: On Wed, 13 Nov 2019, Clark Boylan wrote: > On Wed, Nov 13, 2019, at 3:56 PM, Ben Nemec wrote: >> >> >> On 10/21/19 9:14 AM, Thierry Carrez wrote: >>> Thierry Carrez wrote: >>>> [...] >>>> I'll propose the project addition so you can all vote directly on it :) >>> >>> https://review.opendev.org/#/c/689754/ >>> >> >> This has merged, but I still don't have access to the core group for the >> library. Is this the point where we need to get infra involved or are >> there other steps needed to make this official first? >> >> > > Ideally the existing cores would simply add you as the method of checks and balances here. Any current member can manage the member list as well as a Gerrit admin. Once you've been added by the existing core group you'll be able to add any others (like oslo-core). I've added oslo-core. I've been somewhat out of touch, so forgot about this step. (Note, it appears that oslo-core is way out of date...) -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From johnsomor at gmail.com Thu Nov 14 01:20:48 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 13 Nov 2019 17:20:48 -0800 Subject: [oslo] Adding Michael Johnson as Taskflow core In-Reply-To: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> References: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Message-ID: Thank you Ben, happy to help! Michael On Wed, Nov 13, 2019 at 8:18 AM Ben Nemec wrote: > > Hi, > > After discussion with the Oslo team, we (and he) have agreed to add > Michael as a Taskflow core. He's done more work on the project than > anyone else still active in Oslo and also works on a project that > consumes it so he likely understands it better than anyone else at this > point. > > Welcome Michael and thanks for your contributions! > > -Ben > From cp769u at att.com Wed Nov 13 23:07:17 2019 From: cp769u at att.com (PARSONS, CLIFF) Date: Wed, 13 Nov 2019 23:07:17 +0000 Subject: Keystone user ID case sensitivity issue Message-ID: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com> Hello everyone! My organization has a need to make the user name/ID retrieval from Heat template to be case insensitive. For example: suppose we already have a user in keystone, "xyz123". Then we have a client that creates a heat stack containing a UserRoleAssignment resource, in which the user was specified as "XYZ123". 
The user would not be found in the Keystone database (due to Keystone user IDs being case sensitive) and the role assignment would not occur. Either Keystone could be changed so that its users are treated case insensitive, or we could make the change to heat (Heat KeystoneClientPlugin class) like in https://review.opendev.org/#/c/694117/ so that it converts to lower case before querying keystone. Can I get some thoughts on this? Would something like this be acceptable at all? Would we need to make it configurable, and if we did, would that be acceptable? Thanks in advance for your thoughts/concerns/suggestions. Thank you, Cliff Parsons -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Nov 14 01:57:01 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 14 Nov 2019 01:57:01 +0000 Subject: Keystone user ID case sensitivity issue In-Reply-To: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com> References: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com> Message-ID: On Wed, 2019-11-13 at 23:07 +0000, PARSONS, CLIFF wrote: > Hello everyone! > > My organization has a need to make the user name/ID retrieval from Heat template to be case insensitive. For example: > suppose we already have a user in keystone, "xyz123". Then we have a client that creates a heat stack containing a > UserRoleAssignment resource, in which the user was specified as "XYZ123". The user would not be found in the Keystone > database (due to Keystone user IDs being case sensitive) and the role assignment would not occur. > > Either Keystone could be changed so that its users are treated case insensitive, or we could make the change to heat > (Heat KeystoneClientPlugin class) like in https://review.opendev.org/#/c/694117/ so that it converts to lower case > before querying keystone. i honestly dont think we shoudl force everyone to use case insensitive user names so i dont think converting to lower case is valid. however it might we worth exploring if you could change the encoding of the database so that it uses the case insensitive by using the utf8_general_ci encodeing so that all db opertion are case insensitive on the user tabel. > Can I get some thoughts on this? Would something like this be acceptable at all? Would we need to make it > configurable, and if we did, would that be acceptable? i think chaing api behavior based on a config option is an interoperablity probelm keystone has to interact with external identity systesm and so assuming all of those will be case inseitive would proably break someone else who has the opisite requirement. i honestly think that people should just use the correct case in the heat template. if heat is not currently erroring out when the role assignment failts that feels like a heat bug but i would personlly think its an error if i type my user name with the wrong case and my correct passwourd and was able to get a keystone token. > > Thanks in advance for your thoughts/concerns/suggestions. 
> > Thank you, > Cliff Parsons From luyao.zhong at intel.com Thu Nov 14 02:33:18 2019 From: luyao.zhong at intel.com (Luyao Zhong) Date: Thu, 14 Nov 2019 10:33:18 +0800 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: <05502840-fc3f-4bca-89bd-a18db3a5ad80@intel.com> On 2019/11/14 上午3:43, melanie witt wrote: > On 11/12/19 05:18, Sean Mooney wrote: >> On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote: >>> Hi Nova experts, >>> >>> "Not tracking error migrations and orphans in RT." is probably a bug. >>> This may trigger some problems in >>> update_available_resources in RT at the moment. That is some orphans >>> or error migrations are using cpus/memory/disk >>> etc, but we don't take these usage into consideration. And >>> instance.resources is introduced from Train used to contain >>> specific resources, we also track assigned specific resources in RT >>> based on tracked migrations and instances. So this >>> bug will also affect the specific resources tracking. >>> >>> I draft an doc to clarify this bug and possible solutions: >>> https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT >>> Looking forward to suggestions from you. Thanks in advance. >>> >> there are patche up to allow cleaning up orpahn instances >> https://review.opendev.org/#/c/627765/ >> https://review.opendev.org/#/c/648912/ >> if we can get those merged that woudl adress at least some of the >> proablem > > I just wanted to mention: > > I have reviewed the cleanup patches ^ multiple times and I'm having a > hard time getting past the fact that any way you slice it (AFAICT), the > cleanup code will have a window where a valid guest could be destroyed > erroneously (not an orphan). This is because the "get instance list by > host" can miss instances that are mid-migration, because of how/where we > update the instance.host field. > > Maybe this ^ could be acceptable (?) if we put a big fat warning on the > config option help for 'reap_unknown'. But I was unsure of the answers > about what recovery looks like in case a guest is erroneously destroyed > for an instance that is in the middle of migrating. In the case of > resize or cold migrate, a hard reboot would fix it AFAIK. What about for > a live migration? If recovery is possible in every case, those would > also need to be documented in the config option help for 'reap_unknown'. > > The patch has lots of complexities to think about and I'm left wondering > if the pitfalls are better or worse than the current state. It would > help if others joined in the review with their thoughts about it. > > -melanie Hi Sean Mooney and melanir, thanks for mentioning. This ^ is for cleanup orphans. For imcomplete migations, you prefer not destroying them, right? I'm not sure about it either. But I gave a possible solution on the etherpad (set instance.host and apply/revert migration context and then invoke cleanup_running_deleted_instances to cleanup the instance). And before cleanup done, we need track these instances/migrations in RT, need more people join our discussion. Welcome put your suggestion on the etherpad. https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT. Thanks in advance. 
BR, Luyao From Tushar.Patil at nttdata.com Thu Nov 14 02:58:49 2019 From: Tushar.Patil at nttdata.com (Patil, Tushar) Date: Thu, 14 Nov 2019 02:58:49 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> , Message-ID: On 11/13/2019 8:34 AM, Sylvain Bauza wrote: >> Me too. To be clear, I don't think operators would modify the above but >> if so, they would need reshapes. > Maybe not, but this is the kind of detail that should be in the spec and > functional tests to make sure it's solid since this is a big > architectural change in nova. It depends on how the aggregates are created on the nova and placement side. A) From placement point of view, operator can create a new aggregate and add shared storage RP to it (tag MISC_SHARES_VIA_AGGREGATE trait to this RP). The newly created valid UUID would then be set in the config option ``sharing_disk_aggregate`` on the compute node side. This aggregate UUID wouldn't be present in the nova aggregate. so it's not possible to add host to the nova aggregate unless a new aggregate is created on nova side. B) If nova aggregates are synced to the placement service and say below is the picture: Nova: Agg1 - metadata (pinned=True) - host1 - host2 Now, operator adds a new shared storage RP to Agg1 on placement side and then set Agg1 UUID in ``sharing_disk_aggregate`` on compute nodes along with ``using_shared_disk_provider`=True``, then it would add compute node RP to the Agg1 on the placement without any issues but when you want to reverse the configuration, using_shared_disk_provider=False, then it not that straight to remove the host from the placement/nova aggregate because there would be other traits set to compute RPs which could cause those functions stop working. We had same kind of discussion [1] when implementing forbidden aggregates where we want to sync traits set to the aggregates but later it was concluded that operator will do it manually. I will include the details Matt has pointed out in this email in my next patchset. [1] : http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006950.html Regards, tpatil ________________________________________ From: Matt Riedemann Sent: Wednesday, November 13, 2019 11:41 PM To: openstack-discuss at lists.openstack.org Subject: Re: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship On 11/13/2019 8:34 AM, Sylvain Bauza wrote: > Me too. To be clear, I don't think operators would modify the above but > if so, they would need reshapes. Maybe not, but this is the kind of detail that should be in the spec and functional tests to make sure it's solid since this is a big architectural change in nova. -- Thanks, Matt Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. 
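The scenario A) in the message above boils down to a short sequence of placement REST calls. A rough keystoneauth1 sketch is included below; the auth values, provider name and inventory numbers are placeholders, it assumes placement microversion 1.20 or later so the created provider is returned in the response body, and the ``sharing_disk_aggregate`` option itself only exists in the spec under review.

```python
# Sketch only: create a sharing DISK_GB provider, tag it with
# MISC_SHARES_VIA_AGGREGATE and put it in the aggregate the compute nodes
# would be configured with. Error handling is omitted.
import uuid

from keystoneauth1 import adapter, loading, session

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://keystone.example/v3',
                                username='admin', password='secret',
                                project_name='admin',
                                user_domain_id='default',
                                project_domain_id='default')
placement = adapter.Adapter(session.Session(auth=auth),
                            service_type='placement',
                            default_microversion='1.20')


def generation(rp_uuid):
    # Re-read the provider generation before each update.
    return placement.get('/resource_providers/%s' % rp_uuid).json()['generation']


sharing_agg = str(uuid.uuid4())  # the aggregate UUID the computes would use
rp = placement.post('/resource_providers',
                    json={'name': 'shared-nfs-disk'}).json()

placement.put('/resource_providers/%s/traits' % rp['uuid'],
              json={'traits': ['MISC_SHARES_VIA_AGGREGATE'],
                    'resource_provider_generation': generation(rp['uuid'])})
placement.put('/resource_providers/%s/inventories/DISK_GB' % rp['uuid'],
              json={'total': 10000, 'reserved': 100, 'allocation_ratio': 1.0,
                    'resource_provider_generation': generation(rp['uuid'])})
placement.put('/resource_providers/%s/aggregates' % rp['uuid'],
              json={'aggregates': [sharing_agg],
                    'resource_provider_generation': generation(rp['uuid'])})
```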
From ramishra at redhat.com Thu Nov 14 03:24:05 2019 From: ramishra at redhat.com (Rabi Mishra) Date: Thu, 14 Nov 2019 08:54:05 +0530 Subject: Keystone user ID case sensitivity issue In-Reply-To: References: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com> Message-ID: On Thu, Nov 14, 2019 at 7:31 AM Sean Mooney wrote: > On Wed, 2019-11-13 at 23:07 +0000, PARSONS, CLIFF wrote: > > Hello everyone! > > > > My organization has a need to make the user name/ID retrieval from Heat > template to be case insensitive. For example: > > suppose we already have a user in keystone, "xyz123". Then we have a > client that creates a heat stack containing a > > UserRoleAssignment resource, in which the user was specified as > "XYZ123". The user would not be found in the Keystone > > database (due to Keystone user IDs being case sensitive) and the role > assignment would not occur. > > > > Either Keystone could be changed so that its users are treated case > insensitive, or we could make the change to heat > > (Heat KeystoneClientPlugin class) like in > https://review.opendev.org/#/c/694117/ so that it converts to lower case > > before querying keystone. > i honestly dont think we shoudl force everyone to use case insensitive > user names so i dont think converting to lower > case is valid. however it might we worth exploring if you could change the > encoding of the database so that it uses the > case insensitive by using the utf8_general_ci encodeing so that all db > opertion are case insensitive on the user tabel. > > Can I get some thoughts on this? Would something like this be > acceptable at all? Would we need to make it > > configurable, and if we did, would that be acceptable? > i think chaing api behavior based on a config option is an interoperablity > probelm > > keystone has to interact with external identity systesm and so assuming > all of those will be case inseitive would > proably break someone else who has the opisite requirement. > > i honestly think that people should just use the correct case in the heat > template. > if heat is not currently erroring out when the role assignment failts that > feels like a heat bug There is no heat bug. Heat would fail if the user does not exist and it does not override any service behaviour in the default client plugins. However, heat allows to write your own custom client plugin for keystone (if that's what you want), which overrides the behavior and use it in place of the default plugin. > but i would > personlly think its an error if i type my user name with the wrong case > and my correct passwourd and was able > to get a keystone token. > > > > Thanks in advance for your thoughts/concerns/suggestions. > > > > Thank you, > > Cliff Parsons > > > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Thu Nov 14 05:39:25 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 14 Nov 2019 14:39:25 +0900 Subject: [horizon] next weekly meeting cancelled Message-ID: Hi, The weekly team meeting next week (Nov 20) is cancelled. I will be on a business trip to join a conference in US and cannot run it. We agreed to cancel it in the team meeting yesterday. Akihiro Motoki (amotoki) From amotoki at gmail.com Thu Nov 14 05:59:34 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 14 Nov 2019 14:59:34 +0900 Subject: [neutron][docs] networking-onos EOL? 
In-Reply-To: References: Message-ID: Hi, networking-onos project was under the neutron team governance, but it was retired in Oct 2016 [4][5]. Regarding the 'latest' documentation, there is no clear guideline on cleaning up "docs.o.o/latest/foo" when a repository is retried. I think that is the only reason we can still see docs.o.o/latest/networking-onos. Only projects under TC governance can publish documentation under docs.o.o, so I thnk we need a cleanup when a repository retirement. Thanks, Akihiro Motoki (amotoki) [4] https://review.opendev.org/#/c/383911/ (neutron team decision) [5] https://review.opendev.org/#/c/392010/ (governance change) On Mon, Nov 4, 2019 at 7:12 PM Mark Goddard wrote: > > Hi, > > We (kolla) had a bug report [1] from someone trying to use the neutron > onos_ml2 ML2 driver for the ONOS SDN controller. As far as I can tell > [2], this project hasn't been released since 2015. However, the > 'latest' documentation is still accessible [3], and does not mention > that the project is dead. What can we do to help steer people away > from projects like this? > > Cheers, > Mark > > [1] https://bugs.launchpad.net/bugs/1850763 > [2] https://pypi.org/project/networking-onos/#history > [3] https://docs.openstack.org/networking-onos/latest/ > From arnaud.morin at gmail.com Thu Nov 14 07:10:51 2019 From: arnaud.morin at gmail.com (Arnaud MORIN) Date: Thu, 14 Nov 2019 08:10:51 +0100 Subject: [sig] Forming a Large scale SIG In-Reply-To: <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: Hi all, +1 for me and my employer (OVH). We are mostly interested in sharing good practices when deploying a region at scale, and operating it. For the deployment part, my main pain point is about the configuration parameters I should use on different software (e.g. nova behind wsgi). The current doc is designed to deploy a small pod, but when we are going large, usually some of those params needs tuning. I'd like to identify them and eventually tag them to help other being aware that they are useful at large scale. About operating, I am pretty sure we can share some good advices as well. E.g., avoid restarting neutron agents in a single shot. So definitely interested in that group. Thanks for bringing that up. Cheers. Le mer. 13 nov. 2019 à 19:00, Stig Telfer a écrit : > Hi Thierry & all - > > Thanks for your mail. I’m interested in joining this SIG. Among others, > I’m interested in participating in discussions around these common problems: > > - golden signals for scaling bottlenecks (and what to do about them) > - using Ansible at scale > - strategies for simplifying OpenStack functionality in order to scale > > Cheers, > Stig > > > > On 13 Nov 2019, at 11:18, Thierry Carrez wrote: > > > > Hi everyone, > > > > In Shanghai we held a forum session to gauge interest in a new SIG to > specifically address cluster scaling issues. In the past we had several > groups ("Large deployments", "Performance", LCOO...) but those efforts were > arguably a bit too wide and those groups are now abandoned. > > > > My main goal here is to get large users directly involved in a domain > where their expertise can best translate into improvements in the software. > It's easy for such a group to go nowhere while trying to boil the ocean. To > maximize its chances of success and make it sustainable, the group should > have a narrow focus, and reasonable objectives. 
> > > > My personal idea for the group focus was to specifically address scaling > issues within a single cluster: basically identify and address issues that > prevent scaling a single cluster (or cell) past a number of nodes. By > sharing analysis and experience, the group could identify common pain > points that, once solved, would help raising that number. > > > > There was a lot of interest in that session[1], and it predictably > exploded in lots of different directions, including some that are > definitely past a single cluster (like making Neutron better support > cells). I think it's fine: my initial proposal was more of a strawman. > Active members of the group should really define what they collectively > want to work on. And the SIG name should be picked to match that. > > > > I'd like to help getting that group off the ground and to a place where > it can fly by itself, without needing external coordination. The first step > would be to identify interested members and discuss group scope and > objectives. Given the nature of the group (with interested members in > Japan, Europe, Australia and the US) it will be hard to come up with a > synchronous meeting time that will work for everyone, so let's try to hold > that discussion over email. > > > > So to kick this off: if you are interested in that group, please reply > to this email, introduce yourself and tell us what you would like the group > scope and objectives to be, and what you can contribute to the group. > > > > Thanks! > > > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > > > -- > > Thierry Carrez (ttx) > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Thu Nov 14 07:45:01 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Thu, 14 Nov 2019 07:45:01 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> Message-ID: <1573717497.26082.4@est.tech> On Thu, Nov 14, 2019 at 02:58, "Patil, Tushar" wrote: > On 11/13/2019 8:34 AM, Sylvain Bauza wrote: >>> Me too. To be clear, I don't think operators would modify the >>> above but >>> if so, they would need reshapes. > >> Maybe not, but this is the kind of detail that should be in the >> spec and >> functional tests to make sure it's solid since this is a big >> architectural change in nova. > > It depends on how the aggregates are created on the nova and > placement side. > > A) From placement point of view, operator can create a new aggregate > and add shared storage RP to it (tag MISC_SHARES_VIA_AGGREGATE trait > to this RP). The newly created valid UUID would then be set in the > config option ``sharing_disk_aggregate`` on the compute node side. > This aggregate UUID wouldn't be present in the nova aggregate. so > it's not possible to add host to the nova aggregate unless a new > aggregate is created on nova side. 
> > B) If nova aggregates are synced to the placement service and say > below is the picture: > > Nova: > > Agg1 - metadata (pinned=True) > - host1 > - host2 > > Now, operator adds a new shared storage RP to Agg1 on placement side > and then set Agg1 UUID in ``sharing_disk_aggregate`` on compute nodes > along with ``using_shared_disk_provider`=True``, then it would add > compute node RP to the Agg1 on the placement without any issues but > when you want to reverse the configuration, > using_shared_disk_provider=False, then it not that straight to remove > the host from the placement/nova aggregate because there would be > other traits set to compute RPs which could cause those functions > stop working. For me from the sharing disk provider feature perspective the placement aggregate that is needed for the sharing to work, and any kind of nova host aggregate (either synced to placement or not) is independent. The placement aggregate is a must for the feature. On top of that if the operator wants to create a nova host aggregate as well and sync it to placement then at the end there will be two, independent placement aggregates. One to express the sharing relationship and one to express a host aggregate from nova. These two aggregate will not be the same as the first one will have the sharing provider in it while the second one doesn't. gibi > > We had same kind of discussion [1] when implementing forbidden > aggregates where we want to sync traits set to the aggregates but > later it was concluded that operator will do it manually. > > I will include the details Matt has pointed out in this email in my > next patchset. > > [1] : > http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006950.html > > Regards, > tpatil > > > > ________________________________________ > From: Matt Riedemann > Sent: Wednesday, November 13, 2019 11:41 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova][ptg] Allow compute nodes to use DISK_GB from > shared storage RP by using aggregate relationship > > On 11/13/2019 8:34 AM, Sylvain Bauza wrote: >> Me too. To be clear, I don't think operators would modify the above >> but >> if so, they would need reshapes. > > Maybe not, but this is the kind of detail that should be in the spec > and > functional tests to make sure it's solid since this is a big > architectural change in nova. > > -- > > Thanks, > > Matt > > Disclaimer: This email and any attachments are sent in strictest > confidence for the sole use of the addressee and may contain legally > privileged, confidential, and proprietary data. If you are not the > intended recipient, please advise the sender by replying promptly to > this email and then delete and destroy this email and any attachments > without any further use, copying or forwarding. > From fsbiz at yahoo.com Thu Nov 14 08:03:45 2019 From: fsbiz at yahoo.com (fsbiz at yahoo.com) Date: Thu, 14 Nov 2019 08:03:45 +0000 (UTC) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> Message-ID: <78766172.92122.1573718625984@mail.yahoo.com> I am running stable Queens with hundreds of ironic baremetal nodes. Things are mostly stable but occasionally some baremetal node provisions are failing.  These failures have been tracked to nova placement failure leading to 409 errors.My nova and baremetal filters do NOT have the 3 filters you mention. 
[root at sc-control03 objects]# grep filter /etc/nova/nova.conf | grep filters # * enabled_filters #enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter#use_baremetal_filters=false#baremetal_enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ExactRamFilter,ExactDiskFilter,ExactCoreFilter The baremetal nodes are all using resource class.  My image does NOT  have the changes for https://review.opendev.org/#/c/565841 Ultimately, nova-conductor is reported "NoValidHost: No valid host was found. There are not enough hosts available"This has been traced to nova-placement-api "Allocation for CUSTOM_RRR430 on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1" Any pointers on what next steps I should be looking at ? thanks,Fred. Relevant logs:  nova-conductor.log2019-11-12 10:26:02.593 1666486 ERROR nova.conductor.manager [req-fa1bfb2e-c765-432d-aa66-e16db8329312 - - - - -] Failed to schedule instances: NoValidHost_Remote: No valid host was found. There are not enough hosts available.Traceback (most recent call last):   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner    return func(*args, **kwargs)   File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 154, in select_destinations    allocation_request_version, return_alternates)   File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 91, in select_destinations    allocation_request_version, return_alternates)   File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 243, in _schedule    claimed_instance_uuids)   File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 280, in _ensure_sufficient_hosts    raise exception.NoValidHost(reason=reason) NoValidHost: No valid host was found. There are not enough hosts available. nova-placement-api.log  3cacac3f-9af0-4e39-9bc8-d1f362bdb730 = resource ID of baremetal node 84ea2b90-06b2-489e-92ea-24b859b3c997 = instance ID 2019-11-12 10:26:02.427 4161131 INFO nova.api.openstack.placement.requestlog [req-66a6dc45-8326-4e24-9216-fc77099303ba 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] 10.33.24.13 "GET /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997" status: 200 len: 111 microversion: 1.0 2019-11-12 10:26:02.461 4161129 WARNING nova.objects.resource_provider [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] Allocation for CUSTOM_Z370_A on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. 
Requested: 2, min_unit: 1, max_unit: 1, step_size: 1 2019-11-12 10:26:02.568 4161129 INFO nova.api.openstack.placement.requestlog [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] 10.33.24.13 "PUT /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997" status: 409 len: 383 microversion: 1.17 http_access_log10.33.24.13 - - [12/Nov/2019:10:26:02 -0800] "GET /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997 HTTP/1.1" 200 111 "-" "nova-scheduler keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5"10.33.24.13 - - [12/Nov/2019:10:26:02 -0800] "PUT /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997 HTTP/1.1" 409 383 "-" "nova-scheduler keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5" On Wednesday, November 13, 2019, 11:36:35 AM PST, Albert Braden wrote: Removing these 3 obsolete filters appears to have fixed the problem. Thank you for your advice! -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 12, 2019 1:14 PM To: openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources On 11/12/2019 2:47 PM, Albert Braden wrote: > It's probably a config error. Where should I be looking? This is our nova config on the controllers: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_kNe1eRimk4ifrAuuN790bg&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=TZI4wT8_y-RAnwbbXaWBhdvAhhcbY1qymxKLRVpPt2U&s=3aQNqwtEMfOC7U_QUTqNqXiZv4yJy6ceB4kCuZKuL0o&e= If your deployment is pike or newer (I'm guessing rocky because your other email says rocky), then you don't need these filters: RetryFilter - alternate hosts bp in queens release makes this moot CoreFilter - placement filters on VCPU RamFilter - placement filters on MEMORY_MB -- Thanks, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Thu Nov 14 08:37:14 2019 From: balazs.gibizer at est.tech (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Thu, 14 Nov 2019 08:37:14 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: References: Message-ID: <1573720630.26082.5@est.tech> On Sun, Nov 10, 2019 at 16:09, Brin Zhang(张百林) wrote: > Hi all, > Based on the discussion on the Train PTG, and reference to the > records on the etherpad and ML, I was updated that SPEC, and I think > there are some details need to be discussed, and I have listed some > details, > if there are any other things that I have not considered, or if some > place that I thoughtless, please post a discussion. > > List some details as follows, and you can review that spec in > https://review.opendev.org/#/c/663563. > > Listed details: > - Don't change the model of the flavor in nova code and in the db. > > - No change for operators who choose not to request the flavor > extra specs group. > > - Requested more than one flavor extra specs groups, if there are > different values for the same spec will be raised a 409. > > - Flavor in request body of server create that has the same spec in > the request ``flavor_extra_specs_group``, it will be raised a 409. > > - When resize an instance, you need to compare the > ``flavor_extra_specs_group`` with the spec request spec, otherwise > raise a 400. > Thanks Brin for updating the spec, I did a review round on it and left comments. 
gibi From balazs.gibizer at est.tech Thu Nov 14 08:38:56 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Thu, 14 Nov 2019 08:38:56 +0000 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> References: <1573402509.31166.3@est.tech> <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> Message-ID: <1573720733.26082.6@est.tech> On Mon, Nov 11, 2019 at 15:45, wang.ya wrote: > Hi: > > Here is the spec [1]_ > Because the exist spec [2]_ has gap with the agreement, so I rewrote > a new spec. > > .. [1]: https://review.opendev.org/#/c/693655/ > .. [2]: https://review.opendev.org/#/c/687199/ Could you please abandon one of the specs this is no confuses me which solution you want to push forward. Cheers, gibi > > Best Regards > From zhangbailin at inspur.com Thu Nov 14 08:47:48 2019 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Thu, 14 Nov 2019 08:47:48 +0000 Subject: =?utf-8?B?W2xpc3RzLm9wZW5zdGFjay5vcmfku6Plj5FdUmU6IFtub3ZhXSBUaG91Z2h0?= =?utf-8?B?cyBvbiBleHBvc2luZyBleGNlcHRpb24gdHlwZSB0byBub24tYWRtaW5zIGlu?= =?utf-8?Q?_instance_action_event?= In-Reply-To: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> Message-ID: <03bfc8edb0fe4b23955ae8007a11e8c1@inspur.com> I would like to see this feature, our customers have mentioned the same problem, I think this is useful. I think that should consider of the all instance action operations, such as actions in nova/compute/instance_actions.py. brinzhang > 主题: [lists.openstack.org代发]Re: [nova] Thoughts on exposing exception > type to non-admins in instance action event > > On 11/13/2019 11:17 AM, Eric Fried wrote: > > Unless it's likely to be something other than NoValidHost a > > significant percentage of the time, IMO it... > > Well just taking resize, it could be one of many things: > > https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py# > L366 > - oops you tried resizing which would screw up your group affinity policy > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L > 4490 > - (for an admin, cold migrate) oops you tried cold migrating a vcenter vm or you > have allow_resize_to_same_host=True and the scheduler picks the same host > (silly scheduler, see bug 1748697) > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L11 > 3 > - oops you lost a resource claims race, try again > > https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report. > py#L1898 > - oops you lost a race with allocation consumer generation conflicts, try again > > -- > > Thanks, > > Matt From zhangbailin at inspur.com Thu Nov 14 08:55:48 2019 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Thu, 14 Nov 2019 08:55:48 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: Thanks Gibizer, I will update this specs to the ussuri directory, and update & reply your comments. 
brinzhang > 发件人: Balázs Gibizer [mailto:balazs.gibizer at est.tech] > > On Sun, Nov 10, 2019 at 16:09, Brin Zhang(张百林) > wrote: > > Hi all, > > Based on the discussion on the Train PTG, and reference to the > > records on the etherpad and ML, I was updated that SPEC, and I think > > there are some details need to be discussed, and I have listed some > > details, if there are any other things that I have not considered, or > > if some place that I thoughtless, please post a discussion. > > > > List some details as follows, and you can review that spec in > > https://review.opendev.org/#/c/663563. > > > > Listed details: > > - Don't change the model of the flavor in nova code and in the db. > > > > - No change for operators who choose not to request the flavor extra > > specs group. > > > > - Requested more than one flavor extra specs groups, if there are > > different values for the same spec will be raised a 409. > > > > - Flavor in request body of server create that has the same spec in > > the request ``flavor_extra_specs_group``, it will be raised a 409. > > > > - When resize an instance, you need to compare the > > ``flavor_extra_specs_group`` with the spec request spec, otherwise > > raise a 400. > > > > Thanks Brin for updating the spec, I did a review round on it and left comments. > > gibi > From sfinucan at redhat.com Thu Nov 14 09:06:34 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Thu, 14 Nov 2019 09:06:34 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: References: <1573196961.23158.1@est.tech> Message-ID: <1eafabf9c807438461a96afd2af9aa6d7992765e.camel@redhat.com> On Fri, 2019-11-08 at 12:20 +0000, Sean Mooney wrote: > > Naming: use the 'shared' and 'dedicated' terminology > didn't we want to have a hw:cpu_policy=mixed specificaly for this case? It wasn't clear, but gibi was referring to how we'd distinguish the "types" of CPU and instances using those CPUs. The alternative was pinned and unpinned. Stephen From sfinucan at redhat.com Thu Nov 14 09:08:46 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Thu, 14 Nov 2019 09:08:46 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> References: <1573196961.23158.1@est.tech> <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> Message-ID: <4a0ab4e36683efefb5289c0ab2a8861569dd691a.camel@redhat.com> On Mon, 2019-11-11 at 11:58 +0000, Wang, Huaqiang wrote: > > -----Original Message----- > > From: Balázs Gibizer > > Sent: Friday, November 8, 2019 3:10 PM > > To: openstack-discuss > > Subject: [nova][ptg] pinned and unpinned CPUs in one instance > > > > spec: https://review.opendev.org/668656 > > > > Agreements from the PTG: > > > > How we will test it: > > * do functional test with libvirt driver, like the pinned cpu tests we have > > today > > * donyd's CI supports nested virt so we can do pinned cpu testing but not > > realtime. As this CI is still work in progress we should not block on this. > > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > > have > > > > Naming: use the 'shared' and 'dedicated' terminology > > > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will > > have less expression power until nova models NUMA in placement. So nova > > will try to evenly distribute PCPUs between numa nodes. 
If it not possible we > > reject the request and ask the user to use the > > hw:pinvcpus=3 syntax. > > > > Realtime mask is an exclusion mask, any vcpus not listed there has to be in > > the dedicated set of the instance. > > > > TODOInvestigate whether we want to enable NUMA by default > > * Pros: Simpler, everything is NUMA by default > > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > > NUMA mapping else we won't be able to boot e.g. a 40 core shared instance > > on a 40 core, 2 NUMA node host > > For the case of 'booting a 40 core shared instance on 40 core 2NUMA node' that will > not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no > assumption about instance NUMA topology. Correct. However, this investigation refers to *all* instances, not just those using the 'mixed' policy. For the 'mixed' policy, I assume we'll need to apply a virtual NUMA topology since we currently apply one for instances using the 'dedicated' policy. > By the way if you want a 'shared' instance, with 40 cores, to be scheduled on a host > of 40cores, 2 NUMA nodes, you also need to register all host cores as 'shared' cpus > through 'conf.compute.cpu_shared_set'. > > For instance with 'mixed' policy, what I want to propose is the instance should > demand at least one 'dedicated'(or PCPU) core. Thus, any 'mixed' instance or 'dedicated' > instance will not be scheduled one this host due to no PCPU available on this host. > > And also, a 'mixed' instance should also demand at least one 'shared' (or VCPU) core. > a 'mixed' instance demanding all cores from PCPU resource should be considered as > an invalid one. And an instance demanding all cores from PCPU resource is just a > legacy 'dedicated' instance, which CPU allocation policy is 'dedicated'. > > In conclusion, a instance with the policy of 'mixed' > -. demands at least one 'dedicated' cpu and at least one 'shared' cpu. > -. with NUMA topology by default due to requesting pinned cpu > > In my understanding the cons does not exist by making above rules. > > Br > Huaqiang > > > > > Cheers, > > gibi From cdent+os at anticdent.org Thu Nov 14 09:12:51 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 14 Nov 2019 09:12:51 +0000 (GMT) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <78766172.92122.1573718625984@mail.yahoo.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> Message-ID: On Thu, 14 Nov 2019, fsbiz at yahoo.com wrote: > Ultimately, nova-conductor is reported "NoValidHost: No valid host was found. There are not enough hosts available"This has been traced to nova-placement-api "Allocation for CUSTOM_RRR430 on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1" > Any pointers on what next steps I should be looking at ? Your request, is asking for CUSTOM_RRR430 will a value of 2, but it is only available as 1. Have a look at your server create request, there's something, probably your flavor, which is unexpected. Placement and nova scheduler are working correctly with the data they have, the problem is with how inventory is being reported or requested. This could either be with how your ironic nodes are being reported, or with flavors. 
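For reference, a flavor driving resource-class scheduling of ironic nodes is normally expected to ask for exactly one unit of the custom class and zero units of the standard classes, as sketched below with python-novaclient; the flavor name and keystone session are assumptions, and the custom class name follows the log quoted above.

```python
# Sketch of the expected flavor shape for resource-class based scheduling of
# bare metal: one unit of the node's custom class, standard classes zeroed.
# If the flavor (or anything merged into the request) asks for 2, placement
# refuses the allocation exactly as in the warning above.
from novaclient import client as nova_client

nova = nova_client.Client('2.1', session=my_keystone_session)  # placeholder
flavor = nova.flavors.find(name='bm.rrr430')                   # hypothetical
flavor.set_keys({
    'resources:CUSTOM_RRR430': '1',
    'resources:VCPU': '0',
    'resources:MEMORY_MB': '0',
    'resources:DISK_GB': '0',
})
```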
> 2019-11-12 10:26:02.461 4161129 WARNING nova.objects.resource_provider [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] Allocation for CUSTOM_Z370_A on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1 This is the same issue, but with a different class of inventory -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From wang.ya at 99cloud.net Thu Nov 14 09:23:35 2019 From: wang.ya at 99cloud.net (wangya) Date: Thu, 14 Nov 2019 17:23:35 +0800 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <1573720733.26082.6@est.tech> References: <1573402509.31166.3@est.tech> <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> <1573720733.26082.6@est.tech> Message-ID: <4118925c-15c8-3f91-2fea-7ece720d5dd9@99cloud.net> > On Mon, Nov 11, 2019 at 15:45, wang.ya wrote: >> Hi: >> >> Here is the spec [1]_ >> Because the exist spec [2]_ has gap with the agreement, so I rewrote >> a new spec. >> >> .. [1]: https://review.opendev.org/#/c/693655/ >> .. [2]: https://review.opendev.org/#/c/687199/ > Could you please abandon one of the specs this is no confuses me which > solution you want to push forward. https://review.opendev.org/#/c/687199/ has been abandoned Please discuss in this spec: https://review.opendev.org/#/c/693655/;-) From mark at stackhpc.com Thu Nov 14 09:24:11 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 14 Nov 2019 09:24:11 +0000 Subject: [neutron][docs][infra] networking-onos EOL? In-Reply-To: References: Message-ID: Added [infra]. On Thu, 14 Nov 2019 at 05:59, Akihiro Motoki wrote: > > Hi, > > networking-onos project was under the neutron team governance, but it > was retired in Oct 2016 [4][5]. > > Regarding the 'latest' documentation, there is no clear guideline on > cleaning up "docs.o.o/latest/foo" > when a repository is retried. I think that is the only reason we can > still see docs.o.o/latest/networking-onos. > Only projects under TC governance can publish documentation under > docs.o.o, so I thnk we need a cleanup > when a repository retirement. That sounds like a fair argument to me. > > Thanks, > Akihiro Motoki (amotoki) > > [4] https://review.opendev.org/#/c/383911/ (neutron team decision) > [5] https://review.opendev.org/#/c/392010/ (governance change) > > On Mon, Nov 4, 2019 at 7:12 PM Mark Goddard wrote: > > > > Hi, > > > > We (kolla) had a bug report [1] from someone trying to use the neutron > > onos_ml2 ML2 driver for the ONOS SDN controller. As far as I can tell > > [2], this project hasn't been released since 2015. However, the > > 'latest' documentation is still accessible [3], and does not mention > > that the project is dead. What can we do to help steer people away > > from projects like this? > > > > Cheers, > > Mark > > > > [1] https://bugs.launchpad.net/bugs/1850763 > > [2] https://pypi.org/project/networking-onos/#history > > [3] https://docs.openstack.org/networking-onos/latest/ > > From moreira.belmiro.email.lists at gmail.com Thu Nov 14 09:31:10 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Thu, 14 Nov 2019 10:31:10 +0100 Subject: [sig] Forming a Large scale SIG In-Reply-To: References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: Hi, first of all thanks to Thierry for driving this SIG creation. 
Having a SIG to discuss how to deploy/operate a large deployment will be incredibly useful. In my opinion we shouldn't restrict ourselves to a specific project or deployment size (or number of cells) but discuss the limits of each project architecture, the projects dependencies, limitations at scale (functionality vs simplicity), operational difficulties... Sharing experiences and understand the different challenges and actions that we are using to mitigate them will be extremely valuable. I think that we already have a lot of examples of companies/organizations that are deploying OpenStack at large scale. Compiling all this information (Summit presentations, blogs, superuser articles, ...) will be a good starting point for all operators and discussions. Every deployment is different. I also would like this SIG to be the bridge between the operators of large deployments and developers. Bringing specific pain points to discussion with developers. cheers, Belmiro CERN On Thu, Nov 14, 2019 at 8:25 AM Arnaud MORIN wrote: > Hi all, > > +1 for me and my employer (OVH). > We are mostly interested in sharing good practices when deploying a region > at scale, and operating it. > > For the deployment part, my main pain point is about the configuration > parameters I should use on different software (e.g. nova behind wsgi). > The current doc is designed to deploy a small pod, but when we are going > large, usually some of those params needs tuning. I'd like to identify them > and eventually tag them to help other being aware that they are useful at > large scale. > > About operating, I am pretty sure we can share some good advices as well. > E.g., avoid restarting neutron agents in a single shot. > > So definitely interested in that group. Thanks for bringing that up. > > Cheers. > > Le mer. 13 nov. 2019 à 19:00, Stig Telfer a > écrit : > >> Hi Thierry & all - >> >> Thanks for your mail. I’m interested in joining this SIG. Among others, >> I’m interested in participating in discussions around these common problems: >> >> - golden signals for scaling bottlenecks (and what to do about them) >> - using Ansible at scale >> - strategies for simplifying OpenStack functionality in order to scale >> >> Cheers, >> Stig >> >> >> > On 13 Nov 2019, at 11:18, Thierry Carrez wrote: >> > >> > Hi everyone, >> > >> > In Shanghai we held a forum session to gauge interest in a new SIG to >> specifically address cluster scaling issues. In the past we had several >> groups ("Large deployments", "Performance", LCOO...) but those efforts were >> arguably a bit too wide and those groups are now abandoned. >> > >> > My main goal here is to get large users directly involved in a domain >> where their expertise can best translate into improvements in the software. >> It's easy for such a group to go nowhere while trying to boil the ocean. To >> maximize its chances of success and make it sustainable, the group should >> have a narrow focus, and reasonable objectives. >> > >> > My personal idea for the group focus was to specifically address >> scaling issues within a single cluster: basically identify and address >> issues that prevent scaling a single cluster (or cell) past a number of >> nodes. By sharing analysis and experience, the group could identify common >> pain points that, once solved, would help raising that number. 
>> > >> > There was a lot of interest in that session[1], and it predictably >> exploded in lots of different directions, including some that are >> definitely past a single cluster (like making Neutron better support >> cells). I think it's fine: my initial proposal was more of a strawman. >> Active members of the group should really define what they collectively >> want to work on. And the SIG name should be picked to match that. >> > >> > I'd like to help getting that group off the ground and to a place where >> it can fly by itself, without needing external coordination. The first step >> would be to identify interested members and discuss group scope and >> objectives. Given the nature of the group (with interested members in >> Japan, Europe, Australia and the US) it will be hard to come up with a >> synchronous meeting time that will work for everyone, so let's try to hold >> that discussion over email. >> > >> > So to kick this off: if you are interested in that group, please reply >> to this email, introduce yourself and tell us what you would like the group >> scope and objectives to be, and what you can contribute to the group. >> > >> > Thanks! >> > >> > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG >> > >> > -- >> > Thierry Carrez (ttx) >> > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Nov 14 09:38:49 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 14 Nov 2019 10:38:49 +0100 Subject: [neutron] Review priorities Message-ID: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> Hi neutrinos, According to our discussion during Train retrospective in Shanghai, I added "review-priority" label for neutron projects. It can be set by every core team member to values like: -1 - Branch Freeze +1 - Important Change +2 - Gate Blocker Fix / Urgent Change You can use dashboard like [1] to track such high priority patches and review them. I will also add some note about this to our docs this week to make it clear and visible for everyone. [1] https://tinyurl.com/vezk6n6 -- Slawek Kaplonski Senior software engineer Red Hat From dh3 at sanger.ac.uk Thu Nov 14 09:44:33 2019 From: dh3 at sanger.ac.uk (Dave Holland) Date: Thu, 14 Nov 2019 09:44:33 +0000 Subject: [sig] Forming a Large scale SIG [EXT] In-Reply-To: References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: <20191114094433.GN3793@sanger.ac.uk> Hi, Belmiro's point of linking operators and developers is hugely important because the developers have the tough job of catering for both large and small deployments. What can look like a safety net for small systems (e.g. per-container file descriptor limits) turns into a huge pitfall when deploying at scale. I'm really interested to be involved in this SIG. Cheers, Dave -- ** Dave Holland ** Systems Support -- Informatics Systems Group ** ** 01223 496923 ** Wellcome Sanger Institute, Hinxton, UK ** On Thu, Nov 14, 2019 at 10:31:10AM +0100, Belmiro Moreira wrote: > Hi, > first of all thanks to Thierry for driving this SIG creation. > Having a SIG to discuss how to deploy/operate a large deployment will > be incredibly useful. > In my opinion we shouldn't restrict ourselves to a specific project > or deployment size (or number of cells) but discuss the limits of > each project architecture, the projects dependencies, limitations at > scale (functionality vs simplicity), operational difficulties... 
> Sharing experiences and understand the different challenges and > actions that we are using to mitigate them will be extremely > valuable. > I think that we already have a lot of examples of > companies/organizations that are deploying OpenStack at large scale. > Compiling all this information (Summit presentations, blogs, > superuser articles, ...) will be a good starting point for all > operators and discussions. Every deployment is different. > I also would like this SIG to be the bridge between the operators of > large deployments and developers. Bringing specific pain points to > discussion with developers. > cheers, > Belmiro > CERN > > On Thu, Nov 14, 2019 at 8:25 AM Arnaud MORIN > <[1]arnaud.morin at gmail.com> wrote: > > Hi all, > +1 for me and my employer (OVH). > We are mostly interested in sharing good practices when deploying a > region at scale, and operating it. > For the deployment part, my main pain point is about the > configuration parameters I should use on different software (e.g. > nova behind wsgi). > The current doc is designed to deploy a small pod, but when we are > going large, usually some of those params needs tuning. I'd like to > identify them and eventually tag them to help other being aware that > they are useful at large scale. > About operating, I am pretty sure we can share some good advices as > well. E.g., avoid restarting neutron agents in a single shot. > So definitely interested in that group. Thanks for bringing that up. > Cheers. > > Le mer. 13 nov. 2019 à 19:00, Stig Telfer > <[2]stig.openstack at telfer.org> a écrit : > > Hi Thierry & all - > Thanks for your mail. I’m interested in joining this SIG. Among > others, I’m interested in participating in discussions around > these common problems: > - golden signals for scaling bottlenecks (and what to do about > them) > - using Ansible at scale > - strategies for simplifying OpenStack functionality in order to > scale > Cheers, > Stig > > On 13 Nov 2019, at 11:18, Thierry Carrez > <[3]thierry at openstack.org> wrote: > > > > Hi everyone, > > > > In Shanghai we held a forum session to gauge interest in a new > SIG to specifically address cluster scaling issues. In the past we > had several groups ("Large deployments", "Performance", LCOO...) > but those efforts were arguably a bit too wide and those groups > are now abandoned. > > > > My main goal here is to get large users directly involved in a > domain where their expertise can best translate into improvements > in the software. It's easy for such a group to go nowhere while > trying to boil the ocean. To maximize its chances of success and > make it sustainable, the group should have a narrow focus, and > reasonable objectives. > > > > My personal idea for the group focus was to specifically address > scaling issues within a single cluster: basically identify and > address issues that prevent scaling a single cluster (or cell) > past a number of nodes. By sharing analysis and experience, the > group could identify common pain points that, once solved, would > help raising that number. > > > > There was a lot of interest in that session[1], and it > predictably exploded in lots of different directions, including > some that are definitely past a single cluster (like making > Neutron better support cells). I think it's fine: my initial > proposal was more of a strawman. Active members of the group > should really define what they collectively want to work on. And > the SIG name should be picked to match that. 
> > > > I'd like to help getting that group off the ground and to a > place where it can fly by itself, without needing external > coordination. The first step would be to identify interested > members and discuss group scope and objectives. Given the nature > of the group (with interested members in Japan, Europe, Australia > and the US) it will be hard to come up with a synchronous meeting > time that will work for everyone, so let's try to hold that > discussion over email. > > > > So to kick this off: if you are interested in that group, please > reply to this email, introduce yourself and tell us what you would > like the group scope and objectives to be, and what you can > contribute to the group. > > > > Thanks! > > > > [1] [4]https://etherpad.openstack.org/p/PVG-large-scale-SIG > [etherpad.openstack.org] > > > > -- > > Thierry Carrez (ttx) > > > > References > > 1. mailto:arnaud.morin at gmail.com > 2. mailto:stig.openstack at telfer.org > 3. mailto:thierry at openstack.org > 4. https://urldefense.proofpoint.com/v2/url?u=https-3A__etherpad.openstack.org_p_PVG-2Dlarge-2Dscale-2DSIG&d=DwMFaQ&c=D7ByGjS34AllFgecYw0iC6Zq7qlm8uclZFI0SqQnqBo&r=64bKjxgut4Pa0xs5b84yPg&m=DdEhOLy_myry74y3z2LhDWbl3ztokcSVufGIqfDSCaM&s=L7GyQqoSsD_56ROhOkKxfMtbER6jrPjcNSZrjNsQrMg&e= -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From moreira.belmiro.email.lists at gmail.com Thu Nov 14 09:58:59 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Thu, 14 Nov 2019 10:58:59 +0100 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> Message-ID: Hi, Akihiro, thanks for you summary. We use the linuxbridge driver because its simplicity and the match with the old nova-network schema (yes, are we still migrating). The functionality gap between ovs driver and linuxbridge is a good think in my view. It allows operators to chose the best solution considering their deployment use case and scale. Slawek, Miguel please keep us in the discussions. Belmiro CERN On Wed, Nov 13, 2019 at 7:22 PM Sean Mooney wrote: > On Tue, 2019-11-12 at 14:53 +0100, Slawek Kaplonski wrote: > > Stateless security groups > > ========================= > > > > Old RFE [21] was approved for neutron-fwaas project but we all agreed > that this > > should be now implemented for security groups in core Neutron. > > People from Nuage are interested in work on this in upstream. > > We should probably also explore how easy/hard it will be to implement it > in > > networking-ovn backend. > > for what its worth we implemented this 4 years ago and it was breifly used > in production trial deployment > in a telco deployment but i dont think it ever went to full production as > they went wtih sriov instead > https://review.opendev.org/#/c/264131/ as part of this RFE > https://bugs.launchpad.net/neutron/+bug/1531205 which was > closed as wont fix > https://bugs.launchpad.net/neutron/+bug/1531205/comments/14 > as it was view that this was not the correct long term direction for the > community. 
> this is the summit presentation for austin for anyone that does not > rememebr this effort > > > https://www.openstack.org/videos/summits/austin-2016/tired-of-iptables-based-security-groups-heres-how-to-gain-tremendous-speed-with-open-vswitch-instead > > im not sure how the new proposal differeres form our previous proposal for > the same > feautre but the main pushback we got was that the securtiy group api is > assumed to be stateful > and that is why this was rejected. form our mesurments at the time we > expected the stateless approch > to scale better then contrack driver so it woudl be nice to see a > stateless approch avialable. > i never got around to deleteing our implemenation form networking-ovs-dpdk > > https://opendev.org/x/networking-ovs-dpdk/src/branch/master/networking_ovs_dpdk/agent/ovs_dpdk_firewall.py > but i has not been tested our updated really for the last 2 years but it > could be used as a basis of this effort > if nuage does not have a poc already. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Nov 14 11:14:21 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 14 Nov 2019 11:14:21 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <4a0ab4e36683efefb5289c0ab2a8861569dd691a.camel@redhat.com> References: <1573196961.23158.1@est.tech> <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> <4a0ab4e36683efefb5289c0ab2a8861569dd691a.camel@redhat.com> Message-ID: <5ac1120f9d872879d9cfaf19d2f61fa02e63887b.camel@redhat.com> On Thu, 2019-11-14 at 09:08 +0000, Stephen Finucane wrote: > On Mon, 2019-11-11 at 11:58 +0000, Wang, Huaqiang wrote: > > > -----Original Message----- > > > From: Balázs Gibizer > > > Sent: Friday, November 8, 2019 3:10 PM > > > To: openstack-discuss > > > Subject: [nova][ptg] pinned and unpinned CPUs in one instance > > > > > > spec: https://review.opendev.org/668656 > > > > > > Agreements from the PTG: > > > > > > How we will test it: > > > * do functional test with libvirt driver, like the pinned cpu tests we have > > > today > > > * donyd's CI supports nested virt so we can do pinned cpu testing but not > > > realtime. As this CI is still work in progress we should not block on this. > > > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > > > have > > > > > > Naming: use the 'shared' and 'dedicated' terminology > > > > > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > > > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will > > > have less expression power until nova models NUMA in placement. So nova > > > will try to evenly distribute PCPUs between numa nodes. If it not possible we > > > reject the request and ask the user to use the > > > hw:pinvcpus=3 syntax. > > > > > > Realtime mask is an exclusion mask, any vcpus not listed there has to be in > > > the dedicated set of the instance. > > > > > > TODOInvestigate whether we want to enable NUMA by default > > > * Pros: Simpler, everything is NUMA by default > > > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > > > NUMA mapping else we won't be able to boot e.g. a 40 core shared instance > > > on a 40 core, 2 NUMA node host > > > > For the case of 'booting a 40 core shared instance on 40 core 2NUMA node' that will > > not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no > > assumption about instance NUMA topology. > > Correct. 
However, this investigation refers to *all* instances, not > just those using the 'mixed' policy. For the 'mixed' policy, I assume > we'll need to apply a virtual NUMA topology since we currently apply > one for instances using the 'dedicated' policy. yes for consitency i think that would be the correct approch too. > > > By the way if you want a 'shared' instance, with 40 cores, to be scheduled on a host > > of 40cores, 2 NUMA nodes, you also need to register all host cores as 'shared' cpus > > through 'conf.compute.cpu_shared_set'. > > > > For instance with 'mixed' policy, what I want to propose is the instance should > > demand at least one 'dedicated'(or PCPU) core. Thus, any 'mixed' instance or 'dedicated' > > instance will not be scheduled one this host due to no PCPU available on this host. > > > > And also, a 'mixed' instance should also demand at least one 'shared' (or VCPU) core. > > a 'mixed' instance demanding all cores from PCPU resource should be considered as > > an invalid one. And an instance demanding all cores from PCPU resource is just a > > legacy 'dedicated' instance, which CPU allocation policy is 'dedicated'. > > > > In conclusion, a instance with the policy of 'mixed' > > -. demands at least one 'dedicated' cpu and at least one 'shared' cpu. > > -. with NUMA topology by default due to requesting pinned cpu > > > > In my understanding the cons does not exist by making above rules. > > > > Br > > Huaqiang > > > > > > > > Cheers, > > > gibi > > From fungi at yuggoth.org Thu Nov 14 11:54:45 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 14 Nov 2019 11:54:45 +0000 Subject: [neutron][docs][infra] networking-onos EOL? In-Reply-To: References: Message-ID: <20191114115445.mfuin7xuqfkss42r@yuggoth.org> On 2019-11-14 09:24:11 +0000 (+0000), Mark Goddard wrote: > Added [infra]. [...] Can you clarify why? Reading back through the thread this sounds like you either want a change to the way the main documentation redirects in the openstack-manuals repo are designed, or you want some change merged to networking-onos to replace its documentation with some indication it's retired, or you want a change to the openstackdocstheme Sphinx theme to add a banner/admonishment on docs for retired repos... or are you simply relying on the Infra team to remind people on other teams how their projects are interrelated? A more specific request/question would really help. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From corey.bryant at canonical.com Thu Nov 14 13:47:35 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Thu, 14 Nov 2019 08:47:35 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On Wed, Nov 13, 2019 at 3:36 PM Clark Boylan wrote: > On Fri, Nov 8, 2019, at 6:09 AM, Corey Bryant wrote: > > > > > > On Thu, Nov 7, 2019 at 5:56 PM Sean McGinnis > wrote: > > > My non-TC take on this... > > > > > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand > it's too late to enable voting py38 unit tests for ussuri, I'd like to at > least enable non-voting py38 unit tests. This email is seeking approval and > direction from the TC to move forward with enabling non-voting py38 tests. > > > > > > I think it would be great to start testing 3.8 so there are no > surprises once we need to officially move there. 
But I would actually not > want to see that run on every since patch in every single repo. > > > > Just to be clear I'm only talking about unit tests right now which are > > generally light on resource requirements. However it would be great to > > also have py38 function test enablement and periodic would make sense > > for function tests at this point. For unit tests though it seems the > > benefit of knowing whether your patch regresses unit tests for the > > latest python version far outweighs the resources required, so I don't > > see much benefit in adding periodic unit test jobs. > > > > Wanted to point out that we've begun to expose resource consumption in > nodepool to graphite. You can find per project and per tenant resource > usage under stats.zuul.nodepool.resources at https://graphite.opendev.org. > Unfortunately, I don't think we have per job resource tracking there yet, > but previous measurements from log files do agree that unittest consumption > is relatively low. > > It is large multinode integration jobs that run for extended periods of > time that have the greatest impact on our resource utilization. > > Clark > > That's great, thanks for sharing. Per job would be a super nice addition. Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Nov 14 13:58:17 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 07:58:17 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Message-ID: On 11/13/2019 2:38 PM, Eric Fried wrote: > Okay, are we going to have a document that maps exception classes to > these explanations and recovery actions? Which we then have to maintain > as the code changes? Or are they expected to look through code (without > a stack trace)? Nope. > > I'm not against the idea, just playing devil's advocate. Sylvain seems > to have a use case, so great. Yeah I know. Like I said in the original email, just having the exception type might not be very useful to an end user. That's almost like just showing an error code that is then used by support staff. If we do expose the details as the formatted exception message, like we do for faults, then I think it would be more useful to end users, but then you also run into the same issues as we have for fault messages that maybe leak too much detail [1]. However, with the way I was thinking about doing this, the instance action code would use the same utility method that generates the fault message so if we fix [1] for faults it's also fixed for instance actions automatically. If I get the time this week I'll WIP something together that does what I'm thinking as a proof of concept, likely without the microversion stuff just since that's unnecessary overhead for a PoC. > > As an alternative, have we considered a mechanism whereby we could, in > appropriate code paths, provide some text that's expressly intended for > the end user to see? Maybe it's a new user_message field on > NovaException which, if present, gets percolated up to a new field > similar to the one you suggested. I think that likely becomes as whack-a-mole to contain as documenting all of the different types of errors. 
[1] https://bugs.launchpad.net/nova/+bug/1851587 -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 13:59:30 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 07:59:30 -0600 Subject: =?UTF-8?B?UmU6IFtsaXN0cy5vcGVuc3RhY2sub3Jn5Luj5Y+RXVJlOiBbbm92YV0g?= =?UTF-8?Q?Thoughts_on_exposing_exception_type_to_non-admins_in_instance_act?= =?UTF-8?Q?ion_event?= In-Reply-To: <03bfc8edb0fe4b23955ae8007a11e8c1@inspur.com> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <03bfc8edb0fe4b23955ae8007a11e8c1@inspur.com> Message-ID: <1c9e6663-4c27-6865-7a3f-4cd15664e581@gmail.com> On 11/14/2019 2:47 AM, Brin Zhang(张百林) wrote: > I think that should consider of the all instance action operations, such as actions in nova/compute/instance_actions.py. The resize examples in my email are just examples. The code that generates the action events is centralized in the InstanceActionEvent object so it would be used for all actions that fail with some exception. -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 14:02:28 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:02:28 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573717497.26082.4@est.tech> References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> <1573717497.26082.4@est.tech> Message-ID: On 11/14/2019 1:45 AM, Balázs Gibizer wrote: > For me from the sharing disk provider feature perspective the placement > aggregate that is needed for the sharing to work, and any kind of nova > host aggregate (either synced to placement or not) is independent. The > placement aggregate is a must for the feature. On top of that if the > operator wants to create a nova host aggregate as well and sync it to > placement then at the end there will be two, independent placement > aggregates. One to express the sharing relationship and one to express > a host aggregate from nova. These two aggregate will not be the same as > the first one will have the sharing provider in it while the second one > doesn't. I tend to agree with the simplicity of this as well. -- Thanks, Matt From witold.bedyk at suse.com Thu Nov 14 14:03:55 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Thu, 14 Nov 2019 15:03:55 +0100 Subject: [monasca] New team meeting time poll Message-ID: Hello everyone, We would like to find the new time slot for the Monasca Team Meeting which suites you best. Please fill in the times which work for you in that poll [1] until next Wednesday. Thanks Witek [1] https://doodle.com/poll/ey6brvmbsubkxpp9 From mriedemos at gmail.com Thu Nov 14 14:10:00 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:10:00 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> Message-ID: On 11/14/2019 3:12 AM, Chris Dent wrote: > Your request, is asking for CUSTOM_RRR430 will a value of 2, but it > is only available as 1. Have a look at your server create request, > there's something, probably your flavor, which is unexpected. https://review.opendev.org/#/c/620111/ comes to mind, I'm not sure if that helps you workaround the problem or not. 
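The usual fix for Queens-era baremetal flavors -- a sketch only; the flavor name is a placeholder and the resource class is taken from your logs -- is to zero out the standard resources and request exactly one unit of the node's custom class:

    openstack flavor set bm.rrr430 \
      --property resources:VCPU=0 \
      --property resources:MEMORY_MB=0 \
      --property resources:DISK_GB=0 \
      --property resources:CUSTOM_RRR430=1

That way the scheduler stops matching nodes on vcpu/ram/disk and only matches on the custom class, one node per request.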
Be sure to go through this doc as well: https://docs.openstack.org/ironic/queens/install/configure-nova-flavors.html#scheduling-based-on-resource-classes Specifically the part about overriding the VCPU/MEMORY_MB/DISK_GB values in the baremetal flavors. My guess is maybe you haven't done that and the scheduler is selecting a node based on vcpu/ram/disk that is already fully consumed by another node with the same resource class? Failing all that, it might be an issue due to https://review.opendev.org/#/c/637217/ which I abandoned because I just didn't have the time or will to push on it any further. If nothing else the bugs linked to those patches might be helpful with workarounds that CERN did when they were doing their baremetal flavor migration to custom resource classes. There were definitely bumps along the way. -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 14:13:05 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:13:05 -0600 Subject: [sig] Forming a Large scale SIG In-Reply-To: References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: <473b01fd-217d-3739-c8a2-ab26944bbb6a@gmail.com> On 11/14/2019 1:10 AM, Arnaud MORIN wrote: > The current doc is designed to deploy a small pod, but when we are going > large, usually some of those params needs tuning. I'd like to identify > them and eventually tag them to help other being aware that they are > useful at large scale. For anything nova specific you could dump it into [1] with a comment. That's a bug tracking stuff like this that should eventually be documented in nova for large scale performance considerations. [1] https://bugs.launchpad.net/nova/+bug/1838819 -- Thanks, Matt From tpb at dyncloud.net Thu Nov 14 14:30:38 2019 From: tpb at dyncloud.net (Tom Barron) Date: Thu, 14 Nov 2019 09:30:38 -0500 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Message-ID: <20191114143038.r3afg4ai6rq65qsr@barron.net> On 13/11/19 14:38 -0600, Eric Fried wrote: >Okay, are we going to have a document that maps exception classes to >these explanations and recovery actions? Which we then have to maintain >as the code changes? Or are they expected to look through code (without >a stack trace)? > >I'm not against the idea, just playing devil's advocate. Sylvain seems >to have a use case, so great. > >As an alternative, have we considered a mechanism whereby we could, in >appropriate code paths, provide some text that's expressly intended for >the end user to see? Maybe it's a new user_message field on >NovaException which, if present, gets percolated up to a new field >similar to the one you suggested. Would this be like the "user messages" provided by block [1] and file [2] storage components? [1] https://docs.openstack.org/cinder/latest/contributor/user_messages.html [2] https://docs.openstack.org/manila/latest/contributor/user_messages.html -- Tom >efried > >On 11/13/19 11:41 AM, Matt Riedemann wrote: >> On 11/13/2019 11:17 AM, Eric Fried wrote: >>> Unless it's likely to be something other than NoValidHost a significant >>> percentage of the time, IMO it... 
>> >> Well just taking resize, it could be one of many things: >> >> https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L366 >> - oops you tried resizing which would screw up your group affinity policy >> >> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L4490 >> - (for an admin, cold migrate) oops you tried cold migrating a vcenter >> vm or you have allow_resize_to_same_host=True and the scheduler picks >> the same host (silly scheduler, see bug 1748697) >> >> https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L113 - >> oops you lost a resource claims race, try again >> >> https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report.py#L1898 >> - oops you lost a race with allocation consumer generation conflicts, >> try again >> > From skaplons at redhat.com Thu Nov 14 14:35:18 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 14 Nov 2019 15:35:18 +0100 Subject: [neutron][ci] Jobs cleaning Message-ID: <20191114143518.cxbjuiismoj5v5af@skaplons-mac> Hi, As we discussed during the PTG, I'm now checking what multinode and singlenode jobs we are exactly running in Neutron CI and what jobs can be potentially removed maybe. Here is what I found +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | Singlenode job | Multinode job | Comments | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | neutron-tempest-dvr | neutron-tempest-plugin-dvr-multinode-scenario | Singlenode job runs tempest tests, | | | (non-voting) | multinode job runs tests from neutron-tempest-plugin repo | | | | multinode job isn't stable currently | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | tempest-integrated-networking | tempest-multinode-full-py3 (non-voting) | Singlenode job runs tempest tests related to neutron/nova,| | | | Multinode job runs all tempest tests | | | | multinode job is stable enough to make it voting IMO | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | grenade-py3 | neutron-grenade-multinode | Both jobs runs the same tests and are voting already | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ I also found that we have few jobs which we ver similar but the only difference is that one runs tempest tests and other runs tests from neutron tempest plugin. Such jobs are: neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid and neutron-tempest-iptables_hybrid neutron-tempest-plugin-scenario-linuxbridge and neutron-tempest-linuxbridge Do we need all those jobs? Maybe we can simply stay only with neutron-tempest-plugins jobs for those configurations? Or maybe we should "merge" them and run tests from both tempest and neutron-tempest-plugin in one job? -- Slawek Kaplonski Senior software engineer Red Hat From sriram.ec at gmail.com Thu Nov 14 14:36:06 2019 From: sriram.ec at gmail.com (Sriram) Date: Thu, 14 Nov 2019 20:06:06 +0530 Subject: [Neutron] VPNaaS using certs Message-ID: Hi, I would like to know if VPNaaS in openstack provides ipsec with cert based authentication mechanism using certs. 
Documentation says only psk based authentication is supported. Please advise. Regards, Sriram -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Thu Nov 14 14:46:31 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 14 Nov 2019 20:16:31 +0530 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint In-Reply-To: <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> References: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> Message-ID: Hi Matt, I agree, your concern is valid. I have been working in glance since Icehouse and aware about stable branch guidelines. Given the opportunity, I will try my best to justify my selection. Thanks & Best Regards, Abhishek Kekane On Thu, Nov 14, 2019 at 12:29 AM Matt Riedemann wrote: > On 11/12/2019 2:17 PM, Brian Rosmaita wrote: > > we are currently understaffed in glance-stable-maint. Plus, he's the > > current Glance PTL. > > glance-stable-maint is understaffed yes. I ran a reviewstats report on > glance stable branch reviews over the last 180 days: > > http://paste.openstack.org/show/786058/ > > Abhishek has only done 3 stable branch reviews in 6 months which is > pretty low but to be fair maybe there aren't that many open reviews on > stable branches for glance and the other existing glance-stable-maint > cores don't have a lot more reviews either, so maybe that's just par for > the course. > > As for being core on master or being PTL, as you probably know, that > doesn't really mean much when it comes to stable branch reviews, which > is more about the stable branch guidelines. Nova has a few stable branch > cores that aren't core on master because they adhere to the guidelines > and do a lot of stable branch reviews. > > Anyway, I'm OK trusting Abhishek here and adding him to the > glance-stable-maint team. Things are such these days that beggars can't > really be choosers. > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Nov 14 14:48:26 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:48:26 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <20191114143038.r3afg4ai6rq65qsr@barron.net> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> <20191114143038.r3afg4ai6rq65qsr@barron.net> Message-ID: On 11/14/2019 8:30 AM, Tom Barron wrote: > Would this be like the "user messages" provided by block [1] and file > [2] storage components? > > [1] https://docs.openstack.org/cinder/latest/contributor/user_messages.html > [2] https://docs.openstack.org/manila/latest/contributor/user_messages.html The instance actions API in nova is very similar. Rather than build a new "user messages" API in nova I'm just talking about providing more detail on the actual error that occurred per failed event per action, basically the same as the user would see in a fault message on the server when it's in ERROR status. Because right now the instance action and events either say "Success" or "Error" for the message/result which is not useful in the Error case. 
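To make that concrete, with today's API a non-admin looking at a failed action sees something like this (a sketch; the server name and request id are placeholders):

    nova instance-action-list myserver
    nova instance-action myserver req-00000000-0000-0000-0000-000000000000

and the failed event in that output carries only result=Error, with no hint whether it was NoValidHost, an affinity conflict, a claim race, etc. The proposal is just to surface that detail in the same place.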
-- Thanks, Matt From openstack at fried.cc Thu Nov 14 14:56:48 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 14 Nov 2019 08:56:48 -0600 Subject: [nova] Today's meeting Message-ID: <6713ed8b-0f8f-46cb-96a9-a52f0ec2e4a6@fried.cc> Attendance at today's nova meeting was sparse, to say the least. Predictably, some forgot about DST [1], some had conflicts, some are jetlagged, some probably all three. Most hot topics are on ML threads anyway. I and others have updated the meeting agenda [2] with links to those threads. Please be sure to chime in on topics of interest. Thanks, efried [1] DST is stupid and should be abolished [2] https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting From mriedemos at gmail.com Thu Nov 14 15:35:46 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 09:35:46 -0600 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <1573373353.31166.0@est.tech> References: <1573373353.31166.0@est.tech> Message-ID: <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> On 11/10/2019 2:09 AM, Balázs Gibizer wrote: > * Check ongoing migration and reject the delete if migration with this > compute having the source node exists. Let operator confirm the > migrations To be clear, the suggestion here is call [1] from the API like around [2]? That's a behavior change but so was blocking the delete when the compute was hosting instances [3] and we added a release note for that. Anyway, that's a pretty simple change and not really something I thought about in earlier threads on this problem. Regarding evacuate migration records that should also work since the final states for an evacuate migration are done, failed or error for which [1] accounts. > * Cascade delete providers and allocations in placement. > * in case of evacuated instances this is the right thing to do OK this seems to confirm my TODO here [4]. > * in any other dangling allocation case nova has the final thrut so > nova > has the authority to delete them. So this would build on the first idea above about blocking the service delete if there are in-progress migrations involving the node (either incoming or outgoing) right? So if we get to the point of deleting the provider we know (1) there are no in-progress migrations and (2) there are no instances on the host (outside of evacuated instances which we can cleanup automatically per [4]). Given that, I'm not sure there is really anything else to do here. > * Document possible ways to reconcile Placement with Nova using > heal_allocations and eventually the audit command once it's merged. Done (merged yesterday) [5]. 
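For anyone following along, the kind of reconciliation [5] documents is along these lines (a sketch only -- the consumer UUID is a placeholder and the commands assume the osc-placement plugin):

    # see what allocations a suspect consumer (e.g. an evacuated instance) still holds
    openstack resource provider allocation show <consumer_uuid>

    # remove the stale allocation, then let nova recreate the correct one
    openstack resource provider allocation delete <consumer_uuid>
    nova-manage placement heal_allocations --verbose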
[1] https://github.com/openstack/nova/blob/20.0.0/nova/objects/migration.py#L240 [2] https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/services.py#L254 [3] https://review.opendev.org/#/c/560674/ [4] https://review.opendev.org/#/c/678100/2/nova/scheduler/client/report.py at 2165 [5] https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html -- Thanks, Matt From openstack at nemebean.com Thu Nov 14 15:41:29 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 14 Nov 2019 09:41:29 -0600 Subject: [oslo] Adoption of microversion-parse In-Reply-To: References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org> <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com> Message-ID: On 11/13/19 6:29 PM, Chris Dent wrote: > On Wed, 13 Nov 2019, Clark Boylan wrote: > >> On Wed, Nov 13, 2019, at 3:56 PM, Ben Nemec wrote: >>> >>> >>> On 10/21/19 9:14 AM, Thierry Carrez wrote: >>>> Thierry Carrez wrote: >>>>> [...] >>>>> I'll propose the project addition so you can all vote directly on >>>>> it :) >>>> >>>> https://review.opendev.org/#/c/689754/ >>>> >>> >>> This has merged, but I still don't have access to the core group for the >>> library. Is this the point where we need to get infra involved or are >>> there other steps needed to make this official first? >>> >>> >> >> Ideally the existing cores would simply add you as the method of >> checks and balances here. Any current member can manage the member >> list as well as a Gerrit admin. Once you've been added by the existing >> core group you'll be able to add any others (like oslo-core). > > I've added oslo-core. I've been somewhat out of touch, so forgot > about this step. Great, thanks! > > (Note, it appears that oslo-core is way out of date...) We've never really removed cores from Oslo. Maybe we should, but I've never run into a compelling reason to. From openstack at nemebean.com Thu Nov 14 15:43:40 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 14 Nov 2019 09:43:40 -0600 Subject: [oslo] Adoption of microversion-parse In-Reply-To: References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> Message-ID: On 10/21/19 9:08 AM, Eric Fried wrote: >> Makes sense. We probably want to have an independent core team for it in >> addition to oslo-core so we can add people like Chris to it. > > I volunteer to help maintain it, if you'll have me. Works for me. Any objections from the existing core team? From sean.mcginnis at gmx.com Thu Nov 14 15:45:31 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 14 Nov 2019 09:45:31 -0600 Subject: [all] requirements-check failures Message-ID: <20191114154531.GA10859@sm-workstation> Hey everyone, You may have noticed some odd failures with the requirements-check job on your patches lately. The requirements team is aware of this issue and are working to get it resolved ASAP. I believe things should be good again once https://review.opendev.org/694248 lands. So for the time being, please hold off on doing rechecks on these patches. This job should only be running for patches that touch any of the requirements files. As a workaround for now, if your patch can make a change without modifying requirements, that should bypass the need to run this job. 
Another alternative would be to add: Depends-on: https://review.opendev.org/694248 to your commit message, but hopefully that will not be necessary once this patch makes it through and the test is fixed. Sorry for the inconvenience this has caused. Sean From cdent+os at anticdent.org Thu Nov 14 15:48:31 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 14 Nov 2019 15:48:31 +0000 (GMT) Subject: [oslo] Adoption of microversion-parse In-Reply-To: References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> Message-ID: On Thu, 14 Nov 2019, Ben Nemec wrote: > On 10/21/19 9:08 AM, Eric Fried wrote: >>> Makes sense. We probably want to have an independent core team for it in >>> addition to oslo-core so we can add people like Chris to it. >> >> I volunteer to help maintain it, if you'll have me. > > Works for me. Any objections from the existing core team? Works for me too. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From dangtrinhnt at gmail.com Thu Nov 14 15:56:17 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Fri, 15 Nov 2019 00:56:17 +0900 Subject: [all] requirements-check failures In-Reply-To: <20191114154531.GA10859@sm-workstation> References: <20191114154531.GA10859@sm-workstation> Message-ID: Thank Sean for the notification. On Fri, Nov 15, 2019 at 12:49 AM Sean McGinnis wrote: > Hey everyone, > > You may have noticed some odd failures with the requirements-check job on > your > patches lately. The requirements team is aware of this issue and are > working to > get it resolved ASAP. I believe things should be good again once > https://review.opendev.org/694248 lands. > > So for the time being, please hold off on doing rechecks on these patches. > > This job should only be running for patches that touch any of the > requirements > files. As a workaround for now, if your patch can make a change without > modifying requirements, that should bypass the need to run this job. > > Another alternative would be to add: > > Depends-on: https://review.opendev.org/694248 > > to your commit message, but hopefully that will not be necessary once this > patch makes it through and the test is fixed. > > Sorry for the inconvenience this has caused. > > Sean > > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Nov 14 15:58:59 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 09:58:59 -0600 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint In-Reply-To: References: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> Message-ID: <205717d9-7e51-8157-a944-58a9d4c4a64d@gmail.com> On 11/14/2019 8:46 AM, Abhishek Kekane wrote: > I have been working in glance since Icehouse and aware about stable > branch guidelines. Given the opportunity, I will try my best to justify > my selection. Sure, I added you to glance-stable-maint yesterday. Enjoy. 
-- Thanks, Matt From balazs.gibizer at est.tech Thu Nov 14 16:06:03 2019 From: balazs.gibizer at est.tech (Balázs Gibizer) Date: Thu, 14 Nov 2019 16:06:03 +0000 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> References: <1573373353.31166.0@est.tech> <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> Message-ID: <1573747559.19107.0@est.tech> On Thu, Nov 14, 2019 at 09:35, Matt Riedemann wrote: > On 11/10/2019 2:09 AM, Balázs Gibizer wrote: >> * Check ongoing migration and reject the delete if migration with >> this >> compute having the source node exists. Let operator confirm the >> migrations > > To be clear, the suggestion here is call [1] from the API like around > [2]? That's a behavior change but so was blocking the delete when the > compute was hosting instances [3] and we added a release note for > that. Anyway, that's a pretty simple change and not really something > I thought about in earlier threads on this problem. Regarding > evacuate migration records that should also work since the final > states for an evacuate migration are done, failed or error for which > [1] accounts. Yeah, [1] called at [2] sounds good to me. Regarding evacuation records: if the evacuation succeeded, i.e. the migration is in 'done' state, then we are OK. But if it is finished with 'error' or 'failed' state then we still have an instance on the host so we should not allow deleting the compute service. As far as I see get_count_by_hosts will cover this case. > >> * Cascade delete providers and allocations in placement. >> * in case of evacuated instances this is the right thing to do > > OK this seems to confirm my TODO here [4]. > >> * in any other dangling allocation case nova has the final thrut >> so >> nova >> has the authority to delete them. > > So this would build on the first idea above about blocking the > service delete if there are in-progress migrations involving the node > (either incoming or outgoing) right? So if we get to the point of > deleting the provider we know (1) there are no in-progress migrations > and (2) there are no instances on the host (outside of evacuated > instances which we can cleanup automatically per [4]). Given that, > I'm not sure there is really anything else to do here. In theory there cannot be any other allocation on the compute RP tree if there is no instance on the host and no ongoing migrations involving the host. But still I guess we need to cascade the delete to make sure that orphaned allocations (which is a bug itself but we know that it happens) are cleaned up when the service is deleted. cheers, gibi > >> * Document possible ways to reconcile Placement with Nova using >> heal_allocations and eventually the audit command once it's >> merged. > > Done (merged yesterday) [5].
> > [1] > https://github.com/openstack/nova/blob/20.0.0/nova/objects/migration.py#L240 > [2] > https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/services.py#L254 > [3] https://review.opendev.org/#/c/560674/ > [4] > https://review.opendev.org/#/c/678100/2/nova/scheduler/client/report.py at 2165 > [5] > https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html > > -- > > Thanks, > > Matt > From fsbiz at yahoo.com Thu Nov 14 16:09:01 2019 From: fsbiz at yahoo.com (fsbiz at yahoo.com) Date: Thu, 14 Nov 2019 16:09:01 +0000 (UTC) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> Message-ID: <1952364384.238482.1573747741880@mail.yahoo.com> Hi Chris, Thanks for the response. >Your request, is asking for CUSTOM_RRR430 will a value of 2, but it >is only available as 1. Have a look at your server create request, >there's something, probably your flavor, which is unexpected. The requests coming in are "forced host" requests.  The PaaS layer maintains an inventory of actual bare-metal available nodes and a user has to explicitly selecta baremetal node.  The PaaS layer then makes a nova api call for an instance to be createdon that specific baremetal node.    >Placement and nova scheduler are working correctly with the data they >have, the problem is with how inventory is being reported or requested. >This could either be with how your ironic nodes are being reported, >or with flavors.As far as I can recall, we've started seeing this particular error only recently after we added another 200 nodes to our flat infrastructure.   Thanks,Fred. On Thursday, November 14, 2019, 01:18:40 AM PST, Chris Dent wrote: On Thu, 14 Nov 2019, fsbiz at yahoo.com wrote: > Ultimately, nova-conductor is reported "NoValidHost: No valid host was found. There are not enough hosts available"This has been traced to nova-placement-api "Allocation for CUSTOM_RRR430 on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1" > Any pointers on what next steps I should be looking at ? Your request, is asking for CUSTOM_RRR430 will a value of 2, but it is only available as 1. Have a look at your server create request, there's something, probably your flavor, which is unexpected. Placement and nova scheduler are working correctly with the data they have, the problem is with how inventory is being reported or requested. This could either be with how your ironic nodes are being reported, or with flavors. > 2019-11-12 10:26:02.461 4161129 WARNING nova.objects.resource_provider [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] Allocation for CUSTOM_Z370_A on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1 This is the same issue, but with a different class of inventory -- Chris Dent                      ٩◔̯◔۶          https://anticdent.org/ freenode: cdent -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hberaud at redhat.com Thu Nov 14 16:19:45 2019 From: hberaud at redhat.com (Herve Beraud) Date: Thu, 14 Nov 2019 17:19:45 +0100 Subject: [oslo] Adding Michael Johnson as Taskflow core In-Reply-To: References: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Message-ID: Welcome Michael! Le jeu. 14 nov. 2019 à 02:24, Michael Johnson a écrit : > Thank you Ben, happy to help! > > Michael > > On Wed, Nov 13, 2019 at 8:18 AM Ben Nemec wrote: > > > > Hi, > > > > After discussion with the Oslo team, we (and he) have agreed to add > > Michael as a Taskflow core. He's done more work on the project than > > anyone else still active in Oslo and also works on a project that > > consumes it so he likely understands it better than anyone else at this > > point. > > > > Welcome Michael and thanks for your contributions! > > > > -Ben > > > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From nate.johnston at redhat.com Thu Nov 14 16:28:07 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Thu, 14 Nov 2019 11:28:07 -0500 Subject: [neutron] Review priorities In-Reply-To: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> References: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> Message-ID: <20191114162807.m7e3itidrkofw5xa@firewall> On Thu, Nov 14, 2019 at 10:38:49AM +0100, Slawek Kaplonski wrote: > Hi neutrinos, > > According to our discussion during Train retrospective in Shanghai, I added > "review-priority" label for neutron projects. > It can be set by every core team member to values like: > > -1 - Branch Freeze > +1 - Important Change > +2 - Gate Blocker Fix / Urgent Change > > You can use dashboard like [1] to track such high priority patches and review > them. > I will also add some note about this to our docs this week to make it clear and > visible for everyone. > > [1] https://tinyurl.com/vezk6n6 Thanks, Slawek! I'll add this to my daily routine. Nate > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From mriedemos at gmail.com Thu Nov 14 17:53:54 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 11:53:54 -0600 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <1573747559.19107.0@est.tech> References: <1573373353.31166.0@est.tech> <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> <1573747559.19107.0@est.tech> Message-ID: <49c66574-82c9-9343-47a2-81dde219380a@gmail.com> On 11/14/2019 10:06 AM, Balázs Gibizer wrote: > If the evacuation succeeded, i.e. the migration is in 'done' > state then we are OK. 
But if it is finished with 'error' or 'failed' > state then we still have an instance on the host so we should not allow > deleting the compute service. As far as I see get_count_by_hosts will > cover this case. If the evacuation succeeded then we need to detect it and cleanup the allocation from the evacuated-from-host because get_count_by_hosts won't catch that case (that's bug 1829479). That's the TODO in my patch. If the evacuation failed, I agree with you that get_count_by_hosts should detect and block the deletion of the service since the instance is still hosted there in the DB. -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 18:01:08 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 12:01:08 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <1952364384.238482.1573747741880@mail.yahoo.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> <1952364384.238482.1573747741880@mail.yahoo.com> Message-ID: <7d53de2f-46de-edcf-63dc-fe7ba8b61f83@gmail.com> On 11/14/2019 10:09 AM, fsbiz at yahoo.com wrote: > The requests coming in are "forced host" requests.  The PaaS layer > maintains > an inventory of actual bare-metal available nodes and a user has to > explicitly select > a baremetal node.  The PaaS layer then makes a nova api call for an > instance to be created > on that specific baremetal node. To be clear, by forced host you mean creating the server with an availability zone in the format ZONE:HOST:NODE or ZONE:NODE where NODE is the ironic node UUID, correct? https://docs.openstack.org/nova/latest/admin/availability-zones.html#using-availability-zones-to-select-hosts Yeah that's a problem because then the scheduler filters aren't run. A potential alternative is to create the server using a hypervisor_hostname query hint that will run through the JsonFilter: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#jsonfilter Then at least you're not forcing the node and run the scheduler filters. I forget exactly how the scheduler code works in Queens with respect to forced hosts/nodes on server create but the scheduler still has to allocate resources in placement. It looks like we work around that in Queens by disabling the limit we place on getting allocation candidates from placement: https://review.opendev.org/#/c/584616/ My guess is your PaaS layer has bugs in it since it's allowing users to select hosts that are already consumed, or it's just racy. Anyway, this is why nova uses placement since Pike for atomic consumption of resources during scheduling. -- Thanks, Matt From miguel at mlavalle.com Thu Nov 14 18:23:03 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 14 Nov 2019 12:23:03 -0600 Subject: [neutron] Review priorities In-Reply-To: <20191114162807.m7e3itidrkofw5xa@firewall> References: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> <20191114162807.m7e3itidrkofw5xa@firewall> Message-ID: Yeah, looking good. I just bookmarked it so it becomes part of my daily routine Thanks On Thu, Nov 14, 2019 at 10:28 AM Nate Johnston wrote: > On Thu, Nov 14, 2019 at 10:38:49AM +0100, Slawek Kaplonski wrote: > > Hi neutrinos, > > > > According to our discussion during Train retrospective in Shanghai, I > added > > "review-priority" label for neutron projects. 
> > It can be set by every core team member to values like: > > > > -1 - Branch Freeze > > +1 - Important Change > > +2 - Gate Blocker Fix / Urgent Change > > > > You can use dashboard like [1] to track such high priority patches and > review > > them. > > I will also add some note about this to our docs this week to make it > clear and > > visible for everyone. > > > > [1] https://tinyurl.com/vezk6n6 > > Thanks, Slawek! I'll add this to my daily routine. > > Nate > > > -- > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Thu Nov 14 18:26:54 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 14 Nov 2019 12:26:54 -0600 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> Message-ID: Hi Belmiro, The Neutron team is fully cognizant that we have operators large and small using Linuxbridge. No decision will be made without involving you Regards On Thu, Nov 14, 2019 at 3:59 AM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi, > Akihiro, thanks for you summary. > > We use the linuxbridge driver because its simplicity and the match with > the old nova-network schema (yes, are we still migrating). > > The functionality gap between ovs driver and linuxbridge is a good think > in my view. > It allows operators to chose the best solution considering their > deployment use case and scale. > > Slawek, Miguel please keep us in the discussions. > > Belmiro > CERN > > > On Wed, Nov 13, 2019 at 7:22 PM Sean Mooney wrote: > >> On Tue, 2019-11-12 at 14:53 +0100, Slawek Kaplonski wrote: >> > Stateless security groups >> > ========================= >> > >> > Old RFE [21] was approved for neutron-fwaas project but we all agreed >> that this >> > should be now implemented for security groups in core Neutron. >> > People from Nuage are interested in work on this in upstream. >> > We should probably also explore how easy/hard it will be to implement >> it in >> > networking-ovn backend. >> >> for what its worth we implemented this 4 years ago and it was breifly >> used in production trial deployment >> in a telco deployment but i dont think it ever went to full production as >> they went wtih sriov instead >> https://review.opendev.org/#/c/264131/ as part of this RFE >> https://bugs.launchpad.net/neutron/+bug/1531205 which was >> closed as wont fix >> https://bugs.launchpad.net/neutron/+bug/1531205/comments/14 >> as it was view that this was not the correct long term direction for the >> community. >> this is the summit presentation for austin for anyone that does not >> rememebr this effort >> >> >> https://www.openstack.org/videos/summits/austin-2016/tired-of-iptables-based-security-groups-heres-how-to-gain-tremendous-speed-with-open-vswitch-instead >> >> im not sure how the new proposal differeres form our previous proposal >> for the same >> feautre but the main pushback we got was that the securtiy group api is >> assumed to be stateful >> and that is why this was rejected. form our mesurments at the time we >> expected the stateless approch >> to scale better then contrack driver so it woudl be nice to see a >> stateless approch avialable. 
>> i never got around to deleteing our implemenation form >> networking-ovs-dpdk >> >> https://opendev.org/x/networking-ovs-dpdk/src/branch/master/networking_ovs_dpdk/agent/ovs_dpdk_firewall.py >> but i has not been tested our updated really for the last 2 years but it >> could be used as a basis of this effort >> if nuage does not have a poc already. >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu Nov 14 18:52:52 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 14 Nov 2019 12:52:52 -0600 Subject: [oslo] Shanghai Wrapup Message-ID: <22e64fec-f998-c2b0-aa66-f94a070727d2@nemebean.com> I wrote up a bunch of thoughts about Oslo stuff in Shanghai: http://blog.nemebean.com/content/oslo-shanghai Hopefully I covered everything (and accurately) but if I messed anything up I blame jet lag. :-P -Ben From openstack at fried.cc Thu Nov 14 19:16:32 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 14 Nov 2019 13:16:32 -0600 Subject: [nova][oot] Adding `context` to ComputeDriver.unrescue Message-ID: <6cf09602-9e08-27c8-6ea4-d5a7c9f07aa4@fried.cc> Though still very much WIP, emulated TPM [1] is looking like it will need to add the RequestContext to the ``unrescue`` ComputeDriver method [2]. This is a heads up to out-of-tree virt driver maintainers to keep an eye on this patch, as you will need to update your overrides accordingly once it merges. Thanks, efried [1] https://review.opendev.org/#/c/631363/ [2] https://review.opendev.org/#/c/631363/31/nova/virt/driver.py From whayutin at redhat.com Thu Nov 14 19:16:47 2019 From: whayutin at redhat.com (Wesley Hayutin) Date: Thu, 14 Nov 2019 12:16:47 -0700 Subject: [tripleo] Adding Alex Schultz as OVB core In-Reply-To: <7562aee5-1ea2-2d8f-ebb5-9fa02d9dc354@nemebean.com> References: <7562aee5-1ea2-2d8f-ebb5-9fa02d9dc354@nemebean.com> Message-ID: On Wed, Nov 13, 2019 at 9:23 AM Ben Nemec wrote: > Hi, > > After a discussion with Wes in Shanghai about how to make me less of a > SPOF for OVB, one of the outcomes was that we should try to grow the OVB > core team. Alex has been reviewing a lot of the patches to OVB lately > and obviously has a good handle on how all of this stuff fits together, > so I've added him to the OVB core team. > > Thanks and congratulations(?) Alex! :-) > > -Ben > > +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Thu Nov 14 19:20:07 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Thu, 14 Nov 2019 11:20:07 -0800 Subject: [nova][ironic] nova docs bug for ironic looking for an owner In-Reply-To: <453e2ccb-ef4f-0e5b-aa15-cacf0ca104e8@gmail.com> References: <453e2ccb-ef4f-0e5b-aa15-cacf0ca104e8@gmail.com> Message-ID: Hey Matt, I've gone ahead and added this to the ironic team's meeting agenda for next week. Thanks for bringing this up! -Julia On Wed, Nov 13, 2019 at 7:33 AM Matt Riedemann wrote: > > While discussing some tribal knowledge about how ironic is the black > sheep of nova compute drivers I realized that we (nova) have no docs > about the ironic driver like we do for other drivers, so we don't > mention anything about the weird cardinality rules around compute > service : node : instance and host vs nodename things, how to configure > the service for HA mode, how to configure baremetal flavors with custom > resource classes, how to partition for conductor groups, how to deal > with scaling issues, missing features (migrate), etc. 
I've opened a bug > in case someone wants to get started on some of that information: > > https://bugs.launchpad.net/nova/+bug/1852446 > > -- > > Thanks, > > Matt > From mriedemos at gmail.com Thu Nov 14 19:31:03 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 13:31:03 -0600 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <1573747559.19107.0@est.tech> References: <1573373353.31166.0@est.tech> <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> <1573747559.19107.0@est.tech> Message-ID: <5bf5e113-770d-5bb7-af62-c34e45c4f981@gmail.com> On 11/14/2019 10:06 AM, Balázs Gibizer wrote: > Yeah, [1] called at [2] sounds good to me. Done with functional recreate test patches underneath: https://review.opendev.org/#/c/694389/ -- Thanks, Matt From Albert.Braden at synopsys.com Thu Nov 14 21:44:11 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Thu, 14 Nov 2019 21:44:11 +0000 Subject: Order filters by cost Message-ID: I'm working on a list of filters ordered by cost. This is what I have so far. Does this look reasonably correct for Rocky? Cheap Filters: AllHostsFilter - does no filtering. It passes all the available hosts. AvailabilityZoneFilter - filters hosts by availability zone. AggregateInstanceExtraSpecsFilter - checks aggregate metadata set with aggregate_instance_extra_specs. All hosts are passed if no extra_specs are specified. AggregateCoreFilter - filters hosts by CPU core number with per-aggregate cpu_allocation_ratio setting. AggregateRamFilter - filters hosts by RAM with per-aggregate ram_allocation_ratio setting. AggregateDiskFilter - filters hosts by disk allocation with per-aggregate disk_allocation_ratio setting. AggregateNumInstancesFilter - filters hosts by number of instances with per-aggregate max_instances_per_host setting. AggregateIoOpsFilter - filters hosts by I/O operations with per-aggregate max_io_ops_per_host setting. AggregateMultiTenancyIsolation - isolate tenants in specific aggregates. AggregateTypeAffinityFilter - limits instance_type by aggregate. AggregateImagePropertiesIsolation - isolates hosts based on image properties and aggregate metadata. DifferentHostFilter - allows the instance on a different host from a set of instances. SameHostFilter - puts the instance on the same host as another instance in a set of instances. ComputeFilter - passes all hosts that are operational and enabled. NumInstancesFilter - filters compute nodes by number of running instances. IoOpsFilter - filters hosts by concurrent I/O operations. More Expensive Filters: ServerGroupAntiAffinityFilter - This filter implements anti-affinity for a server group. ServerGroupAffinityFilter - This filter works the same way as ServerGroupAntiAffinityFilter. The difference is that when you create the server group, you should specify a policy of 'affinity'. ImagePropertiesFilter - filters hosts based on properties defined on the instance's image. Doc on setting image properties is here: https://docs.openstack.org/glance/rocky/admin/useful-image-properties.html IsolatedHostsFilter - filter based on isolated_images, isolated_hosts and restrict_isolated_hosts_to_isolated_images flags. SimpleCIDRAffinityFilter - allows a new instance on a host within the same IP block. MetricsFilter - filters hosts based on metrics weight_setting. Most Expensive Filters: PciPassthroughFilter - Filter that schedules instances on a host if the host has devices to meet the device requests in the 'extra_specs' for the flavor. 
ComputeCapabilitiesFilter - checks that the capabilities provided by the host compute service satisfy any extra specifications associated with the instance type. NUMATopologyFilter - filters hosts based on the NUMA topology requested by the instance, if any. JsonFilter - allows simple JSON-based grammar for selecting hosts. Deprecated Filters: Please don't use any of these; they are obsolete in Rocky: RetryFilter - filters hosts that have already been attempted for scheduling. Obsolete since Queens. RamFilter - filters hosts by their RAM. Obsolete since Pike. CoreFilter - filters based on CPU core utilization. Obsolete since Pike. DiskFilter - filters hosts by their disk allocation. Obsolete since Pike. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Nov 14 22:10:51 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 14 Nov 2019 16:10:51 -0600 Subject: [all] requirements-check failures In-Reply-To: <20191114154531.GA10859@sm-workstation> References: <20191114154531.GA10859@sm-workstation> Message-ID: <20191114221051.GA30152@sm-workstation> On Thu, Nov 14, 2019 at 09:45:35AM -0600, Sean McGinnis wrote: > Hey everyone, > > You may have noticed some odd failures with the requirements-check job on your > patches lately. The requirements team is aware of this issue and are working to > get it resolved ASAP. I believe things should be good again once > https://review.opendev.org/694248 lands. > This requirements job fix has landed, and I've seen at least one rechecked patch successfully pass already. Things should be all clear. From smooney at redhat.com Thu Nov 14 22:44:46 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 14 Nov 2019 22:44:46 +0000 Subject: Order filters by cost In-Reply-To: References: Message-ID: <96314f2dfb4198447b7cf7833ae08a0cbb2fa33c.camel@redhat.com> On Thu, 2019-11-14 at 21:44 +0000, Albert Braden wrote: > I'm working on a list of filters ordered by cost. This is what I have so far. Does this look reasonably correct for > Rocky? some comments in line but more or less yes. also the best thing you can do is disable filters you dont need. > > Cheap Filters: > > AllHostsFilter - does no filtering. It passes all the available hosts. > AvailabilityZoneFilter - filters hosts by availability zone. > AggregateInstanceExtraSpecsFilter - checks aggregate metadata set with aggregate_instance_extra_specs. All hosts are > passed if no extra_specs are specified. the AggregateInstanceExtraSpecsFilter is actually expensive in some cases as it has to get the aggreate metadata for each aggreate the hosts is a member of and then compare the flavor extra specs to the metadata specifid in those aggrates so this get expssive as the numaber of aggreates a host is a member of grows. the worst case sclaing for this in NxM where N is the number of hosts an M is the maxium number of aggates a host is a part of. in other words this scales in quadratic complexity. fortunately that is the worst case but its generally liniar. Thinking about it a little more the aggreate filters below also have the same upper bound but in generaly most hosts are in 1 aggreated so it remains liniar. > AggregateCoreFilter - filters hosts by CPU core number with per-aggregate cpu_allocation_ratio setting. > AggregateRamFilter - filters hosts by RAM with per-aggregate ram_allocation_ratio setting. > AggregateDiskFilter - filters hosts by disk allocation with per-aggregate disk_allocation_ratio setting. 
strictly speaking in rocky the 3 filter above are not deprecated yet but they are in train. the reason they werent deprecacated sooner was we forgot when we deprecated the non aggregate versions. so you shoudl ideally avoid useign those. > AggregateNumInstancesFilter - filters hosts by number of instances with per-aggregate max_instances_per_host setting. this is pretty cheap i hope to eventurally replace this with placment eventurally. > AggregateIoOpsFilter - filters hosts by I/O operations with per-aggregate max_io_ops_per_host setting. > AggregateMultiTenancyIsolation - isolate tenants in specific aggregates. > AggregateTypeAffinityFilter - limits instance_type by aggregate. > AggregateImagePropertiesIsolation - isolates hosts based on image properties and aggregate metadata. this is more or less the same as the AggregateInstanceExtraSpecsFilter in terms of cost > DifferentHostFilter - allows the instance on a different host from a set of instances. > SameHostFilter - puts the instance on the same host as another instance in a set of instances. > ComputeFilter - passes all hosts that are operational and enabled. > NumInstancesFilter - filters compute nodes by number of running instances. > IoOpsFilter - filters hosts by concurrent I/O operations. > > More Expensive Filters: > > ServerGroupAntiAffinityFilter - This filter implements anti-affinity for a server group. > ServerGroupAffinityFilter - This filter works the same way as ServerGroupAntiAffinityFilter. The difference is that > when you create the server group, you should specify a policy of 'affinity'. > ImagePropertiesFilter - filters hosts based on properties defined on the instance's image. Doc on setting image > properties is here: https://docs.openstack.org/glance/rocky/admin/useful-image-properties.html > IsolatedHostsFilter - filter based on isolated_images, isolated_hosts and restrict_isolated_hosts_to_isolated_images > flags. > SimpleCIDRAffinityFilter - allows a new instance on a host within the same IP block. > MetricsFilter - filters hosts based on metrics weight_setting. > > Most Expensive Filters: > my gut feeling is both the AggregateInstanceExtraSpecsFilter and AggregateImagePropertiesIsolation would be here for non simple cases but if you keep things pretty lininar it would be in the group above because it does are resonably number of comparisons and may be non liniar. > PciPassthroughFilter - Filter that schedules instances on a host if the host has devices to meet the device requests > in the 'extra_specs' for the flavor. > ComputeCapabilitiesFilter - checks that the capabilities provided by the host compute service satisfy any extra > specifications associated with the instance type. > NUMATopologyFilter - filters hosts based on the NUMA topology requested by the instance, if any. > JsonFilter - allows simple JSON-based grammar for selecting hosts. > > Deprecated Filters: > > Please don't use any of these; they are obsolete in Rocky: > > RetryFilter - filters hosts that have already been attempted for scheduling. Obsolete since Queens. > RamFilter - filters hosts by their RAM. Obsolete since Pike. > CoreFilter - filters based on CPU core utilization. Obsolete since Pike. > DiskFilter - filters hosts by their disk allocation. Obsolete since Pike. 
From mriedemos at gmail.com Fri Nov 15 00:45:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 18:45:20 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Message-ID: <62fd76e1-a3ae-7b62-ebba-824f667b3095@gmail.com> On 11/14/2019 7:58 AM, Matt Riedemann wrote: > If I get the time this week I'll WIP something together that does what > I'm thinking as a proof of concept Here is a simple PoC: https://review.opendev.org/#/q/topic:bp/action-event-fault-details The API change with a new microversion (sans API samples) is actually smaller than the object code change to store the fault message. Anyway, this gives an idea and it was pretty simple to write up. -- Thanks, Matt From missile0407 at gmail.com Fri Nov 15 03:26:47 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Fri, 15 Nov 2019 11:26:47 +0800 Subject: [kolla] Shutdown ordering of MariaDB containers? Message-ID: Hi everyone, I want to ask about the order of shutdown MariaDB (or mean controller) node. For previous steps we found is usually shutdown slaves first, then master [1]. But we found that the MariaDB still get container restarting issue even I followed the step after booting up the cluster. Below is that I did when shutdown/boot up controller. 1. Shutdown the slaves first, then master 2. Boot master first, then slaves. For looking which one is master, we usually looking for the haproxy log and find which mariadb node that the last session access the DB. Or looking for which mariadb container has "--wsrep-new-cluster" in BOOTSTRAP_ARGS. Does anyone has experience about this? Many thanks, Eddie. [1] https://bugs.launchpad.net/kolla-ansible/+bug/1712087 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangbailin at inspur.com Fri Nov 15 06:58:03 2019 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Fri, 15 Nov 2019 06:58:03 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: <1573720630.26082.5@est.tech> References: <8a1e435702fb7dfe572bd59d2d652320@sslemail.net> <1573720630.26082.5@est.tech> Message-ID: Hi all, The patch link is https://review.opendev.org/#/c/663563 Rename the bp name from "add-flavor-metadata-or-metadata-group" to "resources-metadata-of-instance", because it not only can compose the extra specs from the *flavor* (current status), and it can be compose the vcpu, ram and disk, I think call this is resource metadata is ok, if you have some suggestion please leave a comment. About the model design, there will be add two DB table in the nova api DB: a) Add "resources_metadata" to record the composable bits, as following fields: - id(int),create_at,updated_at,deleted_at, name, rules, description and deleted fields Saved format like this: { "cpu_pinning": { "hw:cpu_policy": "shared", "hw:cpu_thread_policy": "require" } } If there is one spec that you need, you can set it in the rules as {"key": value}, it means like this: { "mem_huge_page": { "hw:mem_page_size": "1GB" } } b) Add "resources_metadata_mapping" to record the composable bits used by which instance, as following fields: - created_at, updated_at,deleted_at, id(int), resources_md_id, instance_uuid and deleted fields. 
With b, we have another alternative way, it was wrote in the "Alternatives" in the SPEC, it means add a column to the ``instance_medata`` table, but, this way we should separate the rule in the "resources_metadata" to one by one to save. This way will change the existing data table structure, I am not sure if this will affect some of the features of the instance. (more details you can review this SPEC) We can get all the metadata used by an instance (instance_uuid) through the "resources_metadata_mapping" table easily. > Items: Re: [nova][ptg] Flavor explosion > > > > On Sun, Nov 10, 2019 at 16:09, Brin Zhang(张百林) > wrote: > > Hi all, > > Based on the discussion on the Train PTG, and reference to the > > records on the etherpad and ML, I was updated that SPEC, and I think > > there are some details need to be discussed, and I have listed some > > details, if there are any other things that I have not considered, or > > if some place that I thoughtless, please post a discussion. > > > > List some details as follows, and you can review that spec in > > https://review.opendev.org/#/c/663563. > > > > Listed details: > > - Don't change the model of the flavor in nova code and in the db. > > > > - No change for operators who choose not to request the flavor extra > > specs group. > > > > - Requested more than one flavor extra specs groups, if there are > > different values for the same spec will be raised a 409. > > > > - Flavor in request body of server create that has the same spec in > > the request ``flavor_extra_specs_group``, it will be raised a 409. > > > > - When resize an instance, you need to compare the > > ``flavor_extra_specs_group`` with the spec request spec, otherwise > > raise a 400. > > > > Thanks Brin for updating the spec, I did a review round on it and left comments. > > gibi > From hberaud at redhat.com Fri Nov 15 08:16:28 2019 From: hberaud at redhat.com (Herve Beraud) Date: Fri, 15 Nov 2019 09:16:28 +0100 Subject: [oslo] Shanghai Wrapup In-Reply-To: <22e64fec-f998-c2b0-aa66-f94a070727d2@nemebean.com> References: <22e64fec-f998-c2b0-aa66-f94a070727d2@nemebean.com> Message-ID: Thanks Ben Le jeu. 14 nov. 2019 à 19:55, Ben Nemec a écrit : > I wrote up a bunch of thoughts about Oslo stuff in Shanghai: > http://blog.nemebean.com/content/oslo-shanghai > > Hopefully I covered everything (and accurately) but if I messed anything > up I blame jet lag. :-P > > -Ben > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From doka.ua at gmx.com Fri Nov 15 08:54:48 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Fri, 15 Nov 2019 10:54:48 +0200 Subject: [Neutron] OVS forwarding issues In-Reply-To: References: Message-ID: <39de8fd7-a57f-5b70-4f7a-2934bbe6b7cc@gmx.com> Hi colleagues, thanks for the pointing on this. Can anybody _assume_ whether this bug affects also ML2/OVN implementation of networking? I was looking into OVN sometimes ago, but due to lack of resources skipped this research, now I think it makes sense to return back to this question. Thank you. On 11.11.2019 19:38, James Denton wrote: > > Hi, > > This is a known issue with the openvswitch firewall[1]. > > > firewall_driver = openvswitch > > I recommend running iptables_hybrid until that is resolved. > > [1] https://bugs.launchpad.net/neutron/+bug/1732067 > > > James Denton > > Network Engineer > > Rackspace Private Cloud > > james.denton at rackspace.com > > *From: *Volodymyr Litovka > *Date: *Monday, November 11, 2019 at 12:10 PM > *To: *"openstack-discuss at lists.openstack.org" > > *Cc: *"doka.ua at gmx.com" > *Subject: *[Neutron] OVS forwarding issues > > *CAUTION:*This message originated externally, please use caution when > clicking on links or opening attachments! > > Dear colleagues, > > just faced an issue with Openvswitch, which looks strange for me. The > problem is that any particular VM receives a lot of packets, which are > unicasted: > - from other VMs which reside on the same host (let's name them "local > VMs") > - to other VMs which reside on other hosts (let's name them "remote VMs") > > Long output from "ovs-ofctl dump-flows br-int" which, as far as I can > narrow, ends there: > > # ovs-ofctl dump-flows br-int |grep " table=94," |egrep > "n_packets=[123456789]" >  cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, > n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, > priority=1 actions=NORMAL > > coming to normal processing (classic MAC learning). Looking into > br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are > really no MAC addresses of remote VMs and br-int behaves in the right > way, flooding unknown unicast to all ports in this L2 segment. > > Of course, there is br-tun which connected over vxlan to all other > hosts and to br-int: > >     Bridge br-tun >         Controller "tcp:127.0.0.1:6633" >             is_connected: true >         fail_mode: secure >         Port "vxlan-0a960008" >             Interface "vxlan-0a960008" >                 type: vxlan >                 options: {df_default="true", in_key=flow, > local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} >         [ ... ] >         Port br-tun >             Interface br-tun >                 type: internal >         Port patch-int >             Interface patch-int >                 type: patch >                 options: {peer=patch-tun} > > but MAC table on br-tun is empty as well: > > # ovs-appctl fdb/show br-tun >  port  VLAN  MAC                Age > # > > Finally, packets get to destination, while being copied to all ports > on source host, which is serious security issue. > > I do not think so conceived by design, I rather think we missed > something in configuration. Can anybody point me where we're wrong and > help with this issue? > > We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. 
Network > configuration is: > > @controller: > # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [ml2] > type_drivers = flat,vxlan > tenant_network_types = vxlan > mechanism_drivers = l2population,openvswitch > extension_drivers = port_security,qos,dns_domain_ports > [ml2_type_flat] > flat_networks = provider > [ml2_type_geneve] > [ml2_type_gre] > [ml2_type_vlan] > [ml2_type_vxlan] > vni_ranges = 400:400000 > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > > @agent: > # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [agent] > tunnel_types = vxlan > l2_population = true > arp_responder = true > extensions = qos > [ovs] > local_ip = 10.150.0.5 > bridge_mappings = provider:br-ex > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > [xenapi] > > Thank you. > > > -- > Volodymyr Litovka >   "Vision without Execution is Hallucination." -- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Fri Nov 15 08:56:05 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 15 Nov 2019 09:56:05 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium Message-ID: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Hi neutrinos, As we discussed during the Shanghai PTG I just proposed changes to move neutron-interconnection project out of stadium to "x/" namespace. So it will not be official neutron project anymore after those changes. Patches for that are in [1] and [2] I also proposed to remove neutron-interconnection api-ref from neutron-lib. Patch is here [3]. Please review it :) [1] https://review.opendev.org/#/c/694478/ [2] https://review.opendev.org/#/c/694480/ [3] https://review.opendev.org/#/c/694466/ -- Slawek Kaplonski Senior software engineer Red Hat From merlin.blom at bertelsmann.de Fri Nov 15 09:13:20 2019 From: merlin.blom at bertelsmann.de (Blom, Merlin, NMU-OI) Date: Fri, 15 Nov 2019 09:13:20 +0000 Subject: [RabbitMQ][cinder] Listen to messages Message-ID: Hey there, it seems to me as if ask.openstack.org is down, so I ask my question here: I'd like to listen to oslo messages from cinder as I do for neutron and octavia to know what is going on. 
For me the following code worked for neutron: EXCHANGE_NAME = os.getenv('EXCHANGE_NAME', 'neutron') ROUTING_KEY = os.getenv('ROUTING_KEY', 'notifications.info') QUEUE_NAME = os.getenv('QUEUE_NAME', 'messaging_queue') BROKER_URI = os.getenv('BROKER_URI', 'UNDEFINED') BROKER_PASSWORD = os.getenv('BROKER_PASSWORD', '') class Messages(ConsumerMixin): def __init__(self, connection): self.connection = connection return def get_consumers(self, consumer, channel): exchange = Exchange(EXCHANGE_NAME, type="topic", durable=False) queue = Queue(QUEUE_NAME, exchange, routing_key=ROUTING_KEY, durable=False, auto_delete=True, no_ack=True) return [consumer(queues=[queue], callbacks=[self.on_message])] def on_message(self, body, message): try: print(message) except Exception as e: log.info(repr(e)) if __name__ == "__main__": log.info("Connecting to broker {}".format(BROKER_URI)) with BrokerConnection(hostname=BROKER_URI, userid='messaging', password=BROKER_PASSWORD, virtual_host='/'+EXCHANGE_NAME, heartbeat=4, failover_strategy='round-robin') as connection: Messaging(connection).run() BrokerConnection.connection.close() But on the cinder vhost (/cinder) I can't find an exchange that the code is working on. (cinder, cinder-backup, .) I tried using the rabbitmq tracer: https://www.rabbitmq.com/firehose.html And got all the cinder messages but I don't want to use it in production because of performance issues. Does anyone have an idea how to find the correct exchange for the notification info queue in cinder? Cheers, Merlin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5195 bytes Desc: not available URL: From hberaud at redhat.com Fri Nov 15 10:13:42 2019 From: hberaud at redhat.com (Herve Beraud) Date: Fri, 15 Nov 2019 11:13:42 +0100 Subject: [RabbitMQ][cinder] Listen to messages In-Reply-To: References: Message-ID: Le ven. 15 nov. 2019 à 10:17, Blom, Merlin, NMU-OI < merlin.blom at bertelsmann.de> a écrit : > Hey there, > > it seems to me as if ask.openstack.org is down, so I ask my question here: > > > > I’d like to listen to oslo messages from cinder as I do for neutron and > octavia to know what is going on. 
> > For me the following code worked for neutron: > > > > EXCHANGE_NAME = os.getenv('EXCHANGE_NAME', 'neutron') > > ROUTING_KEY = os.getenv('ROUTING_KEY', 'notifications.info') > > QUEUE_NAME = os.getenv('QUEUE_NAME', 'messaging_queue') > > BROKER_URI = os.getenv('BROKER_URI', 'UNDEFINED') > > BROKER_PASSWORD = os.getenv('BROKER_PASSWORD', '') > > > > class Messages(ConsumerMixin): > > def __init__(self, connection): > > self.connection = connection > > return > > > > def get_consumers(self, consumer, channel): > > exchange = Exchange(EXCHANGE_NAME, type="topic", durable=False) > > queue = Queue(QUEUE_NAME, exchange, routing_key=ROUTING_KEY, > durable=False, auto_delete=True, no_ack=True) > > return [consumer(queues=[queue], callbacks=[self.on_message])] > > > > def on_message(self, body, message): > > try: > > print(message) > > except Exception as e: > > log.info(repr(e)) > > > > if __name__ == "__main__": > > log.info("Connecting to broker {}".format(BROKER_URI)) > > with BrokerConnection(hostname=BROKER_URI, userid='messaging', > password=BROKER_PASSWORD, > > virtual_host='/'+EXCHANGE_NAME, > > heartbeat=4, failover_strategy='round-robin') as > connection: > > Messaging(connection).run() > > BrokerConnection.connection.close() > > > > But on the cinder vhost (/cinder) > Are you sure cinder use a dedicated vhost? I'm notconviced, if I'm right they all use the default vhost '/'. I can’t find an exchange that the code is working on. (cinder, > cinder-backup, …) > > I tried using the rabbitmq tracer: https://www.rabbitmq.com/firehose.html > > And got all the cinder messages but I don’t want to use it in production > because of performance issues. > > > > Does anyone have an idea how to find the correct exchange for the > notification info queue in cinder? > > > > Cheers, > > Merlin > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Fri Nov 15 12:07:43 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 15 Nov 2019 12:07:43 +0000 Subject: [kolla] Shutdown ordering of MariaDB containers? In-Reply-To: References: Message-ID: On Fri, 15 Nov 2019 at 03:28, Eddie Yen wrote: > > Hi everyone, > I want to ask about the order of shutdown MariaDB (or mean controller) node. > > For previous steps we found is usually shutdown slaves first, then master [1]. > But we found that the MariaDB still get container restarting issue even I followed the step after booting up the cluster. > > Below is that I did when shutdown/boot up controller. > 1. Shutdown the slaves first, then master > 2. Boot master first, then slaves. 
> > For looking which one is master, we usually looking for the haproxy log and find which mariadb node that the last session access the DB. > Or looking for which mariadb container has "--wsrep-new-cluster" in BOOTSTRAP_ARGS. > > Does anyone has experience about this? Hi Eddie, You can use the kolla-ansible mariadb_recovery command to bootstrap a cluster where all nodes have gone down. Mark > > Many thanks, > Eddie. > > [1] https://bugs.launchpad.net/kolla-ansible/+bug/1712087 From aj at suse.com Fri Nov 15 12:28:17 2019 From: aj at suse.com (Andreas Jaeger) Date: Fri, 15 Nov 2019 13:28:17 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: <20191115085605.zj35uembs2gaql4v@skaplons-mac> References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Message-ID: On 15/11/2019 09.56, Slawek Kaplonski wrote: > Hi neutrinos, > > As we discussed during the Shanghai PTG I just proposed changes to move > neutron-interconnection project out of stadium to "x/" namespace. > So it will not be official neutron project anymore after those changes. > Patches for that are in [1] and [2] > > I also proposed to remove neutron-interconnection api-ref from neutron-lib. > Patch is here [3]. Please review it :) > > [1] https://review.opendev.org/#/c/694478/ > [2] https://review.opendev.org/#/c/694480/ > [3] https://review.opendev.org/#/c/694466/ > Looking at https://review.opendev.org/#/q/project:openstack/neutron-interconnection I suggest to retire only with those few changes to the repo - and nothing in the last few months. Or is there anybody committing to continue the work? We can also retire now - and create again in the "x/" namespace if interest suddenly arises. But let's not move repos that are de-facto dead around, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From missile0407 at gmail.com Fri Nov 15 13:19:40 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Fri, 15 Nov 2019 21:19:40 +0800 Subject: [kolla] Shutdown ordering of MariaDB containers? In-Reply-To: References: Message-ID: Yes, we're doing this when all MariaDB containers are down. But we still curious about this problem. Ordering to shutdown then boot up the MariaDB cluster still can caused this issue. My initial guess is that the docker is startup earlier than network, caused they can't connect each other. But it still a guess because the connection usually back if restart container manually. At least this is what we solve if the service can't connect to DB or AMQP but both them are fine. Perhaps I may try this. For now, it seems like using mariadb_recovery is the only way to let MariaDB back online if reboot the whole cluster right? Mark Goddard 於 2019年11月15日 週五 下午8:07寫道: > On Fri, 15 Nov 2019 at 03:28, Eddie Yen wrote: > > > > Hi everyone, > > I want to ask about the order of shutdown MariaDB (or mean controller) > node. > > > > For previous steps we found is usually shutdown slaves first, then > master [1]. > > But we found that the MariaDB still get container restarting issue even > I followed the step after booting up the cluster. > > > > Below is that I did when shutdown/boot up controller. > > 1. Shutdown the slaves first, then master > > 2. Boot master first, then slaves. 
> > > > For looking which one is master, we usually looking for the haproxy log > and find which mariadb node that the last session access the DB. > > Or looking for which mariadb container has "--wsrep-new-cluster" in > BOOTSTRAP_ARGS. > > > > Does anyone has experience about this? > > Hi Eddie, > You can use the kolla-ansible mariadb_recovery command to bootstrap a > cluster where all nodes have gone down. > Mark > > > > Many thanks, > > Eddie. > > > > [1] https://bugs.launchpad.net/kolla-ansible/+bug/1712087 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.denton at rackspace.com Fri Nov 15 14:03:42 2019 From: james.denton at rackspace.com (James Denton) Date: Fri, 15 Nov 2019 14:03:42 +0000 Subject: [Neutron] OVS forwarding issues In-Reply-To: <39de8fd7-a57f-5b70-4f7a-2934bbe6b7cc@gmx.com> References: <39de8fd7-a57f-5b70-4f7a-2934bbe6b7cc@gmx.com> Message-ID: <48F9BC4F-02B7-4E7A-AF06-EAED4053B63A@rackspace.com> I seem to recall checking this when the issue was first discovered, and OVN did not appear to implement the same flow rules that resulted in the issue. I don’t have a live environment to test with, though. James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com From: Volodymyr Litovka Date: Friday, November 15, 2019 at 3:55 AM To: James Denton , "openstack-discuss at lists.openstack.org" , Slawek Kaplonski Cc: "doka.ua at gmx.com" Subject: Re: [Neutron] OVS forwarding issues CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Hi colleagues, thanks for the pointing on this. Can anybody _assume_ whether this bug affects also ML2/OVN implementation of networking? I was looking into OVN sometimes ago, but due to lack of resources skipped this research, now I think it makes sense to return back to this question. Thank you. On 11.11.2019 19:38, James Denton wrote: Hi, This is a known issue with the openvswitch firewall[1]. > firewall_driver = openvswitch I recommend running iptables_hybrid until that is resolved. [1] https://bugs.launchpad.net/neutron/+bug/1732067 James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com From: Volodymyr Litovka Date: Monday, November 11, 2019 at 12:10 PM To: "openstack-discuss at lists.openstack.org" Cc: "doka.ua at gmx.com" Subject: [Neutron] OVS forwarding issues CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Dear colleagues, just faced an issue with Openvswitch, which looks strange for me. The problem is that any particular VM receives a lot of packets, which are unicasted: - from other VMs which reside on the same host (let's name them "local VMs") - to other VMs which reside on other hosts (let's name them "remote VMs") Long output from "ovs-ofctl dump-flows br-int" which, as far as I can narrow, ends there: # ovs-ofctl dump-flows br-int |grep " table=94," |egrep "n_packets=[123456789]" cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, priority=1 actions=NORMAL coming to normal processing (classic MAC learning). Looking into br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no MAC addresses of remote VMs and br-int behaves in the right way, flooding unknown unicast to all ports in this L2 segment. 
Of course, there is br-tun which connected over vxlan to all other hosts and to br-int: Bridge br-tun Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "vxlan-0a960008" Interface "vxlan-0a960008" type: vxlan options: {df_default="true", in_key=flow, local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} [ ... ] Port br-tun Interface br-tun type: internal Port patch-int Interface patch-int type: patch options: {peer=patch-tun} but MAC table on br-tun is empty as well: # ovs-appctl fdb/show br-tun port VLAN MAC Age # Finally, packets get to destination, while being copied to all ports on source host, which is serious security issue. I do not think so conceived by design, I rather think we missed something in configuration. Can anybody point me where we're wrong and help with this issue? We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. Network configuration is: @controller: # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [ml2] type_drivers = flat,vxlan tenant_network_types = vxlan mechanism_drivers = l2population,openvswitch extension_drivers = port_security,qos,dns_domain_ports [ml2_type_flat] flat_networks = provider [ml2_type_geneve] [ml2_type_gre] [ml2_type_vlan] [ml2_type_vxlan] vni_ranges = 400:400000 [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true @agent: # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [agent] tunnel_types = vxlan l2_population = true arp_responder = true extensions = qos [ovs] local_ip = 10.150.0.5 bridge_mappings = provider:br-ex [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true [xenapi] Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From deepa.kr at fingent.com Thu Nov 14 05:53:13 2019 From: deepa.kr at fingent.com (Deepa) Date: Thu, 14 Nov 2019 11:23:13 +0530 Subject: Freezer Project Update Message-ID: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> Hello Team Good Day I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see Freezer Project. But couldn't find any charms for it in juju charms. Also there isn't a clear documentation on how to install freezer . https://docs.openstack.org/releasenotes/freezer/train.html. No proper release notes in the latest version as well. Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. Can you also share a proper documentation on how to install Freezer in cluster setup. Thanks for your help. Regards, Deepa K R -------------- next part -------------- An HTML attachment was scrubbed... URL: From rabia.shaheen at xflowresearch.com Fri Nov 15 04:51:18 2019 From: rabia.shaheen at xflowresearch.com (rabia.shaheen at xflowresearch.com) Date: Fri, 15 Nov 2019 09:51:18 +0500 Subject: Trove Image issue Message-ID: <008d01d59b70$542067f0$fc6137d0$@xflowresearch.com> Hi Team, I have deployed kolla-ansible(Stein) with Trove enable and using Trove prebuild images to build the Database VM but VM is constantly stuck in build state. 
I am not sure how to use prebuild image key which is in https://opendev.org/openstack/trove/src/branch/master/integration/scripts/fi les/keys folder. Can you please guide me regarding the key usage for prebuild images (http://tarballs.openstack.org/trove/images/) To build my own trove image on kolla-ansible, is there any specific guide available for it? Warm Regards, Rabia Shaheen Lead Engineer, xFlow Research Inc. +923075462720 (GMT+5) rabia.shaheen at xflowresearch.com www.xflowresearch.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Fri Nov 15 15:29:03 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 15 Nov 2019 09:29:03 -0600 Subject: [all] Nominations for the "V" release name Message-ID: <20191115152903.GA29931@sm-workstation> Hey everyone, There is ongoing discussion about changing our release naming process, but for the time being we are going to stick with what we have been doing. That means it's time to start thinking about the "V" release name! The next developer event will take place in Vancouver, BC. The geographic location for this release will be things starting with "V" in the British Columbia province. The nomination period is now open. Please add suitable names to https://wiki.openstack.org/wiki/Release_Naming/V_Proposals. We will accept nominations until December 6, 2019 23:59:59 UTC. A recap of our current naming rules: * Each release name must start with the letter of the ISO basic Latin alphabet following the initial letter of the previous release, starting with the initial release of "Austin". After "Z", the next name should start with "A" again. * The name must be composed only of the 26 characters of the ISO basic Latin alphabet. Names which can be transliterated into this character set are also acceptable. * The name must refer to the physical or human geography of the region encompassing the location of the OpenStack design summit for the corresponding release. The exact boundaries of the geographic region under consideration must be declared before the opening of nominations, as part of the initiation of the selection process. * The name must be a single word with a maximum of 10 characters. Words that describe the feature should not be included, so "Foo City" or "Foo Peak" would both be eligible as "Foo". Names which do not meet these criteria but otherwise sound really cool should be added to a separate section of the wiki page and the TC may make an exception for one or more of them to be considered in the Condorcet poll. The naming official is responsible for presenting the list of exceptional names for consideration to the TC before the poll opens. Additional information about the release naming process can be found here: https://governance.openstack.org/tc/reference/release-naming.html Looking forward to having a name for our next release! Sean From fungi at yuggoth.org Fri Nov 15 15:55:57 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 15 Nov 2019 15:55:57 +0000 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191115152903.GA29931@sm-workstation> References: <20191115152903.GA29931@sm-workstation> Message-ID: <20191115155557.kt6yazzkzkr3mztl@yuggoth.org> On 2019-11-15 09:29:03 -0600 (-0600), Sean McGinnis wrote: [...] > The next developer event will take place in Vancouver, BC. [...] > The name must refer to the physical or human geography of the > region encompassing the location of the OpenStack design summit > for the corresponding release. 
[...] It's worth noting we haven't had an OpenStack Design Summit for years now (not since the PTG/Forum split), and the last few have been Open Infrastructure Summits. But the upcoming event in Vancouver isn't one of those either (event naming yet to be determined), so presumably the event name in this rule is being interpreted loosely. (With love, your friendly neighborhood pedant.) -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From skaplons at redhat.com Fri Nov 15 15:57:05 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 15 Nov 2019 16:57:05 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Message-ID: <20191115155705.rdyesxnpgvgbm5fw@skaplons-mac> Hi, Yes, after some discussions on IRC I think that it will be better to simple retire project as it has no any activity since it was created. Finally I have 5 patches to retire this project. See [1] for them. [1] https://review.opendev.org/#/q/topic:neutron-interconnection-retire+(status:open+OR+status:merged) On Fri, Nov 15, 2019 at 01:28:17PM +0100, Andreas Jaeger wrote: > On 15/11/2019 09.56, Slawek Kaplonski wrote: > > Hi neutrinos, > > > > As we discussed during the Shanghai PTG I just proposed changes to move > > neutron-interconnection project out of stadium to "x/" namespace. > > So it will not be official neutron project anymore after those changes. > > Patches for that are in [1] and [2] > > > > I also proposed to remove neutron-interconnection api-ref from neutron-lib. > > Patch is here [3]. Please review it :) > > > > [1] https://review.opendev.org/#/c/694478/ > > [2] https://review.opendev.org/#/c/694480/ > > [3] https://review.opendev.org/#/c/694466/ > > > > Looking at > https://review.opendev.org/#/q/project:openstack/neutron-interconnection > > I suggest to retire only with those few changes to the repo - and > nothing in the last few months. > > Or is there anybody committing to continue the work? > > We can also retire now - and create again in the "x/" namespace if > interest suddenly arises. But let's not move repos that are de-facto > dead around, > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 > -- Slawek Kaplonski Senior software engineer Red Hat From sean.mcginnis at gmx.com Fri Nov 15 16:34:20 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 15 Nov 2019 10:34:20 -0600 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191115155557.kt6yazzkzkr3mztl@yuggoth.org> References: <20191115152903.GA29931@sm-workstation> <20191115155557.kt6yazzkzkr3mztl@yuggoth.org> Message-ID: <20191115163420.GA1678@sm-workstation> > [...] > > The next developer event will take place in Vancouver, BC. > [...] > > The name must refer to the physical or human geography of the > > region encompassing the location of the OpenStack design summit > > for the corresponding release. > [...] > > It's worth noting we haven't had an OpenStack Design Summit for > years now (not since the PTG/Forum split), and the last few have > been Open Infrastructure Summits. 
But the upcoming event in > Vancouver isn't one of those either (event naming yet to be > determined), so presumably the event name in this rule is being > interpreted loosely. > > (With love, your friendly neighborhood pedant.) Oops! I had actually updated that on the wiki page, but then forgot to do so in the announcement. :) From openstack at nemebean.com Fri Nov 15 17:00:39 2019 From: openstack at nemebean.com (Ben Nemec) Date: Fri, 15 Nov 2019 11:00:39 -0600 Subject: [oslo] Virtual PTG Planning In-Reply-To: References: Message-ID: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Okay, so far just three of us have responded to the poll. Since this was sort of short notice for next week and so far everyone seems to be available on any of the days, I'm going to propose that we do this on Nov. 25. As an added bonus that means it can double as a virtual birthday party for me. :-) If that ends up not working for anyone we can revisit this, but otherwise let's plan on doing it then. Thanks. -Ben On 11/13/19 12:08 PM, Ben Nemec wrote: > Hi Osloers, > > Given that a lot of the team was not in Shanghai and we had a few topics > proposed that didn't make sense to discuss as a result, I would like to > try doing a virtual PTG the way a number of the other teams are. I've > added a section to the PTG etherpad[0] with some proposed details, but > in general I'm thinking we meet on Jitsi (it's open source) around the > time of the Oslo meeting. It's possible we might be able to get through > everything in the regularly scheduled hour, but if possible I'd like to > keep the following hour (1600-1700 UTC) open as well. If everyone's > available we could do it next week (the 18th) or possibly the following > week (the 25th), although that runs into Thanksgiving week in the US so > people might be out. I've created a Doodle poll[1] with selections for > the next three weeks so please respond there if you can make it any of > those days. If none of them work well we can discuss alternative options. > > Thanks. > > -Ben > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > 1: https://doodle.com/poll/8bqiv865ucyt8499 > From mnaser at vexxhost.com Fri Nov 15 17:45:37 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 15 Nov 2019 12:45:37 -0500 Subject: [sig] Forming a Large scale SIG In-Reply-To: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Message-ID: On Wed, Nov 13, 2019 at 6:22 AM Thierry Carrez wrote: > > Hi everyone, > > In Shanghai we held a forum session to gauge interest in a new SIG to > specifically address cluster scaling issues. In the past we had several > groups ("Large deployments", "Performance", LCOO...) but those efforts > were arguably a bit too wide and those groups are now abandoned. > > My main goal here is to get large users directly involved in a domain > where their expertise can best translate into improvements in the > software. It's easy for such a group to go nowhere while trying to boil > the ocean. To maximize its chances of success and make it sustainable, > the group should have a narrow focus, and reasonable objectives. > > My personal idea for the group focus was to specifically address scaling > issues within a single cluster: basically identify and address issues > that prevent scaling a single cluster (or cell) past a number of nodes. > By sharing analysis and experience, the group could identify common pain > points that, once solved, would help raising that number. 
> > There was a lot of interest in that session[1], and it predictably > exploded in lots of different directions, including some that are > definitely past a single cluster (like making Neutron better support > cells). I think it's fine: my initial proposal was more of a strawman. > Active members of the group should really define what they collectively > want to work on. And the SIG name should be picked to match that. > > I'd like to help getting that group off the ground and to a place where > it can fly by itself, without needing external coordination. The first > step would be to identify interested members and discuss group scope and > objectives. Given the nature of the group (with interested members in > Japan, Europe, Australia and the US) it will be hard to come up with a > synchronous meeting time that will work for everyone, so let's try to > hold that discussion over email. > > So to kick this off: if you are interested in that group, please reply > to this email, introduce yourself and tell us what you would like the > group scope and objectives to be, and what you can contribute to the group. Count me in, I'll be watching from the sidelines and chiming in when I see things happen and come up. > Thanks! > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > -- > Thierry Carrez (ttx) > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From timothy.gresham at intel.com Fri Nov 15 19:12:22 2019 From: timothy.gresham at intel.com (Gresham, Timothy) Date: Fri, 15 Nov 2019 19:12:22 +0000 Subject: Intel 3rd Party CI - Offline until Monday due to upgrades. Message-ID: <5A3D1F5D71F58E4A9E7DAA38716F9FB9B92119E1@FMSMSX112.amr.corp.intel.com> Infrastructure upgrades are occurring in the lab which hosts Intel's OpenStack 3rd party CI. These upgrades will require us to take our CI offline for the weekend. Jobs covering the following areas will be offline. * Persistent memory * Tap as a Service * NFV * PCI * SRIOV Jobs covering Cinder/RSD should not be impacted. Service is expected to be restored Monday afternoon Pacific time. We will send out another email once service has been restored. Tim Gresham Cloud Engineer - Intel Corporation Intel Architecture, Graphics, and Software -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurentfdumont at gmail.com Fri Nov 15 19:15:06 2019 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Fri, 15 Nov 2019 14:15:06 -0500 Subject: [nova] Displayed state of VM when nova-compute is down/unresponsive. Message-ID: Hey everyone, We had a discussion with some colleagues at work. There was some confusion over the expected behavior of OpenStack/Nova regarding the state of VMs on a compute that is down (or one where Nova is in a "bad" state and unable to update properly). Right now, it seems that the VMs will stay in the last state they were seen in. We were wondering if there was a way to expose the fact that the underlying hypervisor is down? Something like a "Warning: no data from compute since xx:xx:xx". I did not see any documentation regarding a possible configuration option, but there are a lot of posts from people with similar questions. I understand that the state of the VM shouldn't be changed based on the status of a compute - but exposing the fact that the state itself is not current might be a good middle ground.
I do see a possible issue with the fact that the hypervisor itself is not known to the user if the user/project is not admin. Is anyone aware of anything similar in the past? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From kotobi at dkrz.de Fri Nov 15 19:42:06 2019 From: kotobi at dkrz.de (Amjad Kotobi) Date: Fri, 15 Nov 2019 20:42:06 +0100 Subject: Freezer Project Update In-Reply-To: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> Message-ID: <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> Hi, This project is pretty much in a production-ready state; since the last summit it has become active again on the development side, and we are using it for our backup solution too. The documentation side isn't that bright yet but will be updated very soon. You can install it as a standalone project in an instance; I did it manually and didn't use any provisioning tools. Let me know which specific part of the deployment is not clear. Amjad > On 14. Nov 2019, at 06:53, Deepa wrote: > > Hello Team > > Good Day > > I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) > We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see > Freezer Project. But couldn't find any charms for it in juju charms. Also there isn't a clear documentation on how to install freezer . > https://docs.openstack.org/releasenotes/freezer/train.html . No proper release notes in the latest version as well. > Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. > Can you also share a proper documentation on how to install Freezer in cluster setup. > > Thanks for your help. > > Regards, > Deepa K R -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Fri Nov 15 20:02:15 2019 From: dms at danplanet.com (Dan Smith) Date: Fri, 15 Nov 2019 12:02:15 -0800 Subject: [nova] Displayed state of VM when nova-compute is down/unresponsive. In-Reply-To: (Laurent Dumont's message of "Fri, 15 Nov 2019 14:15:06 -0500") References: Message-ID: > We were wondering if there was a way to expose the fact that the > underlying hypervisor is down? Something like a "Warning: no data > from compute since xx:xx:xx". I did not see any documentation regarding > a possible configuration option, but there are a lot of posts from > people with similar questions. You're looking for "host_status" in the detailed server output. It gives you an indication of what the state is of the host the instance is on without revealing too much and without altering the state of the instance itself, which as you note could be wrong if the problem is merely communication. https://docs.openstack.org/api-ref/compute/?expanded=show-server-details-detail#show-server-details This is controlled by policy and only visible past microversion 2.16, so make sure both of those details are handled for whatever users you want to be able to have that level of visibility.
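For illustration, a minimal sketch of reading that field with python-novaclient; the server UUID below is just a placeholder, and it assumes credentials in the usual OS_* environment variables plus a policy that actually exposes host_status to the caller:

    # Hedged sketch: request microversion 2.16 so the API includes host_status.
    import os
    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url=os.environ['OS_AUTH_URL'],
        username=os.environ['OS_USERNAME'],
        password=os.environ['OS_PASSWORD'],
        project_name=os.environ['OS_PROJECT_NAME'],
        user_domain_name=os.environ.get('OS_USER_DOMAIN_NAME', 'Default'),
        project_domain_name=os.environ.get('OS_PROJECT_DOMAIN_NAME', 'Default'))

    nova = client.Client('2.16', session=session.Session(auth=auth))
    # Placeholder UUID; substitute a real server ID.
    server = nova.servers.get('11111111-2222-3333-4444-555555555555')
    # Documented values: UP, DOWN, MAINTENANCE, UNKNOWN, or "" when hidden by policy.
    print(getattr(server, 'host_status', 'not returned'))

The same field comes back from a plain GET /servers/{server_id} as long as the request carries a 2.16 or later compute microversion header.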
--Dan From info at dantalion.nl Sat Nov 16 09:36:32 2019 From: info at dantalion.nl (info at dantalion.nl) Date: Sat, 16 Nov 2019 10:36:32 +0100 Subject: [oslo][i18n][pbr] get_available_languages() always only returns ['en_US'] Message-ID: <862457b9-719c-59b7-85ff-694370946d62@dantalion.nl> Hello, Across several projects I have noticed, both in unit tests and while the service is running, that calling oslo_i18n.get_available_languages() only returns ['en_US'] (which is inserted as a default), even though the projects I have tested this on have several languages available in the locale directory. Calling any of the python setup.py extract_messages / compile_catalog / update_catalog / install commands does not solve this. However, when I manually copy the .mo files into /usr/share/locale/**/LC_MESSAGES it works as expected. My questions are: surely there must be a less manual method to properly install all the locale files, but what is it? Why aren't the locale files installed by pbr when python setup.py install is called? I hope someone knows the answers to these questions. Kind regards, Corne Lukken (Dantali0n) From thomas.morin at orange.com Sat Nov 16 15:38:39 2019 From: thomas.morin at orange.com (thomas.morin at orange.com) Date: Sat, 16 Nov 2019 16:38:39 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Message-ID: <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup> Hi stackers & neutrinos, I understand the need to adapt the project status to the lack of activity on the project in the past year. During this time, my time as a reviewer has been taken by other activities (still OpenStack related!), which is what prevented the submitted code from being merged. Having been at the origin of the project, I have to apologize for the lack of communication about where we stood with this project. My apologies for that. We would still like to have a place to let the proposal exist and the code be reviewed and tested. Hosting under "x/" would work for us. Hope that this can work out like this... Thanks! -Thomas Andreas Jaeger : > On 15/11/2019 09.56, Slawek Kaplonski wrote: >> Hi neutrinos, >> >> As we discussed during the Shanghai PTG I just proposed changes to move >> neutron-interconnection project out of stadium to "x/" namespace. >> So it will not be official neutron project anymore after those changes. >> Patches for that are in [1] and [2] >> >> I also proposed to remove neutron-interconnection api-ref from neutron-lib. >> Patch is here [3]. Please review it :) >> >> [1] https://review.opendev.org/#/c/694478/ >> [2] https://review.opendev.org/#/c/694480/ >> [3] https://review.opendev.org/#/c/694466/ >> > Looking at > https://review.opendev.org/#/q/project:openstack/neutron-interconnection > > I suggest to retire only with those few changes to the repo - and > nothing in the last few months. > > Or is there anybody committing to continue the work? > > We can also retire now - and create again in the "x/" namespace if > interest suddenly arises. But let's not move repos that are de-facto > dead around, > > Andreas _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation.
Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. From aj at suse.com Sat Nov 16 16:52:46 2019 From: aj at suse.com (Andreas Jaeger) Date: Sat, 16 Nov 2019 17:52:46 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup> References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup> Message-ID: On 16/11/2019 16.38, thomas.morin at orange.com wrote: > Hi stackers & neutrinos, > > I understand the need to adapt the project status to the lack of > activity on the project in the past year. > During this time, my time as a reviewer, preventing the code submitted > to be merged, has been taken by other activites (still OpenStack related!). > > Having been at the origin of the project, I have to apologize for the > lack of communication of where we were about this project. > My apologies for that. > > We still would like to have a place to let the proposal exit, code be > reviewed and tested. > Hosting under "x/" would work for us. Sure, no problem. Please read first https://governance.openstack.org/tc/resolutions/20190711-mandatory-repository-retirement.html - that's the process that applies here. So, we retire the repo completely (with the existing changes) - and you can anytime push up a new change to create the repo in the "x" namespace import the content (minus the retirement change) into it... I'll quickly review such an import if it shows up, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From fungi at yuggoth.org Sat Nov 16 19:34:18 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 16 Nov 2019 19:34:18 +0000 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup> Message-ID: <20191116193418.g7wthwefudfc6rv7@yuggoth.org> On 2019-11-16 17:52:46 +0100 (+0100), Andreas Jaeger wrote: [...] > So, we retire the repo completely (with the existing changes) - and you > can anytime push up a new change to create the repo in the "x" namespace > import the content (minus the retirement change) into it... [...] Even easier is probably to import it all (just set the upstream import to the opendev.org clone URL for the original project), and then as part of the first change to the new project revert the retirement commit. That doesn't require pushing a temporary copy of the repository anywhere for import. 
-- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From hberaud at redhat.com Mon Nov 18 09:07:00 2019 From: hberaud at redhat.com (Herve Beraud) Date: Mon, 18 Nov 2019 10:07:00 +0100 Subject: [oslo] Virtual PTG Planning In-Reply-To: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> References: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Message-ID: +1 Wise decision. Do we need to bring some party favors? Le ven. 15 nov. 2019 à 18:04, Ben Nemec a écrit : > Okay, so far just three of us have responded to the poll. Since this was > sort of short notice for next week and so far everyone seems to be > available on any of the days, I'm going to propose that we do this on > Nov. 25. As an added bonus that means it can double as a virtual > birthday party for me. :-) > > If that ends up not working for anyone we can revisit this, but > otherwise let's plan on doing it then. > > Thanks. > > -Ben > > On 11/13/19 12:08 PM, Ben Nemec wrote: > > Hi Osloers, > > > > Given that a lot of the team was not in Shanghai and we had a few topics > > proposed that didn't make sense to discuss as a result, I would like to > > try doing a virtual PTG the way a number of the other teams are. I've > > added a section to the PTG etherpad[0] with some proposed details, but > > in general I'm thinking we meet on Jitsi (it's open source) around the > > time of the Oslo meeting. It's possible we might be able to get through > > everything in the regularly scheduled hour, but if possible I'd like to > > keep the following hour (1600-1700 UTC) open as well. If everyone's > > available we could do it next week (the 18th) or possibly the following > > week (the 25th), although that runs into Thanksgiving week in the US so > > people might be out. I've created a Doodle poll[1] with selections for > > the next three weeks so please respond there if you can make it any of > > those days. If none of them work well we can discuss alternative options. > > > > Thanks. > > > > -Ben > > > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > > 1: https://doodle.com/poll/8bqiv865ucyt8499 > > > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaronzhu1121 at gmail.com Mon Nov 18 11:46:16 2019 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Mon, 18 Nov 2019 19:46:16 +0800 Subject: [release][stable][telemetry]Please add Rong Zhu to ceilometer-stable-maint group Message-ID: Hi Stable Maintenance Core team, I am the current Telemetry PTL, could you please add me to ceilometer-stable-maint group. And please also add Lingxian Kong to this group. -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Mon Nov 18 14:17:36 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Mon, 18 Nov 2019 06:17:36 -0800 Subject: [horizon] Changing the release model to cycle-with-intermediary Message-ID: Hi, I just proposed a patch to change the horizon release model to cycle-with-intermediary. https://review.opendev.org/#/c/694772/ It was discussed during the Shanghai PTG. Horizon provides GUI to users. On the other hand, it is a library for horizon plugins. When horizon plugins would like to use recent changes or to avoid bugs, they need to consume beta releases of horizon. More frequent releases of horizon would make more sense, so I am proposing the release model change. If there are concerns, reply to this thread or drop comments in the review mentioned above. Thanks, Akihiro Motoki (irc: amotoki) From kgiusti at gmail.com Mon Nov 18 14:42:41 2019 From: kgiusti at gmail.com (Ken Giusti) Date: Mon, 18 Nov 2019 09:42:41 -0500 Subject: [oslo] Virtual PTG Planning In-Reply-To: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> References: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Message-ID: +1 (year for Ben) +1 to Nov 25th On Fri, Nov 15, 2019 at 12:14 PM Ben Nemec wrote: > Okay, so far just three of us have responded to the poll. Since this was > sort of short notice for next week and so far everyone seems to be > available on any of the days, I'm going to propose that we do this on > Nov. 25. As an added bonus that means it can double as a virtual > birthday party for me. :-) > > If that ends up not working for anyone we can revisit this, but > otherwise let's plan on doing it then. > > Thanks. > > -Ben > > On 11/13/19 12:08 PM, Ben Nemec wrote: > > Hi Osloers, > > > > Given that a lot of the team was not in Shanghai and we had a few topics > > proposed that didn't make sense to discuss as a result, I would like to > > try doing a virtual PTG the way a number of the other teams are. I've > > added a section to the PTG etherpad[0] with some proposed details, but > > in general I'm thinking we meet on Jitsi (it's open source) around the > > time of the Oslo meeting. It's possible we might be able to get through > > everything in the regularly scheduled hour, but if possible I'd like to > > keep the following hour (1600-1700 UTC) open as well. If everyone's > > available we could do it next week (the 18th) or possibly the following > > week (the 25th), although that runs into Thanksgiving week in the US so > > people might be out. I've created a Doodle poll[1] with selections for > > the next three weeks so please respond there if you can make it any of > > those days. If none of them work well we can discuss alternative options. > > > > Thanks. > > > > -Ben > > > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > > 1: https://doodle.com/poll/8bqiv865ucyt8499 > > > > -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alifshit at redhat.com Mon Nov 18 15:02:28 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Mon, 18 Nov 2019 10:02:28 -0500 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191115152903.GA29931@sm-workstation> References: <20191115152903.GA29931@sm-workstation> Message-ID: Top posting for the lulz: Clearly the name must be V for Vendetta. On Fri, Nov 15, 2019 at 10:33 AM Sean McGinnis wrote: > > Hey everyone, > > There is ongoing discussion about changing our release naming process, but for > the time being we are going to stick with what we have been doing. That means > it's time to start thinking about the "V" release name! > > The next developer event will take place in Vancouver, BC. The geographic > location for this release will be things starting with "V" in the British > Columbia province. > > The nomination period is now open. Please add suitable names to > https://wiki.openstack.org/wiki/Release_Naming/V_Proposals. We will accept > nominations until December 6, 2019 23:59:59 UTC. > > A recap of our current naming rules: > > * Each release name must start with the letter of the ISO basic Latin > alphabet following the initial letter of the previous release, starting > with the initial release of "Austin". After "Z", the next name should > start with "A" again. > > * The name must be composed only of the 26 characters of the ISO basic > Latin alphabet. Names which can be transliterated into this character > set are also acceptable. > > * The name must refer to the physical or human geography of the region > encompassing the location of the OpenStack design summit for the > corresponding release. The exact boundaries of the geographic region > under consideration must be declared before the opening of nominations, > as part of the initiation of the selection process. > > * The name must be a single word with a maximum of 10 characters. Words > that describe the feature should not be included, so "Foo City" or "Foo > Peak" would both be eligible as "Foo". > > Names which do not meet these criteria but otherwise sound really cool > should be added to a separate section of the wiki page and the TC may > make an exception for one or more of them to be considered in the > Condorcet poll. The naming official is responsible for presenting the > list of exceptional names for consideration to the TC before the poll opens. > > > Additional information about the release naming process can be found here: > > https://governance.openstack.org/tc/reference/release-naming.html > > Looking forward to having a name for our next release! > > Sean From fungi at yuggoth.org Mon Nov 18 15:11:35 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 18 Nov 2019 15:11:35 +0000 Subject: [all] Nominations for the "V" release name In-Reply-To: References: <20191115152903.GA29931@sm-workstation> Message-ID: <20191118151135.wwztoxfyp2zo3iym@yuggoth.org> On 2019-11-18 10:02:28 -0500 (-0500), Artom Lifshitz wrote: > Top posting for the lulz: > > Clearly the name must be V for Vendetta. [...] If only you could have made that joke two weeks ago, it would have been far more timely. ;) -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Mon Nov 18 15:34:55 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 18 Nov 2019 15:34:55 +0000 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191118151135.wwztoxfyp2zo3iym@yuggoth.org> References: <20191115152903.GA29931@sm-workstation> <20191118151135.wwztoxfyp2zo3iym@yuggoth.org> Message-ID: <20191118153455.zfw7r3e425dei3vo@yuggoth.org> On 2019-11-18 15:11:35 +0000 (+0000), Jeremy Stanley wrote: > On 2019-11-18 10:02:28 -0500 (-0500), Artom Lifshitz wrote: > > Top posting for the lulz: > > > > Clearly the name must be V for Vendetta. > [...] > > If only you could have made that joke two weeks ago, it would have > been far more timely. ;) Sorry, because this allusion confused some people, the plot line in "V for Vendetta" centers around Guy Fawkes Night which is observed annually on November 5. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From luka.peschke at objectif-libre.com Mon Nov 18 15:41:01 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Mon, 18 Nov 2019 16:41:01 +0100 Subject: [cloudkitty] 18/11 IRC meeting recap Message-ID: <1ad2de5b6d9e6afeaaa93d4b112f4357@objectif-libre.com> Hello, This is the recap for today's IRC meeting of the cloudkitty team. The agenda can be found at [1] and the logs can be found at [2]. CloudKitty 11.0.1 ================= CloudKitty 11.0.1 has been released. It includes a fix for security issue on the GET /v1/dataframes and GET /v2/dataframes endpoints which had been introduced during the train development cycle. Various updates =============== Two patches had been discussed during the previous meeting: one for developer documentation for scope fetchers, and one allowing to group results by timestamps on GET /v2/summary. Both have been merged. API improvements ================ Two patches improving the API are currently under review: - The first one updates the way oslo.context and oslo.policy are used. External reviews on this one would be very helpful [3] - The second one improves the way various drivers are loaded in the v2 API. It is available at [4] Standalone cloudkitty dashboard =============================== Julien Pinchelimouroux (julien-pinchelim) has been working on a standalone dashboard (which will also support keystone authentication)for cloudkitty. A 0.1.0 release should happen during in Q4 2019. Some screenshots are available at [5] Cheers, -- Luka Peschke (peschk_l) [1] https://etherpad.openstack.org/p/cloudkitty-meeting-topics [2] http://eavesdrop.openstack.org/meetings/cloudkitty/2019/cloudkitty.2019-11-18-14.02.log.html [3] https://review.opendev.org/#/c/692333/ [4] https://review.opendev.org/#/c/686393/ [5] https://kutt.it/khA6yF From donny at fortnebula.com Mon Nov 18 16:05:19 2019 From: donny at fortnebula.com (Donny Davis) Date: Mon, 18 Nov 2019 11:05:19 -0500 Subject: [all] Nominations for the "V" release name In-Reply-To: References: <20191115152903.GA29931@sm-workstation> Message-ID: I know I only get a +1, but I +1 this name a thousand times. On Mon, Nov 18, 2019 at 10:08 AM Artom Lifshitz wrote: > Top posting for the lulz: > > Clearly the name must be V for Vendetta. 
> > On Fri, Nov 15, 2019 at 10:33 AM Sean McGinnis > wrote: > > > > Hey everyone, > > > > There is ongoing discussion about changing our release naming process, > but for > > the time being we are going to stick with what we have been doing. That > means > > it's time to start thinking about the "V" release name! > > > > The next developer event will take place in Vancouver, BC. The geographic > > location for this release will be things starting with "V" in the British > > Columbia province. > > > > The nomination period is now open. Please add suitable names to > > https://wiki.openstack.org/wiki/Release_Naming/V_Proposals. We will > accept > > nominations until December 6, 2019 23:59:59 UTC. > > > > A recap of our current naming rules: > > > > * Each release name must start with the letter of the ISO basic Latin > > alphabet following the initial letter of the previous release, starting > > with the initial release of "Austin". After "Z", the next name should > > start with "A" again. > > > > * The name must be composed only of the 26 characters of the ISO basic > > Latin alphabet. Names which can be transliterated into this character > > set are also acceptable. > > > > * The name must refer to the physical or human geography of the region > > encompassing the location of the OpenStack design summit for the > > corresponding release. The exact boundaries of the geographic region > > under consideration must be declared before the opening of nominations, > > as part of the initiation of the selection process. > > > > * The name must be a single word with a maximum of 10 characters. Words > > that describe the feature should not be included, so "Foo City" or "Foo > > Peak" would both be eligible as "Foo". > > > > Names which do not meet these criteria but otherwise sound really cool > > should be added to a separate section of the wiki page and the TC may > > make an exception for one or more of them to be considered in the > > Condorcet poll. The naming official is responsible for presenting the > > list of exceptional names for consideration to the TC before the poll > opens. > > > > > > Additional information about the release naming process can be found > here: > > > > https://governance.openstack.org/tc/reference/release-naming.html > > > > Looking forward to having a name for our next release! > > > > Sean > > > -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Nov 18 16:08:15 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 18 Nov 2019 17:08:15 +0100 Subject: [neutron][drivers] Drivers meeting cancel Message-ID: <20191118160815.xrb7zslwnjgitzzz@skaplons-mac> Hi, I can't attend next drivers meeting on Friday, 22.11.2019. I know also that 2 other members of drivers team will not be able to be on the meeting so as we will not have quorum on this meeting, lets cancel it. See You on the meeting next week, on 29.11.2019 where (I hope) we should have quorum even if it's just after Thanksgiving. 
-- Slawek Kaplonski Senior software engineer Red Hat From fsbiz at yahoo.com Mon Nov 18 16:14:59 2019 From: fsbiz at yahoo.com (fsbiz at yahoo.com) Date: Mon, 18 Nov 2019 16:14:59 +0000 (UTC) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <7d53de2f-46de-edcf-63dc-fe7ba8b61f83@gmail.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> <1952364384.238482.1573747741880@mail.yahoo.com> <7d53de2f-46de-edcf-63dc-fe7ba8b61f83@gmail.com> Message-ID: <1515926373.1791856.1574093699349@mail.yahoo.com> Thanks Matt for the excellent suggestions in this email and the prior one.I am currently trying to eliminate them one by one and will update. Yes, by  forced host I do mean creating the server with an availability zone in the ZONE:NODE format.   Yes, I understand the scheduler filters aren't run but why should that bean issue?  For now, I am tracing all the logs from the PaaS layer all the way to Openstack nova placement API tosee if there is anything unusual. Thanks,Fred. On Thursday, November 14, 2019, 10:07:15 AM PST, Matt Riedemann wrote: On 11/14/2019 10:09 AM, fsbiz at yahoo.com wrote: > The requests coming in are "forced host" requests.  The PaaS layer > maintains > an inventory of actual bare-metal available nodes and a user has to > explicitly select > a baremetal node.  The PaaS layer then makes a nova api call for an > instance to be created > on that specific baremetal node. To be clear, by forced host you mean creating the server with an availability zone in the format ZONE:HOST:NODE or ZONE:NODE where NODE is the ironic node UUID, correct? https://docs.openstack.org/nova/latest/admin/availability-zones.html#using-availability-zones-to-select-hosts Yeah that's a problem because then the scheduler filters aren't run. A potential alternative is to create the server using a hypervisor_hostname query hint that will run through the JsonFilter: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#jsonfilter Then at least you're not forcing the node and run the scheduler filters. I forget exactly how the scheduler code works in Queens with respect to forced hosts/nodes on server create but the scheduler still has to allocate resources in placement. It looks like we work around that in Queens by disabling the limit we place on getting allocation candidates from placement: https://review.opendev.org/#/c/584616/ My guess is your PaaS layer has bugs in it since it's allowing users to select hosts that are already consumed, or it's just racy. Anyway, this is why nova uses placement since Pike for atomic consumption of resources during scheduling. -- Thanks, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Mon Nov 18 17:37:02 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 18 Nov 2019 11:37:02 -0600 Subject: [oslo] Virtual PTG Planning In-Reply-To: References: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Message-ID: I was going to suggest party hats, but I can't wear one with my headset on. :-) On 11/18/19 3:07 AM, Herve Beraud wrote: > +1 Wise decision. > Do we need to bring some party favors? > > Le ven. 15 nov. 2019 à 18:04, Ben Nemec > a écrit : > > Okay, so far just three of us have responded to the poll. Since this > was > sort of short notice for next week and so far everyone seems to be > available on any of the days, I'm going to propose that we do this on > Nov. 25. 
As an added bonus that means it can double as a virtual > birthday party for me. :-) > > If that ends up not working for anyone we can revisit this, but > otherwise let's plan on doing it then. > > Thanks. > > -Ben > > On 11/13/19 12:08 PM, Ben Nemec wrote: > > Hi Osloers, > > > > Given that a lot of the team was not in Shanghai and we had a few > topics > > proposed that didn't make sense to discuss as a result, I would > like to > > try doing a virtual PTG the way a number of the other teams are. > I've > > added a section to the PTG etherpad[0] with some proposed > details, but > > in general I'm thinking we meet on Jitsi (it's open source) > around the > > time of the Oslo meeting. It's possible we might be able to get > through > > everything in the regularly scheduled hour, but if possible I'd > like to > > keep the following hour (1600-1700 UTC) open as well. If everyone's > > available we could do it next week (the 18th) or possibly the > following > > week (the 25th), although that runs into Thanksgiving week in the > US so > > people might be out. I've created a Doodle poll[1] with > selections for > > the next three weeks so please respond there if you can make it > any of > > those days. If none of them work well we can discuss alternative > options. > > > > Thanks. > > > > -Ben > > > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > > 1: https://doodle.com/poll/8bqiv865ucyt8499 > > > > > > -- > Hervé Beraud > Senior Software Engineer > Red Hat - Openstack Oslo > irc: hberaud > -----BEGIN PGP SIGNATURE----- > > wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ > Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ > RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP > F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G > 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g > glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw > m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ > hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 > qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y > F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 > B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O > v6rDpkeNksZ9fFSyoY2o > =ECSj > -----END PGP SIGNATURE----- > From kendall at openstack.org Mon Nov 18 18:08:40 2019 From: kendall at openstack.org (Kendall Waters) Date: Mon, 18 Nov 2019 12:08:40 -0600 Subject: Shanghai PTG Team Photos Message-ID: <4C08DD59-9EFE-4670-ACFA-D13CC8626234@openstack.org> Hi everyone, Thank you for attending the Project Teams Gathering in Shanghai! If your team took a team picture, you can find a copy of the photo file in this Dropbox folder: https://www.dropbox.com/sh/1my6wdtuc1hf58o/AACU49pjWxzFNzcZJgjLG8n1a?dl=0 If you are unable to open Dropbox, please send me an email with which team photo you are looking for and I can send you the file directly. Cheers, Kendall Kendall Waters OpenStack Marketing & Events kendall at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From kchamart at redhat.com Mon Nov 18 18:11:06 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Mon, 18 Nov 2019 19:11:06 +0100 Subject: On next minimum libvirt / QEMU versions for "V" release Message-ID: <20191118181106.GD7032@paraplu> Heya, The last time we incremented versions for libvirt and QEMU was during the "Stein" release[1]. For "Train" we didn't do any. 
Although we advertized NEXT_MIN_{LIBVIRT,QEMU} versions for "Train" release to be libvirt 4.0.0 and QEMU 2.11.0, but we actually didn't bump; we'll do that for "Ussuri". But before we bump the versions for "Ussuri", we need to pick NEXT_MIN versions for the "V" release. Based on the updated the DistroSupportMatrix page[2], it looks we can pick the next libvirt and QEMU versions for "V" release to the following: libvirt: 5.0.0 [GAed on: 15-Jan-2019] QEMU: 4.0.0 [GAed on: 24-Apr-2019] I have the initial patch here[3] for comments. Debian, Fedora, Ubuntu[4], CentOS, RHEL currently already ship the above versions (actually, higher than those). And it is reasonable to assume -- but let's confirm below -- that openSUSE, SLES, and Oracle Linux would also have the above versions available by "V" release time. Action Items for Linux Distros ------------------------------ (a) Oracle Linux: Please update your libvirt/QEMU versions for Oracle Linux 8? I couldn't find anything related to libvirt/QEMU here: https://yum.oracle.com/oracle-linux-8.html. (My educated guess is: the versions roughly match what's in CentOS/RHEL.) (b) openSUSE and SLES: Same request as above. Andreas Jaegaer said on #openstack-infra that the proposed versions for 'V' release should be fine for SLES. (And by extension open SUSE, I assume.) - - - Assuming Oracle Linux and SLES confirm, please let us know if there are any objections if we pick NEXT_MIN_* versions for the OpenStack "V" release to be libvirt: 5.0.0 and QEMU: 4.0.0. Comments / alternative proposals welcome :-) [1] https://opendev.org/openstack/nova/commit/489b5f762e -- Pick next minimum libvirt / QEMU versions for "T" release, 2018-09-25) [2] https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix [3] https://review.opendev.org/694821 -- [RFC] Pick NEXT_MIN libvirt/QEMU versions for "V" release [4] For Ubuntu, I updated the versions based on what is in the Cloud Archive repo for "Bionic" (the LTS) release: http://reqorts.qa.ubuntu.com/reports/ubuntu-server/cloud-archive/train_versions.html -- /kashyap From gouthampravi at gmail.com Mon Nov 18 18:25:13 2019 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 18 Nov 2019 10:25:13 -0800 Subject: [manila] No IRC meeting on Nov 7 and Nov 21 2019 In-Reply-To: References: Message-ID: Folks, A reminder that the manila IRC meeting on 21st November 2019 has been canceled. The next meeting is on 28th November 2019 at 15:00 UTC - however, it's a holiday in the US and many contributors may not join. Carlos Eduardo has graciously agreed to chair this meeting. If we do not have a quorum, we'll defer any decisions and communicate on the mailing list instead. Thank you, Goutham On Thu, Oct 31, 2019 at 8:57 AM Goutham Pacha Ravi wrote: > > Hello Zorillas and interested stackers, > > Due to a part of our community attending the Open Infrastructure > Summit+PTG (Nov 4-8, 2019) and KubeCon+CloudNativeCon (Nov 18-21, > 2019), I propose that we cancel the weekly IRC meetings on Nov 7th and > Nov 21st. > > If you'd like to discuss anything during these weeks, please chime in > on freenode/#openstack-manila, or post to this mailing list. > > Thanks, > Goutham From pkliczew at redhat.com Mon Nov 18 17:59:35 2019 From: pkliczew at redhat.com (Piotr Kliczewski) Date: Mon, 18 Nov 2019 18:59:35 +0100 Subject: [Openstack] FOSDEM 2020 Virtualization & IaaS Devroom CfP Message-ID: Friendly reminder that there are 2 weeks before the submission deadline. Room day update: This year Virt and IaaS room will be on the 2nd of February. 
See you all at FOSDEM! -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Mon Nov 18 19:39:16 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 18 Nov 2019 11:39:16 -0800 Subject: [keystone] post-PTG virtual meeting reminder Message-ID: <23af6fa2-d554-480c-9e5e-fdf1762ed3f2@www.fastmail.com> Hi keystoners, As a reminder, we'll be holding our post-PTG meeting tomorrow at 14:00 UTC (with the daylight savings time change that makes it 6:00 PST (RIP me) / 9:00 EST / 19:30 IST). Last week I briefly floated the idea of rescheduling it but quickly decided there was not enough notice, so we're holding it at the day and time that we've had planned for the last few weeks. Our agenda and notes: https://etherpad.openstack.org/p/keystone-shanghai-ptg We'll try jitsi.org again, I think the technical issues we were having last time were on my end and not related to the platform: https://meet.jit.si/keystone-ptg Please please please review the roadmap board and assign yourself to items you have been working on and update their status: https://tree.taiga.io/project/keystone-ussuri-roadmap/kanban . We'll be going over the board together tomorrow, and updating it ahead of time will save us time as a group. Colleen From mnaser at vexxhost.com Mon Nov 18 21:40:04 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 18 Nov 2019 16:40:04 -0500 Subject: [tc][stable] Changing stable branch policy Message-ID: Hi everyone, At the PTG, the TC discussed what we can do about our stable branch policy and there was a few different ideas put on the table, however, something that I felt made a lot of sense was revisiting the way that we currently apply it. We all know that we're definitely a lot more resource limited as a community and it's important for us to start breaking down some of those ideas which made sense when the velocity of the project was very high. One of the things that used to make sense is maintaining a dedicated stable core team across all projects. At the current time: 1. Some projects seem to have some sort of power of their stable branches through historical reasons 2. Some projects don't have access to merging things into stable branches and need to rely on the stable maintenance team to do that 3. We are *really* thankful for our current stable team, but it seems that there is a lot of work that does bottleneck other teams (and really, stable reviews is a difficult task). The proposal that I had was that in mind would be for us to let teams self manage their own stable branches. I think we've reached a point where we can trust most of our community to be familiar with the stable branch policy (and let teams decide for themselves what they believe is best for the success of their own projects). I'd like to invite the community to comment on this change, the approach that we can take to do this (or other ideas) -- being mindful with the limited set of resources that we have inside the community. Thanks, Mohammed From feilong at catalyst.net.nz Mon Nov 18 21:46:30 2019 From: feilong at catalyst.net.nz (Feilong Wang) Date: Tue, 19 Nov 2019 10:46:30 +1300 Subject: [Magnum] Virtual PTG planning Message-ID: <2f35bc6c-b4bb-dbe7-c16d-ede34bc23914@catalyst.net.nz> Hi team, As we discussed on last weekly team meeting, we'd like to have a virtual PTG before the Xmas holiday to plan our work for the U release. The general idea is extending our current weekly meeting time from 1 hour to 2 hours and having 2 sessions with total 4 hours. 
My current proposal is as below, please reply if you have question or comments. Thanks. Pre discussion/Ideas collection:   20th Nov  9:00AM-10:00AM UTC 1st Session:  27th Nov 9:00AM-11:00AM UTC 2nd Session: 4th Dec 9:00AM-11:00AM UTC -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- From mriedemos at gmail.com Mon Nov 18 22:08:24 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 18 Nov 2019 16:08:24 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: Message-ID: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> On 11/18/2019 3:40 PM, Mohammed Naser wrote: > The proposal that I had was that in mind would be for us to let teams > self manage their own stable branches. I think we've reached a point > where we can trust most of our community to be familiar with the > stable branch policy (and let teams decide for themselves what they > believe is best for the success of their own projects). So for a project like nova that has a separate nova-core [1] and nova-stable-maint team [2] where some from [2] aren't in [1], what does this mean? Drop [2] and just rely on [1]? That won't work for those in nova-core that aren't familiar enough with the stable branch guidelines or simply don't care to review stable branch changes, and won't work for those that are in nova-stable-maint but not nova-core. [1] https://review.opendev.org/#/admin/groups/25,members [2] https://review.opendev.org/#/admin/groups/540,members -- Thanks, Matt From openstack at nemebean.com Mon Nov 18 22:35:31 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 18 Nov 2019 16:35:31 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> Message-ID: <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> On 11/18/19 4:08 PM, Matt Riedemann wrote: > On 11/18/2019 3:40 PM, Mohammed Naser wrote: >> The proposal that I had was that in mind would be for us to let teams >> self manage their own stable branches.  I think we've reached a point >> where we can trust most of our community to be familiar with the >> stable branch policy (and let teams decide for themselves what they >> believe is best for the success of their own projects). > > So for a project like nova that has a separate nova-core [1] and > nova-stable-maint team [2] where some from [2] aren't in [1], what does > this mean? Drop [2] and just rely on [1]? That won't work for those in > nova-core that aren't familiar enough with the stable branch guidelines > or simply don't care to review stable branch changes, and won't work for > those that are in nova-stable-maint but not nova-core. I believe the proposal is to allow the Nova team to manage nova-stable-maint in the same way they do nova-core, not to force anyone to drop their stable-maint team entirely. 
> > [1] https://review.opendev.org/#/admin/groups/25,members > [2] https://review.opendev.org/#/admin/groups/540,members > From mnaser at vexxhost.com Mon Nov 18 22:35:50 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 18 Nov 2019 17:35:50 -0500 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> Message-ID: On Mon, Nov 18, 2019 at 5:13 PM Matt Riedemann wrote: > > On 11/18/2019 3:40 PM, Mohammed Naser wrote: > > The proposal that I had was that in mind would be for us to let teams > > self manage their own stable branches. I think we've reached a point > > where we can trust most of our community to be familiar with the > > stable branch policy (and let teams decide for themselves what they > > believe is best for the success of their own projects). > > So for a project like nova that has a separate nova-core [1] and > nova-stable-maint team [2] where some from [2] aren't in [1], what does > this mean? Drop [2] and just rely on [1]? That won't work for those in > nova-core that aren't familiar enough with the stable branch guidelines > or simply don't care to review stable branch changes, and won't work for > those that are in nova-stable-maint but not nova-core. Thanks for bringing this up, I think we'll slowly iron those out. I think this can be a team-specific decision, we should have $project-stable-maint for every single project anyways, and the team could decide to put all of $project-core inside of it, or a select group of people. > [1] https://review.opendev.org/#/admin/groups/25,members > [2] https://review.opendev.org/#/admin/groups/540,members > > -- > > Thanks, > > Matt > From nate.johnston at redhat.com Mon Nov 18 23:01:06 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Mon, 18 Nov 2019 18:01:06 -0500 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> Message-ID: <20191118230106.gl4ctpftmndyzpbn@firewall> On Mon, Nov 18, 2019 at 04:08:24PM -0600, Matt Riedemann wrote: > On 11/18/2019 3:40 PM, Mohammed Naser wrote: > > The proposal that I had was that in mind would be for us to let teams > > self manage their own stable branches. I think we've reached a point > > where we can trust most of our community to be familiar with the > > stable branch policy (and let teams decide for themselves what they > > believe is best for the success of their own projects). > > So for a project like nova that has a separate nova-core [1] and > nova-stable-maint team [2] where some from [2] aren't in [1], what does this > mean? Drop [2] and just rely on [1]? That won't work for those in nova-core > that aren't familiar enough with the stable branch guidelines or simply > don't care to review stable branch changes, and won't work for those that > are in nova-stable-maint but not nova-core. I wouldn't think that anything would need to change about how Nova does things. If the Nova team wants to manage Nova stable branches using nova-stable-maint then this proposal absolutely supports that. The main change is removing stable-maint-core [3] from nove-stable-maint as stable-maint-core would presumably be dissolving as part of this change. Many teams already have a stable team [4]. 
For the ones that don't seem to (for example packaging-rpm, telemetry, monasca, or kuryr) it would make sense to make a $PROJECT-stable-maint and then leave it up to that project to either add $PROJECT-core to it or designate specific members to manage the stable branches. So in the end all the teams have the option to work like Nova does. Nate > [1] https://review.opendev.org/#/admin/groups/25,members > [2] https://review.opendev.org/#/admin/groups/540,members [3] https://review.opendev.org/#/admin/groups/530,members [4] https://review.opendev.org/#/admin/groups/?filter=stable From xingyongji at gmail.com Tue Nov 19 01:06:50 2019 From: xingyongji at gmail.com (yj x) Date: Tue, 19 Nov 2019 09:06:50 +0800 Subject: how to update back-end storage online Message-ID: Hi, I want to implement a qemu block driver which like qeme/block/iscsi.c. Here are my questions: some targets have be attached to guest system, and then the targets info have changed. How can I update the connection to targets online? I mean the disk in guest system does not change, just update the connection to the back-end storage, and the operation can't affect guest system. I can't find a nova-api or libvirt-api to meet my needs now. Does anyone help me? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zbitter at redhat.com Tue Nov 19 01:17:04 2019 From: zbitter at redhat.com (Zane Bitter) Date: Mon, 18 Nov 2019 17:17:04 -0800 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> Message-ID: <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> On 18/11/19 5:35 pm, Ben Nemec wrote: > > > On 11/18/19 4:08 PM, Matt Riedemann wrote: >> On 11/18/2019 3:40 PM, Mohammed Naser wrote: >>> The proposal that I had was that in mind would be for us to let teams >>> self manage their own stable branches.  I think we've reached a point >>> where we can trust most of our community to be familiar with the >>> stable branch policy (and let teams decide for themselves what they >>> believe is best for the success of their own projects). >> >> So for a project like nova that has a separate nova-core [1] and >> nova-stable-maint team [2] where some from [2] aren't in [1], what >> does this mean? Drop [2] and just rely on [1]? That won't work for >> those in nova-core that aren't familiar enough with the stable branch >> guidelines or simply don't care to review stable branch changes, and >> won't work for those that are in nova-stable-maint but not nova-core. > > I believe the proposal is to allow the Nova team to manage > nova-stable-maint in the same way they do nova-core, not to force anyone > to drop their stable-maint team entirely. I think the proposal was actually for each *-stable-maint team to manage itself. This would avoid the situation where e.g. the TC appoints a brand-new PTL and suddenly they get to make themselves a stable core, as in that case the team would still have to be bootstrapped by the stable-maint team. But it would allow those who are both closest to the project and confirmed to be familiar with the stable guidelines to make decisions about who else is ready to join that group. 
- ZB >> >> [1] https://review.opendev.org/#/admin/groups/25,members >> [2] https://review.opendev.org/#/admin/groups/540,members >> > From naohiro.sameshima at global.ntt Tue Nov 19 02:02:56 2019 From: naohiro.sameshima at global.ntt (=?utf-8?B?TmFvaGlybyBTYW1lc2hpbWHvvIjprqvls7Yg55u05rSL77yJKEdyb3VwKQ==?=) Date: Tue, 19 Nov 2019 02:02:56 +0000 Subject: [glance] glance_store tests failed Message-ID: Hi, When I run a test with `tox -e py37` in glance_store, two tests failed. The command what I ran is below. 1. git clone https://opendev.org/openstack/glance_store.git 2. tox -e py37 Is there something wrong with how to run test? Thanks & Best Regards, ============================== Failed 2 tests - output below: ============================== glance_store.tests.unit.test_filesystem_store.TestStore.test_add_check_metadata_list_with_valid_mountpoint_locations -------------------------------------------------------------------------------------------------------------------- Captured traceback: ~~~~~~~~~~~~~~~~~~~ b'Traceback (most recent call last):' b' File "/Users/sameshima/glance_store/glance_store/tests/unit/test_filesystem_store.py", line 215, in test_add_check_metadata_list_with_valid_mountpoint_locations' b' self.assertEqual(in_metadata[0], metadata)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 411, in assertEqual' b' self.assertThat(observed, matcher, message)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 498, in assertThat' b' raise mismatch_error' b"testtools.matchers._impl.MismatchError: {'id': 'abcdefg', 'mountpoint': '/tmp'} != {}" b'' glance_store.tests.unit.test_multistore_filesystem.TestMultiStore.test_add_check_metadata_list_with_valid_mountpoint_locations ------------------------------------------------------------------------------------------------------------------------------ Captured traceback: ~~~~~~~~~~~~~~~~~~~ b'Traceback (most recent call last):' b' File "/Users/sameshima/glance_store/glance_store/tests/unit/test_multistore_filesystem.py", line 276, in test_add_check_metadata_list_with_valid_mountpoint_locations' b' self.assertEqual(in_metadata[0], metadata)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 411, in assertEqual' b' self.assertThat(observed, matcher, message)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 498, in assertThat' b' raise mismatch_error' b"testtools.matchers._impl.MismatchError: {'id': 'abcdefg', 'mountpoint': '/tmp'} != {'store': 'file1'}" This email and all contents are subject to the following disclaimer: https://hello.global.ntt/en-us/email-disclaimer -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Tue Nov 19 02:18:09 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 18 Nov 2019 20:18:09 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> Message-ID: <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> ---- On Mon, 18 Nov 2019 19:17:04 -0600 Zane Bitter wrote ---- > On 18/11/19 5:35 pm, Ben Nemec wrote: > > > > > > On 11/18/19 4:08 PM, Matt Riedemann wrote: > >> On 11/18/2019 3:40 PM, Mohammed Naser wrote: > >>> The proposal that I had was that in mind would be for us to let teams > >>> self manage their own stable branches. I think we've reached a point > >>> where we can trust most of our community to be familiar with the > >>> stable branch policy (and let teams decide for themselves what they > >>> believe is best for the success of their own projects). > >> > >> So for a project like nova that has a separate nova-core [1] and > >> nova-stable-maint team [2] where some from [2] aren't in [1], what > >> does this mean? Drop [2] and just rely on [1]? That won't work for > >> those in nova-core that aren't familiar enough with the stable branch > >> guidelines or simply don't care to review stable branch changes, and > >> won't work for those that are in nova-stable-maint but not nova-core. > > > > I believe the proposal is to allow the Nova team to manage > > nova-stable-maint in the same way they do nova-core, not to force anyone > > to drop their stable-maint team entirely. > > I think the proposal was actually for each *-stable-maint team to manage > itself. This would avoid the situation where e.g. the TC appoints a > brand-new PTL and suddenly they get to make themselves a stable core, as > in that case the team would still have to be bootstrapped by the > stable-maint team. But it would allow those who are both closest to the > project and confirmed to be familiar with the stable guidelines to make > decisions about who else is ready to join that group. I am still finding difficult to understand the change and how it will solve the current problem. The current problem is: * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) which is nothing but we have fewer contributors who understand the stable policies. * The stable policies are not the problem so we will stick with current stable policies across all the projects. Stable policies have to be maintained at single place for consistency in backports across projects. If we are moving the stable maintenance team ownership from current stable-maintenance team to project side then, how it will solve the issue, does it enable more contributors to understand the stable policy and extend the team? if yes, then why it cannot happen with current model? If the project team or PTL making its core member get more familiar with the stable policy and add as a stable core team then why it cannot happen with the current model. For example, if I am PTL or core of any project and finding hard to get my backport merged then I or my project team core should review more stable branch patches and propose them in stable team core. If we move the stable team ownership to the projects side then I think PTL is going to do the same. 
Ask the team members to understand the stable policies and do more review and then add them in stable core team. If any member know the stable policies then directly add. I feel that the current problem cannot be solved by moving the ownership of the team, we need to encourage more and more developers to become stable core in existing model especially from projects who find difficulties in merging their backport. One more thing, do we have data that how much time as avg it take to merge the backport and what all projects facing the backport merge issue ? -gmann > > - ZB > > >> > >> [1] https://review.opendev.org/#/admin/groups/25,members > >> [2] https://review.opendev.org/#/admin/groups/540,members > >> > > > > > From anlin.kong at gmail.com Tue Nov 19 07:53:37 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Tue, 19 Nov 2019 20:53:37 +1300 Subject: Trove Image issue In-Reply-To: <008d01d59b70$542067f0$fc6137d0$@xflowresearch.com> References: <008d01d59b70$542067f0$fc6137d0$@xflowresearch.com> Message-ID: On Sat, Nov 16, 2019 at 4:19 AM wrote: > > > Hi Team, > > > > I have deployed kolla-ansible(Stein) with Trove enable and using Trove > prebuild images to build the Database VM but VM is constantly stuck in > build state. I am not sure how to use prebuild image key which is in > https://opendev.org/openstack/trove/src/branch/master/integration/scripts/files/keys folder. > > This key file is not used any more. The Nova keypair used for creating trove instance is configured in 'nova_keypair' config option. > Can you please guide me regarding the key usage for prebuild images ( > http://tarballs.openstack.org/trove/images/) > > > > To build my own trove image on kolla-ansible, is there any specific guide > available for it? > For how to build trove guest image, please refer to the official trove doc https://docs.openstack.org/trove/latest/admin/building_guest_images.html - Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Tue Nov 19 08:46:39 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 19 Nov 2019 09:46:39 +0100 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained Message-ID: Hello Folks, It looks like gnocchi is "officially" marked as unmaintained: https://github.com/gnocchixyz/gnocchi/issues/1049 Has there been any discussion regarding how it affects OpenStack projects? And/or are there any plans to amend this situation? -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Tue Nov 19 09:03:19 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Tue, 19 Nov 2019 22:03:19 +1300 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: Message-ID: We (ceilometer team) will probably add Ceilometer API and mongodb support back, considering the current Gnocchi project situation. However, Gnocchi will still be supported as a publisher in Ceilometer. - Best regards, Lingxian Kong Catalyst Cloud On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek < radoslaw.piliszek at gmail.com> wrote: > Hello Folks, > > It looks like gnocchi is "officially" marked as unmaintained: > https://github.com/gnocchixyz/gnocchi/issues/1049 > > Has there been any discussion regarding how it affects OpenStack projects? > And/or are there any plans to amend this situation? 
> > -yoctozepto > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From merlin.blom at bertelsmann.de Tue Nov 19 09:15:56 2019 From: merlin.blom at bertelsmann.de (Blom, Merlin, NMU-OI) Date: Tue, 19 Nov 2019 09:15:56 +0000 Subject: AW: [RabbitMQ][cinder] Listen to messages In-Reply-To: References: Message-ID: Thank you for your answer! "Are you sure cinder uses a dedicated vhost? I'm not convinced; if I'm right they all use the default vhost '/'." Indeed it does, when deployed with openstack-ansible. But I found a way to find the exchange with RMQ tracing: https://www.rabbitmq.com/firehose.html Using the GUI plugin I've got all the messages flowing through the cinder vhost and found that the exchange for the notification.* queues is "openstack", not "cinder". Sometimes I ask myself if there is any kind of standard for RMQ communication. :P This may be interesting for someone else to use in their projects. Cheers, Merlin
From: Herve Beraud Sent: Friday, 15 November 2019 11:14 To: Blom, Merlin, NMU-OI Cc: openstack-discuss at lists.openstack.org Subject: Re: [RabbitMQ][cinder] Listen to messages On Fri, 15 Nov 2019 at 10:17, Blom, Merlin, NMU-OI > wrote: Hey there, it seems to me as if ask.openstack.org is down, so I ask my question here: I'd like to listen to oslo messages from cinder as I do for neutron and octavia to know what is going on. For me the following code worked for neutron:

import logging
import os

from kombu import BrokerConnection, Exchange, Queue
from kombu.mixins import ConsumerMixin

log = logging.getLogger(__name__)

EXCHANGE_NAME = os.getenv('EXCHANGE_NAME', 'neutron')
ROUTING_KEY = os.getenv('ROUTING_KEY', 'notifications.info')
QUEUE_NAME = os.getenv('QUEUE_NAME', 'messaging_queue')
BROKER_URI = os.getenv('BROKER_URI', 'UNDEFINED')
BROKER_PASSWORD = os.getenv('BROKER_PASSWORD', '')


class Messages(ConsumerMixin):
    """Consume notification messages from the given broker connection."""

    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, consumer, channel):
        # Bind a queue for the notification routing key on the service exchange.
        exchange = Exchange(EXCHANGE_NAME, type="topic", durable=False)
        queue = Queue(QUEUE_NAME, exchange, routing_key=ROUTING_KEY,
                      durable=False, auto_delete=True, no_ack=True)
        return [consumer(queues=[queue], callbacks=[self.on_message])]

    def on_message(self, body, message):
        try:
            print(message)
        except Exception as e:
            log.info(repr(e))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log.info("Connecting to broker {}".format(BROKER_URI))
    with BrokerConnection(hostname=BROKER_URI, userid='messaging',
                          password=BROKER_PASSWORD,
                          virtual_host='/' + EXCHANGE_NAME, heartbeat=4,
                          failover_strategy='round-robin') as connection:
        # The connection is closed automatically when the 'with' block exits.
        Messages(connection).run()

But on the cinder vhost (/cinder) Are you sure cinder uses a dedicated vhost? I'm not convinced; if I'm right they all use the default vhost '/'. I can't find an exchange that the code is working on. (cinder, cinder-backup, …) I tried using the rabbitmq tracer: https://www.rabbitmq.com/firehose.html And got all the cinder messages but I don't want to use it in production because of performance issues. Does anyone have an idea how to find the correct exchange for the notification info queue in cinder?
Cheers, Merlin -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5195 bytes Desc: not available URL:
From merlin.blom at bertelsmann.de Tue Nov 19 09:25:11 2019 From: merlin.blom at bertelsmann.de (Blom, Merlin, NMU-OI) Date: Tue, 19 Nov 2019 09:25:11 +0000 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: Message-ID: Thanks for your work on ceilometer! The gnocchi situation is really sad. We implemented solutions on Gnocchi and ceilometer. In my opinion you abandoned the mongodb support for performance reasons and now you are going back to it? Has mongodb made any significant performance improvements for time series data? Best regards, Merlin From: Lingxian Kong Sent: Tuesday, 19 November 2019 10:03 To: Radosław Piliszek Cc: openstack-discuss Subject: Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained We (ceilometer team) will probably add Ceilometer API and mongodb support back, considering the current Gnocchi project situation. However, Gnocchi will still be supported as a publisher in Ceilometer. - Best regards, Lingxian Kong Catalyst Cloud On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek > wrote: Hello Folks, It looks like gnocchi is "officially" marked as unmaintained: https://github.com/gnocchixyz/gnocchi/issues/1049 Has there been any discussion regarding how it affects OpenStack projects? And/or are there any plans to amend this situation? -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5195 bytes Desc: not available URL:
From aj at suse.com Tue Nov 19 09:40:48 2019 From: aj at suse.com (Andreas Jaeger) Date: Tue, 19 Nov 2019 10:40:48 +0100 Subject: On next minimum libvirt / QEMU versions for "V" release In-Reply-To: <20191118181106.GD7032@paraplu> References: <20191118181106.GD7032@paraplu> Message-ID: <789413eb-3fde-0283-9ddb-c356879c749d@suse.com> On 18/11/2019 19.11, Kashyap Chamarthy wrote: > Heya, > > The last time we incremented versions for libvirt and QEMU was during > the "Stein" release[1]. For "Train" we didn't do any. Although we > advertized NEXT_MIN_{LIBVIRT,QEMU} versions for "Train" release to be > libvirt 4.0.0 and QEMU 2.11.0, but we actually didn't bump; we'll do > that for "Ussuri".
> > But before we bump the versions for "Ussuri", we need to pick NEXT_MIN > versions for the "V" release. Based on the updated the > DistroSupportMatrix page[2], it looks we can pick the next libvirt and > QEMU versions for "V" release to the following: > > libvirt: 5.0.0 [GAed on: 15-Jan-2019] > QEMU: 4.0.0 [GAed on: 24-Apr-2019] > > I have the initial patch here[3] for comments. > > Debian, Fedora, Ubuntu[4], CentOS, RHEL currently already ship the above > versions (actually, higher than those). And it is reasonable to assume > -- but let's confirm below -- that openSUSE, SLES, and Oracle Linux > would also have the above versions available by "V" release time. > > Action Items for Linux Distros > ------------------------------ > > (a) Oracle Linux: Please update your libvirt/QEMU versions for Oracle > Linux 8? > > I couldn't find anything related to libvirt/QEMU here: > https://yum.oracle.com/oracle-linux-8.html. (My educated guess is: > the versions roughly match what's in CentOS/RHEL.) > > (b) openSUSE and SLES: Same request as above. > > Andreas Jaegaer said on #openstack-infra that the proposed versions > for 'V' release should be fine for SLES. (And by extension open > SUSE, I assume.) Yes, those look fine for SLES and openSUSE, Andreas > - - - > > Assuming Oracle Linux and SLES confirm, please let us know if there are > any objections if we pick NEXT_MIN_* versions for the OpenStack "V" > release to be libvirt: 5.0.0 and QEMU: 4.0.0. > > Comments / alternative proposals welcome :-) > > > [1] https://opendev.org/openstack/nova/commit/489b5f762e -- Pick next > minimum libvirt / QEMU versions for "T" release, 2018-09-25) > [2] https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix > [3] https://review.opendev.org/694821 -- [RFC] Pick NEXT_MIN > libvirt/QEMU versions for "V" release > [4] For Ubuntu, I updated the versions based on what is in the Cloud > Archive repo for "Bionic" (the LTS) release: > http://reqorts.qa.ubuntu.com/reports/ubuntu-server/cloud-archive/train_versions.html > > -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
From tobias.urdin at binero.se Tue Nov 19 09:51:34 2019 From: tobias.urdin at binero.se (Tobias Urdin) Date: Tue, 19 Nov 2019 10:51:34 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: Message-ID: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> It sure is, we as well abandoned the MongoDB backend for Gnocchi which works pretty well. Would be a shame if a migration back would be required, maybe we can get a discussion going on a more long-term solution as was discussed when talking about the future of Ceilometer.
[1] https://etherpad.openstack.org/p/telemetry-train-roadmap On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: > > Thanks for your work on ceilometer! > > The gnocchi situation is realy sad. > > We implemented solutions on Gnocchi and ceilometer. > > In my opinion you abandoned the mongodb support for performance > reasons and now you are going back to it? > > Has mongodb made any significant performance improvements for time > series data? > > Best regards, > > Merlin > > *Von:*Lingxian Kong > *Gesendet:* Dienstag, 19. November 2019 10:03 > *An:* Radosław Piliszek > *Cc:* openstack-discuss > *Betreff:* Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi > unmaintained > > We (ceilometer team) will probably add Ceilometer API and mongodb > support back, considering the current Gnocchi project situation. > However, Gnocchi will still be supported as a publisher in Ceilometer. > > - > > Best regards, > Lingxian Kong > > Catalyst Cloud > > On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek > > wrote: > > Hello Folks, > > It looks like gnocchi is "officially" marked as unmaintained: > https://github.com/gnocchixyz/gnocchi/issues/1049 > > > Has there been any discussion regarding how it affects OpenStack > projects? And/or are there any plans to amend this situation? > > -yoctozepto > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luka.peschke at objectif-libre.com Tue Nov 19 10:20:45 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Tue, 19 Nov 2019 11:20:45 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> Message-ID: <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> My two cents from my experience on cloudkitty: We had to implement several storage drivers, and faced more or less the same issues as the telemetry team did before us. We had a gnocchi driver at some point, which worked pretty well, but ended up being very hacky because gnocchi lacked flexibility for non-openstack metrics (ie. data models which aren't resource-based). We ended up implementing a driver for InfluxDB which has relatively good perfs. But given that the open-source version of InfluxDB does not support HA/clustering, we also implemented an experimental Elasticsearch driver (which requires ES>=6.5). The recent ES releases have really improved the support for timeseries, and it is the storage backend for elastic beats. Given that many openstack deployments already have an Elasticsearch deployment for logs, and the large adoption of ES, it'd be my choice for a new Ceilometer storage driver. However, Gnocchi is pretty stable in 4.3, and well integrated with Ceilometer. Wouldn't it be less effort to keep it functional for now (ie only bug/security fixes, no new features), instead of re-integrating deleted features to ceilometer ? Cheers, -- Luka Peschke (peschk_l) Le 2019-11-19 10:51, Tobias Urdin a écrit : > It sure is, we as well abandoned the MongoDB backend for Gnocchi > which works pretty well. > > Would be a shame if a migration back would be required, maybe we can > get a discussion going on a more > long-term solution as was discussed when talking about the future of > Ceilometer. > > Supporting Gnocchi or moving to another open source project as a > storage backend that is stable and maintained. > There were (and still is? 
Though unofficial out-of-tree) storage > backends for Ceilometer that publishes to InfluxDB. > > I were never able to follow-up on the meetings (I probably missed a > lot of it) regarding the Ceilometer roadmap [1]. > > [1] https://etherpad.openstack.org/p/telemetry-train-roadmap > > On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: > >> Thanks for your work on ceilometer! >> >> The gnocchi situation is realy sad. >> >> We implemented solutions on Gnocchi and ceilometer. >> >> In my opinion you abandoned the mongodb support for performance >> reasons and now you are going back to it? >> >> Has mongodb made any significant performance improvements for time >> series data? >> >> Best regards, >> >> Merlin >> >> VON: Lingxian Kong >> GESENDET: Dienstag, 19. November 2019 10:03 >> AN: Radosław Piliszek >> CC: openstack-discuss >> BETREFF: Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi >> unmaintained >> >> We (ceilometer team) will probably add Ceilometer API and mongodb >> support back, considering the current Gnocchi project situation. >> However, Gnocchi will still be supported as a publisher in Ceilometer. >> >> - >> >> Best regards, >> Lingxian Kong >> >> Catalyst Cloud >> >> On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek >> wrote: >> >>> Hello Folks, >>> >>> It looks like gnocchi is "officially" marked as unmaintained: >>> https://github.com/gnocchixyz/gnocchi/issues/1049 [1] >>> >>> Has there been any discussion regarding how it affects OpenStack >>> projects? And/or are there any plans to amend this situation? >>> >>> -yoctozepto > > > > Links: > ------ > [1] > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_gnocchixyz_gnocchi_issues_1049&d=DwMFaQ&c=vo2ie5TPcLdcgWuLVH4y8lsbGPqIayH3XbK3gK82Oco&r=hTUN4-Trlb-8Fh11dR6m5VD1uYA15z7v9WL8kYigkr8&m=czRC3qwwRqT3qKzfXMSVl78G4Sk8QVwT93okCgkBe34&s=Ob7yLjlWUAz9-8oMikC_QU9ivZBvtBKkqqFEvceGGM0&e= From skaplons at redhat.com Tue Nov 19 10:26:15 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 19 Nov 2019 11:26:15 +0100 Subject: [all][neutron][neutron-fwaas] Maintainers needed Message-ID: <20191119102615.oq46xojyhoybulna@skaplons-mac> Hi, Over the past couple of cycles we have noticed that new contributions and maintenance efforts for neutron-fwaas project were almost non existent. This impacts patches for bug fixes, new features and reviews. The Neutron core team is trying to at least keep the CI of this project healthy, but we don’t have enough knowledge about the details of the neutron-fwaas code base to review more complex patches. During the PTG in Shanghai we discussed that with operators and TC members during the forum session [1] and later within the Neutron team during the PTG session [2]. During these discussions, with the help of operators and TC members, we reached the conclusion that we need to have someone responsible for maintaining project. This doesn’t mean that the maintainer needs to spend full time working on this project. Rather, we need someone to be the contact person for the project, who takes care of the project’s CI and review patches. Of course that’s only a minimal requirement. If the new maintainer works on new features for the project, it’s even better :) If we don’t have any new maintainer(s) before milestone Ussuri-2, which is Feb 10 - Feb 14 according to [3], we will need to mark neutron-fwaas as deprecated and in “V” cycle we will propose to move the project from the Neutron stadium, hosted in the “openstack/“ namespace, to the unofficial projects hosted in the “x/“ namespace. 
So if You are using this project now, or if You have customers who are using it, please consider the possibility of maintaining it. Otherwise, please be aware that it is highly possible that the project will be deprecated and moved out from the official OpenStack projects. [1] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - Lines 379-421 [3] https://releases.openstack.org/ussuri/schedule.html -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Tue Nov 19 10:29:18 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 19 Nov 2019 11:29:18 +0100 Subject: [all][neutron][networking-bagpipe][networking-bgpvpn] Maintainers needed Message-ID: <20191119102918.b5cmfecqjf746bqi@skaplons-mac> Hi, Over the past couple of cycles we have noticed that new contributions and maintenance efforts for networking-bagpipe and networking-bgpvpn were almost non existent. This impacts patches for bug fixes, new features and reviews. The Neutron core team is trying to at least keep the CI of this project healthy, but we don’t have enough knowledge about the details of the code base to review more complex patches. During the PTG in Shanghai we discussed that with operators and TC members during the forum session [1] and later within the Neutron team during the PTG session [2]. During these discussions, with the help of operators and TC members, we reached the conclusion that we need to have someone responsible for maintaining those projects. This doesn’t mean that the maintainer needs to spend full time working on those projects. Rather, we need someone to be the contact person for the project, who takes care of the project’s CI and review patches. Of course that’s only a minimal requirement. If the new maintainer works on new features for the project, it’s even better :) If we don’t have any new maintainer(s) before milestone Ussuri-2, which is Feb 10 - Feb 14 according to [3], we will need to mark networking-bgpvpn and networking-bagpipe as deprecated and in “V” cycle we will propose to move the projects from the Neutron stadium, hosted in the “openstack/“ namespace, to the unofficial projects hosted in the “x/“ namespace. So if You are using this project now, or if You have customers who are using it, please consider the possibility of maintaining it. Otherwise, please be aware that it is highly possible that the project will be deprecated and moved out from the official OpenStack projects. [1] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - Lines 379-421 [3] https://releases.openstack.org/ussuri/schedule.html -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Tue Nov 19 10:41:37 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 19 Nov 2019 11:41:37 +0100 Subject: [all][neutron][neutron-vpnaas] Maintainers needed Message-ID: <20191119104137.pkra6hehfhdjjhh3@skaplons-mac> Hi, Over the past couple of cycles we have noticed that new contributions and maintenance efforts for neutron-vpnaas were almost non existent. This impacts patches for bug fixes, new features and reviews. The Neutron core team is trying to at least keep the CI of this project healthy, but we don’t have enough knowledge about the details of the neutron-vpnaas code base to review more complex patches. 
During the PTG in Shanghai we discussed that with operators and TC members during the forum session [1] and later within the Neutron team during the PTG session [2]. During these discussions, with the help of operators and TC members, we reached the conclusion that we need to have someone responsible for maintaining project. This doesn't mean that the maintainer needs to spend full time working on this project. Rather, we need someone to be the contact person for the project, who takes care of the project's CI and review patches. Of course that's only a minimal requirement. If the new maintainer works on new features for the project, it's even better :) If we don't have any new maintainer(s) before milestone Ussuri-2, which is Feb 10 - Feb 14 according to [3], we will need to mark neutron-vpnaas as deprecated and in "V" cycle we will propose to move the project from the Neutron stadium, hosted in the "openstack/" namespace, to the unofficial projects hosted in the "x/" namespace. So if You are using this project now, or if You have customers who are using it, please consider the possibility of maintaining it. Otherwise, please be aware that it is highly possible that the project will be deprecated and moved out from the official OpenStack projects. [1] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - Lines 379-421 [3] https://releases.openstack.org/ussuri/schedule.html -- Slawek Kaplonski Senior software engineer Red Hat
From aaronzhu1121 at gmail.com Tue Nov 19 10:56:24 2019 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Tue, 19 Nov 2019 18:56:24 +0800 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> Message-ID: I am sorry about what happened to the telemetry project before, but the current telemetry core team has now decided to add the ceilometer API, mongodb support and cpu_util support back. Gnocchi will still be supported as the backend. As for all the mentioned databases (influxdb, ES, ...), we would be happy for anyone to submit patches to support them as database backends in ceilometer. I created a storyboard to track ceilometer Ussuri release todo items in [0]. Feel free to add things you want to do in the Ussuri release. Since I will be on vacation this week, I can't hold this week's meeting; we can discuss more in the next IRC meeting on 5 Dec at 2:00 UTC. [0] https://storyboard.openstack.org/#!/board/205 Luka Peschke wrote on Tue, 19 Nov 2019 at 18:24: > My two cents from my experience on cloudkitty: We had to implement > several storage drivers, and faced more or less the same issues as the > telemetry team did before us. We had a gnocchi driver at some point, > which worked pretty well, but ended up being very hacky because gnocchi > lacked flexibility for non-openstack metrics (ie. data models which > aren't resource-based).
> > Given that many openstack deployments already have an Elasticsearch > deployment for logs, and the large adoption of ES, it'd be my choice for > a new Ceilometer storage driver. > > However, Gnocchi is pretty stable in 4.3, and well integrated with > Ceilometer. Wouldn't it be less effort to keep it functional for now (ie > only bug/security fixes, no new features), instead of re-integrating > deleted features to ceilometer ? > > Cheers, > > -- > Luka Peschke (peschk_l) > > Le 2019-11-19 10:51, Tobias Urdin a écrit : > > It sure is, we as well abandoned the MongoDB backend for Gnocchi > > which works pretty well. > > > > Would be a shame if a migration back would be required, maybe we can > > get a discussion going on a more > > long-term solution as was discussed when talking about the future of > > Ceilometer. > > > > Supporting Gnocchi or moving to another open source project as a > > storage backend that is stable and maintained. > > There were (and still is? Though unofficial out-of-tree) storage > > backends for Ceilometer that publishes to InfluxDB. > > > > I were never able to follow-up on the meetings (I probably missed a > > lot of it) regarding the Ceilometer roadmap [1]. > > > > [1] https://etherpad.openstack.org/p/telemetry-train-roadmap > > > > On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: > > > >> Thanks for your work on ceilometer! > >> > >> The gnocchi situation is realy sad. > >> > >> We implemented solutions on Gnocchi and ceilometer. > >> > >> In my opinion you abandoned the mongodb support for performance > >> reasons and now you are going back to it? > >> > >> Has mongodb made any significant performance improvements for time > >> series data? > >> > >> Best regards, > >> > >> Merlin > >> > >> VON: Lingxian Kong > >> GESENDET: Dienstag, 19. November 2019 10:03 > >> AN: Radosław Piliszek > >> CC: openstack-discuss > >> BETREFF: Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi > >> unmaintained > >> > >> We (ceilometer team) will probably add Ceilometer API and mongodb > >> support back, considering the current Gnocchi project situation. > >> However, Gnocchi will still be supported as a publisher in Ceilometer. > >> > >> - > >> > >> Best regards, > >> Lingxian Kong > >> > >> Catalyst Cloud > >> > >> On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek > >> wrote: > >> > >>> Hello Folks, > >>> > >>> It looks like gnocchi is "officially" marked as unmaintained: > >>> https://github.com/gnocchixyz/gnocchi/issues/1049 [1] > >>> > >>> Has there been any discussion regarding how it affects OpenStack > >>> projects? And/or are there any plans to amend this situation? > >>> > >>> -yoctozepto > > > > > > > > Links: > > ------ > > [1] > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_gnocchixyz_gnocchi_issues_1049&d=DwMFaQ&c=vo2ie5TPcLdcgWuLVH4y8lsbGPqIayH3XbK3gK82Oco&r=hTUN4-Trlb-8Fh11dR6m5VD1uYA15z7v9WL8kYigkr8&m=czRC3qwwRqT3qKzfXMSVl78G4Sk8QVwT93okCgkBe34&s=Ob7yLjlWUAz9-8oMikC_QU9ivZBvtBKkqqFEvceGGM0&e= > > -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thierry at openstack.org Tue Nov 19 11:05:28 2019 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 19 Nov 2019 12:05:28 +0100 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> Message-ID: <80b9c92c-be69-7c96-291a-702a7a8c6498@openstack.org> Ghanshyam Mann wrote: > [...] > I am still finding difficult to understand the change and how it will solve the current problem. > > The current problem is: > * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) > which is nothing but we have fewer contributors who understand the stable policies. > > * The stable policies are not the problem so we will stick with current stable policies across all the projects. > Stable policies have to be maintained at single place for consistency in backports across projects. > [...] I don't think that this the problem this change wants to solve. Currently the stable-core team is perceived as a bottleneck to getting more people into project-specific stable teams, or keeping those teams membership up to date. As a result stable maintenance is still seen in some teams as an alien thing, rather than an integral team duty. I suspect that by getting out of the badge-granting game, stable-core could focus more on stable policy definition and education, and review how well or bad each team does on the stable front. Because reviewing backports for stable branch suitability is just one part of doing stable branch right -- the other is to actively backport relevant patches. Personally, the main reason I support this change is that we have too much "ask for permission" things in OpenStack today, something that was driven by a code-review-for-everything culture. So the more we can remove the need to ask for permission to do some work, the better. -- Thierry Carrez (ttx) From witold.bedyk at suse.com Tue Nov 19 11:36:06 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Tue, 19 Nov 2019 12:36:06 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> Message-ID: <45ed8eb5-1c90-b3f8-5c29-1cb319fd5f5b@suse.com> Another approach could be to use Monasca as the back end. The publisher has been recently added upstream [1]. It uses InfluxDB as the time series DB. The message queue between the API and TSDB adds resiliency and allows setting up InfluxDB in HA. What you get on top is a generic, multi-tenant monitoring solution. Cloud users can install their own agents, push own application metrics and set up own alerting per project. Support for auto-scaling with Heat templates is included. Greetings Witek [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-system-architecture.html#supported-databases On 11/19/19 10:51 AM, Tobias Urdin wrote: > It sure is, we as well abandoned the MongoDB backend for Gnocchi which > works pretty well. > > Would be a shame if a migration back would be required, maybe we can get > a discussion going on a more > long-term solution as was discussed when talking about the future of > Ceilometer. 
> > Supporting Gnocchi or moving to another open source project as a storage > backend that is stable and maintained. > There were (and still is? Though unofficial out-of-tree) storage > backends for Ceilometer that publishes to InfluxDB. > > I were never able to follow-up on the meetings (I probably missed a lot > of it) regarding the Ceilometer roadmap [1]. > > [1] https://etherpad.openstack.org/p/telemetry-train-roadmap > > On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: >> >> Thanks for your work on ceilometer! >> >> The gnocchi situation is realy sad. >> >> We implemented solutions on Gnocchi and ceilometer. >> >> In my opinion you abandoned the mongodb support for performance >> reasons and now you are going back to it? >> >> Has mongodb made any significant performance improvements for time >> series data? >> >> Best regards, >> >> Merlin >> >> *Von:*Lingxian Kong >> *Gesendet:* Dienstag, 19. November 2019 10:03 >> *An:* Radosław Piliszek >> *Cc:* openstack-discuss >> *Betreff:* Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi >> unmaintained >> >> We (ceilometer team) will probably add Ceilometer API and mongodb >> support back, considering the current Gnocchi project situation. >> However, Gnocchi will still be supported as a publisher in Ceilometer. >> >> - >> >> Best regards, >> Lingxian Kong >> >> Catalyst Cloud >> >> On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek >> > wrote: >> >> Hello Folks, >> >> It looks like gnocchi is "officially" marked as unmaintained: >> https://github.com/gnocchixyz/gnocchi/issues/1049 >> >> >> Has there been any discussion regarding how it affects OpenStack >> projects? And/or are there any plans to amend this situation? >> >> -yoctozepto >> >
From romain at ledisez.net Tue Nov 19 13:18:24 2019 From: romain at ledisez.net (Romain LE DISEZ) Date: Tue, 19 Nov 2019 14:18:24 +0100 Subject: Re: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: Message-ID: Hi, at OVH, we kept the mongodb backend (i.e., we are currently running an old version of ceilometer-collector). But we modified it to implement real-time aggregation so that we can get the interesting values immediately instead of running long calculations when we need them (we use Ceilometer for billing). To do that, mongodb provides some operators such as $inc and $max: https://docs.mongodb.com/manual/reference/operator/update-field/ This implementation scales well; we currently handle more than 20,000 mongodb updates per second without problems. (The issue is actually ceilometer-collector consuming too much CPU, forcing us to scale the number of servers to handle the load.) -- Romain LE DISEZ
From deepa.kr at fingent.com Tue Nov 19 05:39:40 2019 From: deepa.kr at fingent.com (Deepa) Date: Tue, 19 Nov 2019 11:09:40 +0530 Subject: Freezer Project Update In-Reply-To: <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> Message-ID: <001d01d59e9b$bf07b310$3d171930$@fingent.com> Hello Amjad, Thanks a lot for the reply. It would be great if you could share the link to the document you followed to install it; also, were you able to incorporate Freezer into the dashboard?
Regards, Deepa K R From: Amjad Kotobi Sent: Saturday, November 16, 2019 1:12 AM To: Deepa Cc: openstack-dev at lists.openstack.org Subject: Re: Freezer Project Update Hi, This project is pretty much in production state, from last summit it got active again from developer ends, we are using it for backup solution too. Documentation side isn’t that bright, very soon gonna get updated, anyhow you are able to install as standalone project in instance, I did it manually, didn’t use any provision tools. Let me know for specific part of deployment that is not clear. Amjad On 14. Nov 2019, at 06:53, Deepa > wrote: Hello Team Good Day I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see Freezer Project. But couldn’t find any charms for it in juju charms. Also there isn’t a clear documentation on how to install freezer . https://docs.openstack.org/releasenotes/freezer/train.html. No proper release notes in the latest version as well. Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. Can you also share a proper documentation on how to install Freezer in cluster setup. Thanks for your help. Regards, Deepa K R -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsitlani03 at gmail.com Tue Nov 19 16:06:50 2019 From: nsitlani03 at gmail.com (Namrata Sitlani) Date: Tue, 19 Nov 2019 21:36:50 +0530 Subject: [magnum] Kubernetes cluster issue Message-ID: Hello Folks, >From Thursday last week (Nov 14), Magnum is unable to spin up working Kubernetes clusters. We run on Rocky Openstack release. All our Kubernetes pods show CrashLoopBackOff status. We use the following commands to create the Kubernetes clusters : http://paste.openstack.org/show/786348/ . The deployment fails with the following output : http://paste.openstack.org/show/786287/. We tried deploying Kubernetes v1.13.12, v1.14.8, v1.15.5 and v1.16.2 without success. However, if we use version v1.14.6 we can successfully deploy our clusters. Unfortunately, we cannot use v1.14.6 in production because it is not patched for the CVE-2019-11253 vulnerability . Since this stopped working for us on Thursday, we think that the image update https://hub.docker.com/r/openstackmagnum/kubernetes-apiserver/tags done 6 days ago is the culprit. (We previously deployed clusters with versions v1.15.5 and v1.14.8 successfully.) Can you please confirm our findings and help us find a way forward? We can provide more logs if needed. Thank you very much, Namrata Sitlani -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Tue Nov 19 16:31:51 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 19 Nov 2019 10:31:51 -0600 Subject: [all][tc] Planning for dropping the Python2 support in OpenStack In-Reply-To: <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> References: <16dd0a42b8d.e847dd3e124645.6364180516762707559@ghanshyammann.com> <16dfe4467a4.db6f72ec168733.7542022367023887408@ghanshyammann.com> <16dff41292e.11b7e81b1177136.7669214833037569841@ghanshyammann.com> <16e19144cf0.f6b07849311271.7773306777497055114@ghanshyammann.com> <20191030004035.rsuegdsij2eezps3@mthode.org> <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> Message-ID: <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> Hello Everyone, I would like to notify all projects about important discussions and agreement happened today to move forward for cross projects dependencies and devstack default installation or py2 drop work. * It is now an official community goal for ussuri[1] and except Swift, all projects agreed to drop py2 as per schedule in goal. * If any project (openstack services which are planned to drop from now till m-1) drop the py2 with removing the py2 requirements and min python version in setup.cfg which makes that project uninstallable in cross projects jobs then: Options 1 (suggested): broken projects has to drop the py2 support/testing immediately. Options 2: if it breaks most of the projects, for example, nova or any other default projects become uninstallable on py2 then we can half-revert[2] the changes from the project caused the break and wait till m1 to merge them back. * Making Devstack to py3 by default TODAY (otherwise it can break gate everyday). **Devstack default is py2 currently and it was planned to make py3 by default after m-1. But after seeing today gate break, it is hard to maintain devstack-py2-by-default. because projects are dropping the py2 support and devstack py2 by default cause the problem[3]. Today it is from nova side and It can happen due to any projects dropping py2 or I should say it can happen every day as py2 drop patches get merged. ** I am ok to make Devstack py3 by default today which is this patch - https://review.opendev.org/#/c/649097/ ** Action for projects who want to keep testing py2 job till m-1 or whenever they plan to drop py2: Explicitly disable the py3 in their py2 jobs (USE_PYTHON3: False). * I have pushed the py2 drop patches on almost all the OpenStack services[4] which migrate py2 jobs to py3, remove the py2 requirement but do not mention the min python version in setup.cfg (which can ben done by followup if projects want to do that). I will suggest to merge them asap to avoid any gate break due to cross projects dependency. 
[1] https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html [2] half-revert means only change in requirements.txt and setup.cfg - https://review.opendev.org/#/c/695007/ [3] https://bugs.launchpad.net/nova/+bug/1853166 [4] https://review.opendev.org/#/q/topic:drop-py27-support+(status:open+OR+status:merged) -gmann ---- On Wed, 30 Oct 2019 12:03:33 -0500 Ghanshyam Mann wrote ---- > ---- On Wed, 30 Oct 2019 06:59:19 -0500 Sean Mooney wrote ---- > > On Tue, 2019-10-29 at 19:40 -0500, Matthew Thode wrote: > > > On 19-10-29 14:53:11, Ghanshyam Mann wrote: > > > > ---- On Thu, 24 Oct 2019 14:32:03 -0500 Ghanshyam Mann wrote ---- > > > > > Hello Everyone, > > > > > > > > > > We had good amount of discussion on the final plan and schedule in today's TC office hour[1]. > > > > > > > > > > I captured the agreement on each point in etherpad (you can see the AGREE:). Also summarizing > > > > > the discussions here. Imp point is if your projects are planning to keep the py2.7 support then do not delay > > > > > to tell us. Reply on this ML thread or add your project in etherpad. > > > > > > > > > > - Projects can start dropping the py2.7 support. Common lib and testing tools need to wait until milestone-2. > > > > > ** pepe8 job to be included in openstack-python3-ussuri-jobs-* templates - > > > > https://review.opendev.org/#/c/688997/ > > > > > ** You can drop openstack-python-jobs template and start using ussuri template once 688997 patch is merged. > > > > > ** Cross projects dependency (if any ) can be sync up among dependent projects. > > > > > > > > > > - I will add this plan and schedule as a community goal. The goal is more about what all things to do and when. > > > > > ** If any project keeping the support then it has to be notified explicitly for its consumer. > > > > > > > > > > - Schedule: > > > > > The schedule is aligned with the Ussuri cycle milestone[2]. I will add the plan in the release schedule also. > > > > > Phase-1: Dec 09 - Dec 13 R-22 Ussuri-1 milestone > > > > > ** Project to start dropping the py2 support along with all the py2 CI jobs. > > > > > Phase-2: Feb 10 - Feb 14 R-13 Ussuri-2 milestone > > > > > ** This includes Oslo, QA tools (or any other testing tools), common lib (os-brick), Client library. > > > > > ** This will give enough time to projects to drop the py2 support. > > > > > Phase-3: Apr 06 - Apr 10 R-5 Ussuri-3 milestone > > > > > ** Final audit on Phase-1 and Phase-2 plan and make sure everything is done without breaking anything. > > > > > This is enough time to measure such break or anything extra to do before ussuri final release. > > > > > > > > > > Other discussions points and agreement: > > > > > - Projects want to keep python 2 support and need oslo, QA or any other dependent projects/lib support: > > > > > ** swift. AI: gmann to reach out to swift team about the plan and exact required things from its dependency > > > > > (the common lib/testing tool). > > > > > > > > I chated with timburke on IRC about things required by swift to keep the py2.7 support[1]. Below are > > > > client lib/middleware swift required for py2 testing. > > > > @timburke, feel free to update if any missing point. > > > > > > > > - devstack. 
able to keep running swift on py2 and rest all services can be on py3 > > > > - keystonemiddleware and its dependency > > > > - keystoneclient and openstackclient (dep of keystonemiddleware) > > > > - castellan and barbicanclient > > > > > > > > > > > > As those lib/middleware going to drop the py2.7 support in phase-2, we need to cap them for swift. > > > > I think capping them for python2.7 in upper constraint file would not affect any other users but Matthew Thode can > > > > explain better how that will work from the requirement constraint perspective. > > > > > > > > [1] > > > > http://eavesdrop.openstack.org/irclogs/%23openstack-swift/%23openstack-swift.2019-10-28.log.html#t2019-10-28T16:37:33 > > > > > > > > -gmann > > > > > > > > > > ya, there are examples already for libs that have dropped py2 support. > > > What you need to do is update global requirements to be something like > > > the following. > > > > > > sphinx!=1.6.6,!=1.6.7,<2.0.0;python_version=='2.7' # BSD > > > sphinx!=1.6.6,!=1.6.7,!=2.1.0;python_version>='3.4' # BSD > > > > > > or > > > > > > keyring<19.0.0;python_version=='2.7' # MIT/PSF > > > keyring;python_version>='3.4' # MIT/PSF > > on a related note os-vif is blocked form running tempest jobs under python 3 > > until https://review.opendev.org/#/c/681029/ is merged due to > > https://zuul.opendev.org/t/openstack/build/4ff60d6bd2f24782abeb12cc7bdb8013/log/controller/logs/screen-q-agt.txt.gz#308-318 > > > > i think this issue will affect any job that install proejcts that use privsep using the required-proejcts section of the > > zuul job definition. adding a project to required-proejcts sechtion adds it to the LIBS_FROM_GIT varible in devstack. > > this inturn istalls it twice due to https://review.opendev.org/#/c/418135/ . the side effect of this is that the > > privsep helper script gets installed under python2 and the neutron ageint in this case gets install under python 3 so > > when it trys to spawn the privsep deamon and invoke commands it typically expodes due to dependcy issues or in this case > > because it failed to drop privileges correctly. > > > > so as part of phase 1 we need to merge https://review.opendev.org/#/c/681029/ so that lib project that use required- > > projects to run with master of project that comsume it and support depends-on can move to python 3 tempest jobs. > > Thanks for raising this. I agree on not falling back to py2 in Ussuri, I approved 681029. > > -gmann > > > > > > > > > > > > > > > > > From bharat at stackhpc.com Tue Nov 19 16:37:34 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Tue, 19 Nov 2019 16:37:34 +0000 Subject: [magnum] Kubernetes cluster issue In-Reply-To: References: Message-ID: <3D89EC73-47ED-4ED8-AF08-4BA549E90989@stackhpc.com> Hi Namrata, This is a known issue being tracked under this story: https://storyboard.openstack.org/#!/story/2006846 As I said before on IRC, the only known fix at the moment is to run Magnum Train with `use_podman=true` label. The solution for atomic is actively being investigated. Best Bharat > On 19 Nov 2019, at 16:06, Namrata Sitlani wrote: > > Hello Folks, > > From Thursday last week (Nov 14), Magnum is unable to spin up working Kubernetes clusters. We run on Rocky Openstack release. > > All our Kubernetes pods show CrashLoopBackOff status. > > We use the following commands to create the Kubernetes clusters : http://paste.openstack.org/show/786348/ . > > The deployment fails with the following output : http://paste.openstack.org/show/786287/ . 
> > We tried deploying Kubernetes v1.13.12, v1.14.8, v1.15.5 and v1.16.2 without success. However, if we use version v1.14.6 we can successfully deploy our clusters. > Unfortunately, we cannot use v1.14.6 in production because it is not patched for the CVE-2019-11253 vulnerability . > > Since this stopped working for us on Thursday, we think that the image update https://hub.docker.com/r/openstackmagnum/kubernetes-apiserver/tags done 6 days ago is the culprit. > (We previously deployed clusters with versions v1.15.5 and v1.14.8 successfully.) > > Can you please confirm our findings and help us find a way forward? We can provide more logs if needed. > > Thank you very much, > Namrata Sitlani -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.page at canonical.com Tue Nov 19 16:48:50 2019 From: james.page at canonical.com (James Page) Date: Tue, 19 Nov 2019 16:48:50 +0000 Subject: Freezer Project Update In-Reply-To: <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> Message-ID: Hello On Fri, Nov 15, 2019 at 7:43 PM Amjad Kotobi wrote: > Hi, > > This project is pretty much in production state, from last summit it got > active again from developer ends, we are using it for backup solution too. > Great to hear that Freezer is getting some increased developer focus! > Documentation side isn’t that bright, very soon gonna get updated, anyhow > you are able to install as standalone project in instance, I did it > manually, didn’t use any provision tools. > Let me know for specific part of deployment that is not clear. > > Amjad > > On 14. Nov 2019, at 06:53, Deepa wrote: > > Hello Team > > Good Day > > I am Deepa from Fingent Global Solutions and we are a big fan of Openstack > and we do have 4 + openstack setup (including production) > We have deployed Openstack using juju and Maas .So when we check for > backup feasibility other than cinder-backup we were able to see > Freezer Project. But couldn’t find any charms for it in juju charms. Also > there isn’t a clear documentation on how to install freezer . > https://docs.openstack.org/releasenotes/freezer/train.html. No proper > release notes in the latest version as well. > Can you please tell me whether this project is in developing state? > Whether charms will be added to juju in future. > > Freezer is not currently on the plan for OpenStack Charms for Ussuri. Better install documentation and support from Linux distros would be a good first step in the right direction. Cheers James -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Tue Nov 19 17:58:12 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 19 Nov 2019 12:58:12 -0500 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <80b9c92c-be69-7c96-291a-702a7a8c6498@openstack.org> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <80b9c92c-be69-7c96-291a-702a7a8c6498@openstack.org> Message-ID: On Tue, Nov 19, 2019 at 6:16 AM Thierry Carrez wrote: > > Ghanshyam Mann wrote: > > [...] > > I am still finding difficult to understand the change and how it will solve the current problem. 
> > > > The current problem is: > > * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) > > which is nothing but we have fewer contributors who understand the stable policies. > > > > * The stable policies are not the problem so we will stick with current stable policies across all the projects. > > Stable policies have to be maintained at single place for consistency in backports across projects. > > [...] > I don't think that this the problem this change wants to solve. > > Currently the stable-core team is perceived as a bottleneck to getting > more people into project-specific stable teams, or keeping those teams > membership up to date. As a result stable maintenance is still seen in > some teams as an alien thing, rather than an integral team duty. > > I suspect that by getting out of the badge-granting game, stable-core > could focus more on stable policy definition and education, and review > how well or bad each team does on the stable front. Because reviewing > backports for stable branch suitability is just one part of doing stable > branch right -- the other is to actively backport relevant patches. > > Personally, the main reason I support this change is that we have too > much "ask for permission" things in OpenStack today, something that was > driven by a code-review-for-everything culture. So the more we can > remove the need to ask for permission to do some work, the better. For context, I thought I'd gather my thoughts to explain the idea best and woke up to this well summarized email by Thierry. I agree with this and the intention is indeed what Thierry is mentioning here. > -- > Thierry Carrez (ttx) > From kchamart at redhat.com Tue Nov 19 18:23:43 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Tue, 19 Nov 2019 19:23:43 +0100 Subject: On next minimum libvirt / QEMU versions for "V" release In-Reply-To: <789413eb-3fde-0283-9ddb-c356879c749d@suse.com> References: <20191118181106.GD7032@paraplu> <789413eb-3fde-0283-9ddb-c356879c749d@suse.com> Message-ID: <20191119182343.GA32458@paraplu> On Tue, Nov 19, 2019 at 10:40:48AM +0100, Andreas Jaeger wrote: > On 18/11/2019 19.11, Kashyap Chamarthy wrote: [...] > > Action Items for Linux Distros > > ------------------------------ > > > > (a) Oracle Linux: Please update your libvirt/QEMU versions for Oracle > > Linux 8? > > > > I couldn't find anything related to libvirt/QEMU here: > > https://yum.oracle.com/oracle-linux-8.html. (My educated guess is: > > the versions roughly match what's in CentOS/RHEL.) > > > > (b) openSUSE and SLES: Same request as above. > > > > Andreas Jaegaer said on #openstack-infra that the proposed versions > > for 'V' release should be fine for SLES. (And by extension open > > SUSE, I assume.) > > Yes, those look fine for SLES and openSUSE, Great; thanks for confirming. > Andreas > > > - - - > > > > Assuming Oracle Linux and SLES confirm, please let us know if there are > > any objections if we pick NEXT_MIN_* versions for the OpenStack "V" > > release to be libvirt: 5.0.0 and QEMU: 4.0.0. Cced Iain MacDonnell from Oracle for the above. (As he responded last year on this topic :-)) [...] 
-- /kashyap

From gmann at ghanshyammann.com Tue Nov 19 19:20:29 2019
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Tue, 19 Nov 2019 13:20:29 -0600
Subject: [all][tc][forum] Summarizing Ussuri cycle community goal discussion in Forum and PTG
Message-ID: <16e851be988.d317731d39307.2622330940217571010@ghanshyammann.com>

Hello Everyone,

We discussed the Ussuri goals at both the Forum and the PTG. I would like to summarize those discussions here.

* Forum[1]: Three goals were discussed in detail:

1. Drop Python 2.7 Support
We discussed keeping CI/CD support for Swift, which is the only project keeping py2 support. Swift needs devstack to keep installing it in a py2 env with the rest of the services on py3 (the same as the old jobs when Swift was on py2 by default in devstack). Swift has no oslo dependency, and all of its other dependencies will be capped at their py2 versions. The requirements check job currently checks that if openstack/requirements lists two entries for a requirement (one for <=2.7 and one for >), the repo under test also has both entries. smcginnis has already pushed the changes[2] to handle dual python version requirements. Everything else will go as discussed on the ML[3], and this is already accepted as a Ussuri goal[4].

2. Project Specific New Contributor & PTL Docs
- Per the feedback in the Forum session, this is a good goal which will make documentation more consistent. All projects should edit their contributor.rst to follow a more complete template and adjust/add PTL documentation.
- This is accepted as a pre-approved Ussuri goal.
- Kim Hindhart is working on getting EU funding for people to work on OpenStack, and they like consistent documentation.
- diablo_rojo has already updated the goal proposal patch, and it is up to get wider feedback.

3. Switch remaining legacy jobs to Zuul v3 and drop legacy support
- The Grenade job is not yet on zuulv3, and that needs to finish first.
- A few projects are waiting for the big projects to finish the zuulv3 migration first.
- This needs more work and can be a "pre-approved" goal for V; it would be split so that the Grenade work is the focus in U.

Other than the above 3 goals, there were a few more goal candidate ideas that are good to go into the goal backlogs etherpad:
- cdent: stop using paste, pastedeploy and WSME.
  Note from Chris: this does not need to be a community goal as such, but it requires a common solution from the TC. WSME is still used, has contributions, and at least a core or two.
- cmurphy: Consistent and secure default policies.
  https://etherpad.openstack.org/p/PVG-keystone-forum-policy
  Going with a pop-up team first.
- Support matrix documentation to be consistent across projects.
  Going with a pop-up team first (fungi can propose the pop-up team in governance); Richard Pioso (rpioso) to help fungi on this. Once a consistent framework is identified, the pop-up team can expire with the approval of a related cycle goal for implementing it across the remaining projects.

* PTG[5]:
We all agreed to select the Ussuri goals asap because the Ussuri cycle has already started and projects are waiting for the final goals. The TC discussed this further at the PTG and agreed on the below two goals to be selected as the Ussuri goals.

1. Drop Python 2.7 Support - Already Accepted.
2. Project Specific New Contributor & PTL Docs - Under Review

The goal "Switch remaining legacy jobs to Zuul v3 and drop legacy support" will be pre-selected for the V cycle; that does not mean review of the proposed goal or any ongoing work should stop. All ongoing efforts will continue on this.
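(As a concrete illustration of the dual python_version entries mentioned under
goal 1 above -- the package name and version numbers here are only placeholders,
not a recommendation:)

    # openstack/requirements style: one marker-qualified entry per interpreter family
    foo>=1.0,<2.0;python_version=='2.7'   # last release line that still supports py2
    foo>=3.0;python_version>='3.6'        # py3-only releases

The check job then expects a repo that lists foo to carry both marker-qualified
entries, mirroring what openstack/requirements has.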
[1] https://etherpad.openstack.org/p/PVG-ussuri-goal-forum [2] https://review.opendev.org/#/c/693631/ [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010371.html [4] https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html [5] L76 https://etherpad.openstack.org/p/PVG-TC-PTG -gmann From mrunge at matthias-runge.de Tue Nov 19 19:26:29 2019 From: mrunge at matthias-runge.de (Matthias Runge) Date: Tue, 19 Nov 2019 20:26:29 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> Message-ID: <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> On 19/11/2019 11:56, Rong Zhu wrote: > I am sorry to the telemetry project happened before, but now current > telemetry core team had decided to add ceilometer api and mongodb > support and cpu_utils support back. Gnoochi will still support as the > backed. All the mentioned database (influxdb, ES....), we would happy > everyone to submit patches to support as the database backed in ceilometer. > > I created a storyboard to track ceilometer Ussuri release todo things in > [0]. Free free to add things you want to do in Ussuri release. > > Due to I will have a vacation this week, I can't hold this week's > meeting, we can discuss more in the next irc meeting at 5 Dec 2:00 UTC. > > [0] https://storyboard.openstack.org/#!/board/205 > > Luka Peschke >于2019年11月19日 周二18:24写道: Hi, tbh, I am surprised to see the telemetry team trying to roll back to something, which is known to cause a lot of performance issues. There were good reasons for splitting ceilometer into several components. There were lots of good ideas and suggestions already shared in this thread. My proposal here would be to keep ceilometer as is (as data collecting agent) and to write missing glue to digest or send data to a time-series database, like InfluxDB or Prometheus (plus many more options, not mentioned here). If I remember correctly, MongoDB was deprecated because of the issues it caused and also because it was removed from Linux distributions, since there were the licensing issues. Matthias From rosmaita.fossdev at gmail.com Tue Nov 19 20:06:29 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 19 Nov 2019 15:06:29 -0500 Subject: [glance] glance_store tests failed In-Reply-To: References: Message-ID: <3033d0aa-da0d-02fb-2aa0-6658a7fdaae1@gmail.com> On 11/18/19 9:02 PM, Naohiro Sameshima(鮫島 直洋)(Group) wrote: > Hi, > > When I run a test with `tox -e py37` in glance_store, two tests failed. > > The command what I ran is below. > > 1. git clone https://opendev.org/openstack/glance_store.git > > 2. tox -e py37 > > Is there something wrong with how to run test? That's the correct way to run the tests. I'm not sure why you're getting those failures -- I did a fresh checkout and all tests passed using Python 3.7.1. I don't have anything useful to say other than start over with a clean environment (which you've probably already done a few times). Maybe ask in #openstack-glance. Perhaps you have an atypical environment? 
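(If it helps to rule out a stale environment, something along these lines forces
a completely fresh run -- just a suggestion, adjust paths to taste:)

    # fresh checkout plus a forced rebuild of the tox virtualenv
    git clone https://opendev.org/openstack/glance_store.git glance_store-clean
    cd glance_store-clean
    tox -r -e py37    # -r/--recreate discards any cached .tox/py37 environment

If the two tests still fail there, the local platform (the /Users/... paths in
the traceback suggest macOS) would be the next thing I'd look at.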
> > Thanks & Best Regards, > > ============================== > > Failed 2 tests - output below: > > ============================== > > glance_store.tests.unit.test_filesystem_store.TestStore.test_add_check_metadata_list_with_valid_mountpoint_locations > > -------------------------------------------------------------------------------------------------------------------- > > Captured traceback: > > ~~~~~~~~~~~~~~~~~~~ > >     b'Traceback (most recent call last):' > >     b'  File > "/Users/sameshima/glance_store/glance_store/tests/unit/test_filesystem_store.py", > line 215, in test_add_check_metadata_list_with_valid_mountpoint_locations' > >     b'    self.assertEqual(in_metadata[0], metadata)' > >     b'  File > "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", > line 411, in assertEqual' > >     b'    self.assertThat(observed, matcher, message)' > >     b'  File > "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", > line 498, in assertThat' > >     b'    raise mismatch_error' > >     b"testtools.matchers._impl.MismatchError: {'id': 'abcdefg', > 'mountpoint': '/tmp'} != {}" > >     b'' > > glance_store.tests.unit.test_multistore_filesystem.TestMultiStore.test_add_check_metadata_list_with_valid_mountpoint_locations > > ------------------------------------------------------------------------------------------------------------------------------ > > Captured traceback: > > ~~~~~~~~~~~~~~~~~~~ > >     b'Traceback (most recent call last):' > >     b'  File > "/Users/sameshima/glance_store/glance_store/tests/unit/test_multistore_filesystem.py", > line 276, in test_add_check_metadata_list_with_valid_mountpoint_locations' > >     b'    self.assertEqual(in_metadata[0], metadata)' > >     b'  File > "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", > line 411, in assertEqual' > >     b'    self.assertThat(observed, matcher, message)' > >     b'  File > "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", > line 498, in assertThat' > >     b'    raise mismatch_error' > >     b"testtools.matchers._impl.MismatchError: {'id': 'abcdefg', > 'mountpoint': '/tmp'} != {'store': 'file1'}" > > > This email and all contents are subject to the following disclaimer: > https://hello.global.ntt/en-us/email-disclaimer > From radoslaw.piliszek at gmail.com Tue Nov 19 20:27:23 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 19 Nov 2019 21:27:23 +0100 Subject: [gnocchi][telemetry][ceilometer][cloudkitty][monasca] Gnocchi unmaintained In-Reply-To: <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: I second Matthias' stance - MongoDB does not look like the future-proof tool for the job, it's not really well suited for time-series data. I feel that would just be "wasted" work to bring it back, unless it is really like snapping fingers... Not to mention supporting it in the long run. Obviously not me making decisions. :-) The other part, bringing API back, seems nice as there are ppl using Ceilometer with Monasca this way. I am glad to hear that Monasca is finally getting official support as a publisher in Ceilometer. 
This should actually solve the main issue of lack of a reliable publisher - Monasca already handles modern persistent time-series databases. I added [monasca] since we have the topic of Monasca as one of Ceilometer's new publishers. Feel free to remove. -yoctozepto wt., 19 lis 2019 o 20:41 Matthias Runge napisał(a): > > > On 19/11/2019 11:56, Rong Zhu wrote: > > I am sorry to the telemetry project happened before, but now current > > telemetry core team had decided to add ceilometer api and mongodb > > support and cpu_utils support back. Gnoochi will still support as the > > backed. All the mentioned database (influxdb, ES....), we would happy > > everyone to submit patches to support as the database backed in > ceilometer. > > > > I created a storyboard to track ceilometer Ussuri release todo things in > > [0]. Free free to add things you want to do in Ussuri release. > > > > Due to I will have a vacation this week, I can't hold this week's > > meeting, we can discuss more in the next irc meeting at 5 Dec 2:00 UTC. > > > > [0] https://storyboard.openstack.org/#!/board/205 > > > > Luka Peschke > >于2019年11月19日 周二18:24写道: > > > Hi, > > tbh, I am surprised to see the telemetry team trying to roll back to > something, which is known to cause a lot of performance issues. There > were good reasons for splitting ceilometer into several components. > > There were lots of good ideas and suggestions already shared in this > thread. My proposal here would be to keep ceilometer as is (as data > collecting agent) and to write missing glue to digest or send data to a > time-series database, like InfluxDB or Prometheus (plus many more > options, not mentioned here). > > If I remember correctly, MongoDB was deprecated because of the issues it > caused and also because it was removed from Linux distributions, since > there were the licensing issues. > > Matthias > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Tue Nov 19 20:35:02 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 20 Nov 2019 09:35:02 +1300 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <80b9c92c-be69-7c96-291a-702a7a8c6498@openstack.org> Message-ID: As the person who has asked for stable branch merge permission before, I felt the pain and I totally agree with mnaser's proposal. - Best regards, Lingxian Kong Catalyst Cloud On Wed, Nov 20, 2019 at 7:03 AM Mohammed Naser wrote: > On Tue, Nov 19, 2019 at 6:16 AM Thierry Carrez > wrote: > > > > Ghanshyam Mann wrote: > > > [...] > > > I am still finding difficult to understand the change and how it will > solve the current problem. > > > > > > The current problem is: > > > * Fewer contributors in the stable-maintenance team (core stable team > and project side stable team) > > > which is nothing but we have fewer contributors who understand the > stable policies. > > > > > > * The stable policies are not the problem so we will stick with > current stable policies across all the projects. > > > Stable policies have to be maintained at single place for > consistency in backports across projects. > > > [...] > > I don't think that this the problem this change wants to solve. 
> > > > Currently the stable-core team is perceived as a bottleneck to getting > > more people into project-specific stable teams, or keeping those teams > > membership up to date. As a result stable maintenance is still seen in > > some teams as an alien thing, rather than an integral team duty. > > > > I suspect that by getting out of the badge-granting game, stable-core > > could focus more on stable policy definition and education, and review > > how well or bad each team does on the stable front. Because reviewing > > backports for stable branch suitability is just one part of doing stable > > branch right -- the other is to actively backport relevant patches. > > > > Personally, the main reason I support this change is that we have too > > much "ask for permission" things in OpenStack today, something that was > > driven by a code-review-for-everything culture. So the more we can > > remove the need to ask for permission to do some work, the better. > > For context, I thought I'd gather my thoughts to explain the idea best and > woke up to this well summarized email by Thierry. I agree with this and > the > intention is indeed what Thierry is mentioning here. > > > -- > > Thierry Carrez (ttx) > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Tue Nov 19 20:55:22 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 20 Nov 2019 09:55:22 +1300 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: On Wed, Nov 20, 2019 at 8:33 AM Matthias Runge wrote: > tbh, I am surprised to see the telemetry team trying to roll back to > something, which is known to cause a lot of performance issues. There > were good reasons for splitting ceilometer into several components. > I understand your concerns and the team will find a better way to deal with the rollback, the expected result is that it won't break anyone who are using Gnocchi or other storage backend but it means much to the cloud providers(like OVH and us) who are still using old version Ceilometer and relying on the API. > There were lots of good ideas and suggestions already shared in this > thread. My proposal here would be to keep ceilometer as is (as data > collecting agent) and to write missing glue to digest or send data to a > time-series database, like InfluxDB or Prometheus (plus many more > options, not mentioned here). > Does InfluxDB or Prometheus support to store samples that could be leveraged for auditing or billing purpose? I'm not familiar with TSDB but I got the answer NO according to the chatting with some other people. > If I remember correctly, MongoDB was deprecated because of the issues it > caused and also because it was removed from Linux distributions, since > there were the licensing issues. > I don't think the license change will affect the cloud that only uses MongoDB as internal service backend storage unless I'm missing something. - Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From anlin.kong at gmail.com Tue Nov 19 20:59:09 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 20 Nov 2019 09:59:09 +1300 Subject: [gnocchi][telemetry][ceilometer][cloudkitty][monasca] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: On Wed, Nov 20, 2019 at 9:39 AM Radosław Piliszek < radoslaw.piliszek at gmail.com> wrote: > I second Matthias' stance - MongoDB does not look like the future-proof > tool for the job, it's not really well suited for time-series data. > I feel that would just be "wasted" work to bring it back, unless it is > really like snapping fingers... > Not to mention supporting it in the long run. > Obviously not me making decisions. :-) > > The other part, bringing API back, seems nice as there are ppl using > Ceilometer with Monasca this way. > > I am glad to hear that Monasca is finally getting official support as a > publisher in Ceilometer. > This should actually solve the main issue of lack of a reliable publisher > - Monasca already handles > modern persistent time-series databases. > > I added [monasca] since we have the topic of Monasca as one of > Ceilometer's new publishers. Feel free to remove. > As open source community, we don't force anyone to use some specific software or tooling, we support as more options as possible for real use cases but leave the choice to the users. - Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Nov 19 21:12:49 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 19 Nov 2019 21:12:49 +0000 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: <20191119211248.kjdcyxfjtgkwbbsf@yuggoth.org> On 2019-11-20 09:55:22 +1300 (+1300), Lingxian Kong wrote: [...] > I don't think the license change will affect the cloud that only uses > MongoDB as internal service backend storage unless I'm missing > something. This is a field of endeavor distinction which may be tough for some folks to explain to their corporate legal departments, which is why OpenStack has focused on dependencies which are licensed Apache v2 or some compatible subset of terms (like 3-clause BSD, MIT/Expat or ISC). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mrunge at matthias-runge.de Tue Nov 19 21:34:40 2019 From: mrunge at matthias-runge.de (Matthias Runge) Date: Tue, 19 Nov 2019 22:34:40 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: On 19/11/2019 21:55, Lingxian Kong wrote: > On Wed, Nov 20, 2019 at 8:33 AM Matthias Runge > wrote: > > tbh, I am surprised to see the telemetry team trying to roll back to > something, which is known to cause a lot of performance issues. There > were good reasons for splitting ceilometer into several components. 
> > > I understand your concerns and the team will find a better way to deal > with the rollback, the expected result is that it won't break anyone who > are using Gnocchi or other storage backend but it means much to the > cloud providers(like OVH and us) who are still using old version > Ceilometer and relying on the API. I assume, you haven't seen any performance issues with both Ceilometer or Gnocchi? > > There were lots of good ideas and suggestions already shared in this > thread. My proposal here would be to keep ceilometer as is (as data > collecting agent) and to write missing glue to digest or send data to a > time-series database, like InfluxDB or Prometheus (plus many more > options, not mentioned here). > > > Does InfluxDB or Prometheus support to store samples that could be > leveraged for auditing or billing purpose? I'm not familiar with TSDB > but I got the answer NO according to the chatting with some other > people Billing is a complicated beast for multiple reasons. Let's look only at ingestion. First of all, you are right, e.g Prometheus should not be used in billing situations, see[1], also on [2] (same source as [1]). However, the same is true for Gnocchi and also for Ceilometer. You are going to loose metrics if your collection is faster than your backend can digest it, for Gnocchi is such behaviour documented. If you are looking at speed, I'd consider prometheus to be much faster than gnocchi and ceilometer and thus better suited for handling metrics. [1] https://github.com/prometheus/docs/blob/master/content/docs/introduction/overview.md#when-does-it-not-fit [2] https://prometheus.io/docs/introduction/overview/#when-does-it-not-fit > If I remember correctly, MongoDB was deprecated because of the issues it > caused and also because it was removed from Linux distributions, since > there were the licensing issues. > > > I don't think the license change will affect the cloud that only uses > MongoDB as internal service backend storage unless I'm missing > something. You can read a bit of story on MongoDB relicensing at [3]. The license change was regarding commercial use. You can find the full text at [4], depending on your use case, it may be valuable to ask a lawyer, especially, when in doubt. [3] https://hub.packtpub.com/mongodb-withdraws-controversial-server-side-public-license-from-the-open-source-initiatives-approval-process/ [4] https://www.mongodb.com/licensing/server-side-public-license >From my POV (and apparently others here in this thread as well), I'd take a step back and revisit https://storyboard.openstack.org/#!/board/205 again, in order to run into the same issues as in Newton cycle. I mean, if you'd want ceilometer from Newton, it is still there. Matthias From bhordinesh07 at gmail.com Tue Nov 19 21:59:16 2019 From: bhordinesh07 at gmail.com (Dinesh Bhor) Date: Wed, 20 Nov 2019 06:59:16 +0900 Subject: [sig] Forming a Large scale SIG In-Reply-To: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Message-ID: Hi Thierry and All, Sorry for the late reply. Thank you for these efforts. +1 from me and my employer LINE. Regards, Dinesh Bhor On Wed, 13 Nov 2019, 8:19 pm Thierry Carrez, wrote: > Hi everyone, > > In Shanghai we held a forum session to gauge interest in a new SIG to > specifically address cluster scaling issues. In the past we had several > groups ("Large deployments", "Performance", LCOO...) but those efforts > were arguably a bit too wide and those groups are now abandoned. 
> > My main goal here is to get large users directly involved in a domain > where their expertise can best translate into improvements in the > software. It's easy for such a group to go nowhere while trying to boil > the ocean. To maximize its chances of success and make it sustainable, > the group should have a narrow focus, and reasonable objectives. > > My personal idea for the group focus was to specifically address scaling > issues within a single cluster: basically identify and address issues > that prevent scaling a single cluster (or cell) past a number of nodes. > By sharing analysis and experience, the group could identify common pain > points that, once solved, would help raising that number. > > There was a lot of interest in that session[1], and it predictably > exploded in lots of different directions, including some that are > definitely past a single cluster (like making Neutron better support > cells). I think it's fine: my initial proposal was more of a strawman. > Active members of the group should really define what they collectively > want to work on. And the SIG name should be picked to match that. > > I'd like to help getting that group off the ground and to a place where > it can fly by itself, without needing external coordination. The first > step would be to identify interested members and discuss group scope and > objectives. Given the nature of the group (with interested members in > Japan, Europe, Australia and the US) it will be hard to come up with a > synchronous meeting time that will work for everyone, so let's try to > hold that discussion over email. > > So to kick this off: if you are interested in that group, please reply > to this email, introduce yourself and tell us what you would like the > group scope and objectives to be, and what you can contribute to the group. > > Thanks! > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > -- > Thierry Carrez (ttx) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sorrison at gmail.com Tue Nov 19 22:05:52 2019 From: sorrison at gmail.com (Sam Morrison) Date: Wed, 20 Nov 2019 09:05:52 +1100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: We too use ceilometer and gnocchi and it works great. The old way with mongo simply didn't work. Our environment is around 10k running instances and gnocchi currently has data for 1,667,072 resources with 6,879,910 metrics. We are happy with gnocchi and so far haven't needed to change anything but would help contribute if there were things we needed. Would be great if we could just maintain gnocchi as opposed to making yet another thing. Sam On Wed, 20 Nov 2019 at 08:39, Matthias Runge wrote: > On 19/11/2019 21:55, Lingxian Kong wrote: > > On Wed, Nov 20, 2019 at 8:33 AM Matthias Runge > > wrote: > > > > tbh, I am surprised to see the telemetry team trying to roll back to > > something, which is known to cause a lot of performance issues. There > > were good reasons for splitting ceilometer into several components. 
> > > > > > I understand your concerns and the team will find a better way to deal > > with the rollback, the expected result is that it won't break anyone who > > are using Gnocchi or other storage backend but it means much to the > > cloud providers(like OVH and us) who are still using old version > > Ceilometer and relying on the API. > > > I assume, you haven't seen any performance issues with both Ceilometer > or Gnocchi? > > > > > There were lots of good ideas and suggestions already shared in this > > thread. My proposal here would be to keep ceilometer as is (as data > > collecting agent) and to write missing glue to digest or send data > to a > > time-series database, like InfluxDB or Prometheus (plus many more > > options, not mentioned here). > > > > > > Does InfluxDB or Prometheus support to store samples that could be > > leveraged for auditing or billing purpose? I'm not familiar with TSDB > > but I got the answer NO according to the chatting with some other > > people > > Billing is a complicated beast for multiple reasons. Let's look only at > ingestion. First of all, you are right, e.g Prometheus should not be > used in billing situations, see[1], also on [2] (same source as [1]). > > However, the same is true for Gnocchi and also for Ceilometer. You are > going to loose metrics if your collection is faster than your backend > can digest it, for Gnocchi is such behaviour documented. If you are > looking at speed, I'd consider prometheus to be much faster than gnocchi > and ceilometer and thus better suited for handling metrics. > > > > [1] > > https://github.com/prometheus/docs/blob/master/content/docs/introduction/overview.md#when-does-it-not-fit > [2] https://prometheus.io/docs/introduction/overview/#when-does-it-not-fit > > > If I remember correctly, MongoDB was deprecated because of the > issues it > > caused and also because it was removed from Linux distributions, > since > > there were the licensing issues. > > > > > > I don't think the license change will affect the cloud that only uses > > MongoDB as internal service backend storage unless I'm missing > > something. > > You can read a bit of story on MongoDB relicensing at [3]. > > The license change was regarding commercial use. You can find the full > text at [4], depending on your use case, it may be valuable to ask a > lawyer, especially, when in doubt. > > > > [3] > > https://hub.packtpub.com/mongodb-withdraws-controversial-server-side-public-license-from-the-open-source-initiatives-approval-process/ > [4] https://www.mongodb.com/licensing/server-side-public-license > > From my POV (and apparently others here in this thread as well), I'd > take a step back and revisit > https://storyboard.openstack.org/#!/board/205 again, in order to run > into the same issues as in Newton cycle. I mean, if you'd want > ceilometer from Newton, it is still there. > > Matthias > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Tue Nov 19 23:41:57 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 19 Nov 2019 17:41:57 -0600 Subject: [tc][all] Updates on Ussuri cycle community-wide goals In-Reply-To: <16e181adbc4.1191b0166302215.2291880664205036921@ghanshyammann.com> References: <16e181adbc4.1191b0166302215.2291880664205036921@ghanshyammann.com> Message-ID: <16e860b4a50.10ee607ea44010.6012230745285412048@ghanshyammann.com> ---- On Tue, 29 Oct 2019 10:20:43 -0500 Ghanshyam Mann wrote ---- > Hello Everyone, > > We have two goals with their champions ready for review. Please review and provide your feedback on Gerrit. > > > 1. Add goal for project specific PTL and contributor guides - Kendall Nelson > - https://review.opendev.org/#/c/691737/ > > 2. Propose a new goal to migrate all legacy zuul jobs - Luigi Toscano > - https://review.opendev.org/#/c/691278/ > Hello Everyone, >From the Forum and PTG discussions[1], we agreed to proceed with below two goals for the Ussuri cyle. 1. Drop Python 2.7 Support - Already Accepted. Patches on almost all services are up for review and merge[2]. Merge those fast to avoid your projects gate break due to cross projects dropping py2. 2. Project Specific New Contributor & PTL Docs - Under Review The goal patch is under review. Feel Free to provide your feedback on https://review.opendev.org/#/c/691737/ 'migrate all legacy zuul job' is pre-selected as V cycle goal and under review in https://review.opendev.org/#/c/691278/ [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010943.html [2] https://review.opendev.org/#/q/topic:drop-py27-support+(status:open+OR+status:merged) -gmann > We are still looking for the Champion volunteer for RBAC goal[1]. If you have any new ideas for goal, do not hesitate to add in etherpad[2] > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010291.html > [2] https://etherpad.openstack.org/p/PVG-u-series-goals > > -gmann & diablo_rojo > > > From anlin.kong at gmail.com Wed Nov 20 00:47:56 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 20 Nov 2019 13:47:56 +1300 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: On Wed, Nov 20, 2019 at 11:11 AM Sam Morrison wrote: > We too use ceilometer and gnocchi and it works great. The old way with > mongo simply didn't work. > Our environment is around 10k running instances and gnocchi currently has > data for 1,667,072 resources with 6,879,910 metrics. > > We are happy with gnocchi and so far haven't needed to change anything but > would help contribute if there were things we needed. > Would be great if we could just maintain gnocchi as opposed to making yet > another thing. > > Sam > Glad to hear that you are going well with Gnocchi, and I also hope Gnocchi could be maintained as expected. Just to be clear, adding MongoDB support back doesn't mean you have to install and maintain that once you upgrade Ceilometer in the future, we will guarantee the existing deployments won't be affected, it's optional and you are the boss. - Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Wed Nov 20 01:30:52 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 19 Nov 2019 19:30:52 -0600 Subject: [all][tc] Planning for dropping the Python2 support in OpenStack In-Reply-To: <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> References: <16dd0a42b8d.e847dd3e124645.6364180516762707559@ghanshyammann.com> <16dfe4467a4.db6f72ec168733.7542022367023887408@ghanshyammann.com> <16dff41292e.11b7e81b1177136.7669214833037569841@ghanshyammann.com> <16e19144cf0.f6b07849311271.7773306777497055114@ghanshyammann.com> <20191030004035.rsuegdsij2eezps3@mthode.org> <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> Message-ID: <16e866f0345.b945094345130.7689728518186402120@ghanshyammann.com> ---- On Tue, 19 Nov 2019 10:31:51 -0600 Ghanshyam Mann wrote ---- > Hello Everyone, > > I would like to notify all projects about important discussions and agreement happened today to > move forward for cross projects dependencies and devstack default installation or py2 drop work. > > * It is now an official community goal for ussuri[1] and except Swift, all projects agreed to drop py2 > as per schedule in goal. > > * If any project (openstack services which are planned to drop from now till m-1) drop the py2 with removing > the py2 requirements and min python version in setup.cfg which makes that project uninstallable in cross > projects jobs then: > > Options 1 (suggested): broken projects has to drop the py2 support/testing immediately. > > Options 2: if it breaks most of the projects, for example, nova or any other default projects become uninstallable > on py2 then we can half-revert[2] the changes from the project caused the break and wait till m1 to > merge them back. > > * Making Devstack to py3 by default TODAY (otherwise it can break gate everyday). > **Devstack default is py2 currently and it was planned to make py3 by default after m-1. But after seeing today gate break, it is hard > to maintain devstack-py2-by-default. because projects are dropping the py2 support and devstack py2 by default cause the > problem[3]. Today it is from nova side and It can happen due to any projects dropping py2 or I should say it can happen every day as > py2 drop patches get merged. > ** I am ok to make Devstack py3 by default today which is this patch - https://review.opendev.org/#/c/649097/ > ** Action for projects who want to keep testing py2 job till m-1 or whenever they plan to drop py2: Explicitly disable > the py3 in their py2 jobs (USE_PYTHON3: False). Devstack patch moving to py3 by default is approved and waiting for base patches to merge first. I am monitoring that to be merged by night. One thing it will break for sure is grenade jobs which we have seen in the same patch also. After auditing all the projects for grenade py2 jobs, patches to move those jobs to py3 is up[1]. grenade jobs in openstack-zuul-jobs are mainly for stable branches so I kept them on py2 by explicitly disable the py3. later while dropping the py2, I will move those jobs to projects side with py3 version. If you find your project gate broken for grenade job then, you need to merge the patches which are already up[1]. If any other job is broken then, Option1: migrate the failing py2 jobs to py3 with 'USE_PYTHON3: True in zuulv3 jobs' and 'DEVSTACK_GATE_USE_PYTHON3=True in legacy jobs'. 
Options2: If the migration to py3 takes time due to any reason then restore those jobs to py2 by explicitly disabled the py3 via above variables. [1] https://review.opendev.org/#/q/topic:drop-py27-support-devstack-default-py3+(status:open+OR+status:merged) -gmann > > * I have pushed the py2 drop patches on almost all the OpenStack services[4] which migrate py2 jobs to py3, remove the py2 > requirement but do not mention t he min python version in setup.cfg (which can ben done by followup if projects want to > do that). I will suggest to merge them asap to avoid any gate break due to cross projects dependency. > > > [1] https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html > [2] half-revert means only change in requirements.txt and setup.cfg - https://review.opendev.org/#/c/695007/ > [3] https://bugs.launchpad.net/nova/+bug/1853166 > [4] https://review.opendev.org/#/q/topic:drop-py27-support+(status:open+OR+status:merged) > > -gmann > > > ---- On Wed, 30 Oct 2019 12:03:33 -0500 Ghanshyam Mann wrote ---- > > ---- On Wed, 30 Oct 2019 06:59:19 -0500 Sean Mooney wrote ---- > > > On Tue, 2019-10-29 at 19:40 -0500, Matthew Thode wrote: > > > > On 19-10-29 14:53:11, Ghanshyam Mann wrote: > > > > > ---- On Thu, 24 Oct 2019 14:32:03 -0500 Ghanshyam Mann wrote ---- > > > > > > Hello Everyone, > > > > > > > > > > > > We had good amount of discussion on the final plan and schedule in today's TC office hour[1]. > > > > > > > > > > > > I captured the agreement on each point in etherpad (you can see the AGREE:). Also summarizing > > > > > > the discussions here. Imp point is if your projects are planning to keep the py2.7 support then do not delay > > > > > > to tell us. Reply on this ML thread or add your project in etherpad. > > > > > > > > > > > > - Projects can start dropping the py2.7 support. Common lib and testing tools need to wait until milestone-2. > > > > > > ** pepe8 job to be included in openstack-python3-ussuri-jobs-* templates - > > > > > https://review.opendev.org/#/c/688997/ > > > > > > ** You can drop openstack-python-jobs template and start using ussuri template once 688997 patch is merged. > > > > > > ** Cross projects dependency (if any ) can be sync up among dependent projects. > > > > > > > > > > > > - I will add this plan and schedule as a community goal. The goal is more about what all things to do and when. > > > > > > ** If any project keeping the support then it has to be notified explicitly for its consumer. > > > > > > > > > > > > - Schedule: > > > > > > The schedule is aligned with the Ussuri cycle milestone[2]. I will add the plan in the release schedule also. > > > > > > Phase-1: Dec 09 - Dec 13 R-22 Ussuri-1 milestone > > > > > > ** Project to start dropping the py2 support along with all the py2 CI jobs. > > > > > > Phase-2: Feb 10 - Feb 14 R-13 Ussuri-2 milestone > > > > > > ** This includes Oslo, QA tools (or any other testing tools), common lib (os-brick), Client library. > > > > > > ** This will give enough time to projects to drop the py2 support. > > > > > > Phase-3: Apr 06 - Apr 10 R-5 Ussuri-3 milestone > > > > > > ** Final audit on Phase-1 and Phase-2 plan and make sure everything is done without breaking anything. > > > > > > This is enough time to measure such break or anything extra to do before ussuri final release. > > > > > > > > > > > > Other discussions points and agreement: > > > > > > - Projects want to keep python 2 support and need oslo, QA or any other dependent projects/lib support: > > > > > > ** swift. 
AI: gmann to reach out to swift team about the plan and exact required things from its dependency > > > > > > (the common lib/testing tool). > > > > > > > > > > I chated with timburke on IRC about things required by swift to keep the py2.7 support[1]. Below are > > > > > client lib/middleware swift required for py2 testing. > > > > > @timburke, feel free to update if any missing point. > > > > > > > > > > - devstack. able to keep running swift on py2 and rest all services can be on py3 > > > > > - keystonemiddleware and its dependency > > > > > - keystoneclient and openstackclient (dep of keystonemiddleware) > > > > > - castellan and barbicanclient > > > > > > > > > > > > > > > As those lib/middleware going to drop the py2.7 support in phase-2, we need to cap them for swift. > > > > > I think capping them for python2.7 in upper constraint file would not affect any other users but Matthew Thode can > > > > > explain better how that will work from the requirement constraint perspective. > > > > > > > > > > [1] > > > > > http://eavesdrop.openstack.org/irclogs/%23openstack-swift/%23openstack-swift.2019-10-28.log.html#t2019-10-28T16:37:33 > > > > > > > > > > -gmann > > > > > > > > > > > > > ya, there are examples already for libs that have dropped py2 support. > > > > What you need to do is update global requirements to be something like > > > > the following. > > > > > > > > sphinx!=1.6.6,!=1.6.7,<2.0.0;python_version=='2.7' # BSD > > > > sphinx!=1.6.6,!=1.6.7,!=2.1.0;python_version>='3.4' # BSD > > > > > > > > or > > > > > > > > keyring<19.0.0;python_version=='2.7' # MIT/PSF > > > > keyring;python_version>='3.4' # MIT/PSF > > > on a related note os-vif is blocked form running tempest jobs under python 3 > > > until https://review.opendev.org/#/c/681029/ is merged due to > > > https://zuul.opendev.org/t/openstack/build/4ff60d6bd2f24782abeb12cc7bdb8013/log/controller/logs/screen-q-agt.txt.gz#308-318 > > > > > > i think this issue will affect any job that install proejcts that use privsep using the required-proejcts section of the > > > zuul job definition. adding a project to required-proejcts sechtion adds it to the LIBS_FROM_GIT varible in devstack. > > > this inturn istalls it twice due to https://review.opendev.org/#/c/418135/ . the side effect of this is that the > > > privsep helper script gets installed under python2 and the neutron ageint in this case gets install under python 3 so > > > when it trys to spawn the privsep deamon and invoke commands it typically expodes due to dependcy issues or in this case > > > because it failed to drop privileges correctly. > > > > > > so as part of phase 1 we need to merge https://review.opendev.org/#/c/681029/ so that lib project that use required- > > > projects to run with master of project that comsume it and support depends-on can move to python 3 tempest jobs. > > > > Thanks for raising this. I agree on not falling back to py2 in Ussuri, I approved 681029. > > > > -gmann > > > > > > > > > > > > > > > > > > > > > > > > > > > > From iwienand at redhat.com Wed Nov 20 02:00:32 2019 From: iwienand at redhat.com (Ian Wienand) Date: Wed, 20 Nov 2019 13:00:32 +1100 Subject: Stable branch tox now installed with Python 3 in gate Bionic nodes Message-ID: <20191120020032.GA930587@fedora19.localdomain> Hello, In the infra meeting we discussed that tox is installed with Python 3 on Bionic hosts in the gate now [1]. This can lead to tox jobs that do not specify a basepython running under python3. 
This was an unintentional change that came from [2] which uses the python version diskimage-builder thinks is the default on the platform to install tox. On Bionic this is Python 3. (The reasoning behind this was because of CentOS 8; where we are shipping a true Python3 only distribution) Opinion was mixed; this can be fixed by setting basepython=2 in the tox.ini. It's arguable this is the most correct solution and what should have been done, as it makes it agnostic to platform changes. This includes anyone who might be running *outside* the gate trying to replicate jobs. However, I'm sympathetic to the argument that things changing on stable jobs is very annoying too. We could put in a workaround to install tox under python2 on Bionic, restoring the old behaviour. Personally I'm of the opinion this just hides what is a real failure; that the test is only compatible with Python 2 but doesn't specify it. I feel like we should come down on the side of just fixing tests so they are isolated from this, be it in the gate or otherwise. Better to fix the gate quickly than have potential users struggling with running tests in incompatible environments, for mine. However, I'll take advice if the problems are just too big to fix piecemeal, and we need a workaround. Thanks, -i [1] http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-11-19-19.05.log.html#l-102 [2] https://review.opendev.org/#/c/686524/ From 785336911 at 139.com Wed Nov 20 03:09:48 2019 From: 785336911 at 139.com (Canwei Li) Date: Wed, 20 Nov 2019 11:09:48 +0800 (CST) Subject: [Watcher] irc meeting at 8:00 UTC today Message-ID: <2b075dd4aa4f95b-00022.Richmail.00034386067683784207@139.com> Hi,Watcher team meeting is in the #openstack-meeting-alt channel. The agenda is available on https://wiki.openstack.irg/wiki/Watcher_Meeting_Agenda Thanks! Canwei Li 发自139邮箱 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ytatsumi at yahoo-corp.jp Wed Nov 20 04:53:50 2019 From: ytatsumi at yahoo-corp.jp (Yusuke Tatsumi) Date: Wed, 20 Nov 2019 04:53:50 +0000 Subject: [sig] Forming a Large scale SIG In-Reply-To: References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org>, Message-ID: Hi all, Thank you to Thierry for providing this SIG! +1 for me and my employer (Yahoo! JAPAN). We are interested in building over 1K compute-node scale cluster with an upstreamed code. From YJ, we can share our way of deploying/operating clusters, because we've already deployed/operated many clusters that have several hundred compute. It also may be possible to share the actual performance data of our clusters or to try revised configs to our deployed cluster from this SIG. Cheers, Tatsumi. ________________________________ �����: Arnaud MORIN �����Օr: 2019��11��14�� 16:10 ����: Stig Telfer CC: openstack-discuss at lists.openstack.org ����: Re: [sig] Forming a Large scale SIG Hi all, +1 for me and my employer (OVH). We are mostly interested in sharing good practices when deploying a region at scale, and operating it. For the deployment part, my main pain point is about the configuration parameters I should use on different software (e.g. nova behind wsgi). The current doc is designed to deploy a small pod, but when we are going large, usually some of those params needs tuning. I'd like to identify them and eventually tag them to help other being aware that they are useful at large scale. About operating, I am pretty sure we can share some good advices as well. 
E.g., avoid restarting neutron agents in a single shot. So definitely interested in that group. Thanks for bringing that up. Cheers. Le mer. 13 nov. 2019 �� 19:00, Stig Telfer > a ��crit : Hi Thierry & all - Thanks for your mail. I��m interested in joining this SIG. Among others, I��m interested in participating in discussions around these common problems: - golden signals for scaling bottlenecks (and what to do about them) - using Ansible at scale - strategies for simplifying OpenStack functionality in order to scale Cheers, Stig > On 13 Nov 2019, at 11:18, Thierry Carrez > wrote: > > Hi everyone, > > In Shanghai we held a forum session to gauge interest in a new SIG to specifically address cluster scaling issues. In the past we had several groups ("Large deployments", "Performance", LCOO...) but those efforts were arguably a bit too wide and those groups are now abandoned. > > My main goal here is to get large users directly involved in a domain where their expertise can best translate into improvements in the software. It's easy for such a group to go nowhere while trying to boil the ocean. To maximize its chances of success and make it sustainable, the group should have a narrow focus, and reasonable objectives. > > My personal idea for the group focus was to specifically address scaling issues within a single cluster: basically identify and address issues that prevent scaling a single cluster (or cell) past a number of nodes. By sharing analysis and experience, the group could identify common pain points that, once solved, would help raising that number. > > There was a lot of interest in that session[1], and it predictably exploded in lots of different directions, including some that are definitely past a single cluster (like making Neutron better support cells). I think it's fine: my initial proposal was more of a strawman. Active members of the group should really define what they collectively want to work on. And the SIG name should be picked to match that. > > I'd like to help getting that group off the ground and to a place where it can fly by itself, without needing external coordination. The first step would be to identify interested members and discuss group scope and objectives. Given the nature of the group (with interested members in Japan, Europe, Australia and the US) it will be hard to come up with a synchronous meeting time that will work for everyone, so let's try to hold that discussion over email. > > So to kick this off: if you are interested in that group, please reply to this email, introduce yourself and tell us what you would like the group scope and objectives to be, and what you can contribute to the group. > > Thanks! > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > -- > Thierry Carrez (ttx) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bitskrieg at bitskrieg.net Wed Nov 20 04:57:54 2019 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Wed, 20 Nov 2019 04:57:54 +0000 Subject: [neutron][ovn] networking-ovn-metadata-agent and neutron agent liveness Message-ID: All, Currently experimenting with networking-ovn (rdo/train packages on centos7) and I've managed to cobble together a functional deployment with two exceptions: metadata agents and agent liveness. Ref: the metadata issues, it appears that the local compute node ovsdb server listens on a unix socket at /var/run/openvswitch/db.sock as openvswitch:hugetlbfs 0750. 
Since networking-ovn-metadata-agent runs as neutron, it's not able to interact with the local ovs database and gets stuck in a restart loop and complains about the inaccessible database socket. If I edit the systemd unit file and let the agent run as root, it functions as expected. This obviously isn't a real solution, but indicates to me a possible packaging bug? Not sure what the correct mix of permissions is, or if the local database should be listening on tcp:localhost:6640 as well and that's how the metadata agent should connect. The docs are sparse in this area, but I would imagine that something like the metadata-agent should 'just work' out of the box without having to change systemd unit files or mess with unix socket permissions. Thoughts? Secondly, ```openstack network agent list``` shows that all agents (ovn-controller) are all dead, all the time. However, if I display a single agent ```openstack network agent show $foo```, it shows as live. I looked around and saw some discussions about getting networking-ovn to deal with this better, but as of now the agents are reported as dead consistently unless they are explicitly polled, at least on centos 7. I haven't noticed any real impact, but the testing I'm doing is small scale. Other than those two issues, networking-ovn is great, and based on the discussions around possibly deprecating linuxbridge as an in-tree driver, it would make a great 'default' networking configuration option upstream, given the docs get cleaned up. Thanks in advance, r Chris Apsey -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig.openstack at telfer.org Wed Nov 20 06:25:12 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 20 Nov 2019 06:25:12 +0000 Subject: [scientific-sig] Meeting CANCELLED - SC19 Message-ID: <3D9AB4CD-6B71-492B-8BF1-20F0B8CE44E9@telfer.org> Hi All - Unfortunately, we must cancel this week’s meeting as all the SIG chairs and many of the members are currently in Denver for Supercomputing 2019. Apologies, Stig From zbitter at redhat.com Wed Nov 20 07:18:39 2019 From: zbitter at redhat.com (Zane Bitter) Date: Tue, 19 Nov 2019 23:18:39 -0800 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> Message-ID: <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> On 18/11/19 9:18 pm, Ghanshyam Mann wrote: > ---- On Mon, 18 Nov 2019 19:17:04 -0600 Zane Bitter wrote ---- > > On 18/11/19 5:35 pm, Ben Nemec wrote: > > > > > > > > > On 11/18/19 4:08 PM, Matt Riedemann wrote: > > >> On 11/18/2019 3:40 PM, Mohammed Naser wrote: > > >>> The proposal that I had was that in mind would be for us to let teams > > >>> self manage their own stable branches. I think we've reached a point > > >>> where we can trust most of our community to be familiar with the > > >>> stable branch policy (and let teams decide for themselves what they > > >>> believe is best for the success of their own projects). > > >> > > >> So for a project like nova that has a separate nova-core [1] and > > >> nova-stable-maint team [2] where some from [2] aren't in [1], what > > >> does this mean? Drop [2] and just rely on [1]? 
That won't work for > > >> those in nova-core that aren't familiar enough with the stable branch > > >> guidelines or simply don't care to review stable branch changes, and > > >> won't work for those that are in nova-stable-maint but not nova-core. > > > > > > I believe the proposal is to allow the Nova team to manage > > > nova-stable-maint in the same way they do nova-core, not to force anyone > > > to drop their stable-maint team entirely. > > > > I think the proposal was actually for each *-stable-maint team to manage > > itself. This would avoid the situation where e.g. the TC appoints a > > brand-new PTL and suddenly they get to make themselves a stable core, as > > in that case the team would still have to be bootstrapped by the > > stable-maint team. But it would allow those who are both closest to the > > project and confirmed to be familiar with the stable guidelines to make > > decisions about who else is ready to join that group. > > > I am still finding difficult to understand the change and how it will solve the current problem. > > The current problem is: > * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) > which is nothing but we have fewer contributors who understand the stable policies. > > * The stable policies are not the problem so we will stick with current stable policies across all the projects. Stable > policies have to be maintained at single place for consistency in backports across projects. > > If we are moving the stable maintenance team ownership from current stable-maintenance team to project side then, > how it will solve the issue, does it enable more contributors to understand the stable policy and extend the team? Yes. > if yes, then why it cannot happen with current model? Because the core stable team is necessarily not as familiar with the review/backport history of contributors in every project as the individual project stable team is with contributors in each project. > If the project team or PTL making its core member get > more familiar with the stable policy and add as a stable core team then why it cannot happen with the current model. > > For example, if I am PTL or core of any project and finding hard to get my backport merged then I or my project team core > should review more stable branch patches and propose them in stable team core. I have tried that with only very limited success. > If we move the stable team ownership to the projects side then I think PTL is going to do the same. Ask the team members > to understand the stable policies and do more review and then add them in stable core team. If any member know the stable > policies then directly add. You make it sound like that's not a good thing? > I feel that the current problem cannot be solved by moving the ownership of the team, we need to encourage more and more > developers to become stable core in existing model especially from projects who find difficulties in merging their backport. In my experience at least there's no shortage of people willing to do the work, but there is a severe shortage of people willing to do the work of climbing over the bar to get permission to do the work. The position espoused by Tony and Matt at least is that those shouldn't be different things, and in principle that's correct, but in practice they are. Humans are weird. > One more thing, do we have data that how much time as avg it take to merge the backport and what all projects facing the backport merge > issue ? 
> > -gmann > > > > > - ZB > > > > >> > > >> [1] https://review.opendev.org/#/admin/groups/25,members > > >> [2] https://review.opendev.org/#/admin/groups/540,members > > >> > > > > > > > > > > From gmann at ghanshyammann.com Wed Nov 20 08:08:52 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 20 Nov 2019 02:08:52 -0600 Subject: [all][tc] Planning for dropping the Python2 support in OpenStack In-Reply-To: <16e866f0345.b945094345130.7689728518186402120@ghanshyammann.com> References: <16dd0a42b8d.e847dd3e124645.6364180516762707559@ghanshyammann.com> <16dfe4467a4.db6f72ec168733.7542022367023887408@ghanshyammann.com> <16dff41292e.11b7e81b1177136.7669214833037569841@ghanshyammann.com> <16e19144cf0.f6b07849311271.7773306777497055114@ghanshyammann.com> <20191030004035.rsuegdsij2eezps3@mthode.org> <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> <16e866f0345.b945094345130.7689728518186402120@ghanshyammann.com> Message-ID: <16e87db6527.12abd4d2b50644.2222213273226617419@ghanshyammann.com> ---- On Tue, 19 Nov 2019 19:30:52 -0600 Ghanshyam Mann wrote ---- > ---- On Tue, 19 Nov 2019 10:31:51 -0600 Ghanshyam Mann wrote ---- > > Hello Everyone, > > > > I would like to notify all projects about important discussions and agreement happened today to > > move forward for cross projects dependencies and devstack default installation or py2 drop work. > > > > * It is now an official community goal for ussuri[1] and except Swift, all projects agreed to drop py2 > > as per schedule in goal. > > > > * If any project (openstack services which are planned to drop from now till m-1) drop the py2 with removing > > the py2 requirements and min python version in setup.cfg which makes that project uninstallable in cross > > projects jobs then: > > > > Options 1 (suggested): broken projects has to drop the py2 support/testing immediately. > > > > Options 2: if it breaks most of the projects, for example, nova or any other default projects become uninstallable > > on py2 then we can half-revert[2] the changes from the project caused the break and wait till m1 to > > merge them back. > > > > * Making Devstack to py3 by default TODAY (otherwise it can break gate everyday). > > **Devstack default is py2 currently and it was planned to make py3 by default after m-1. But after seeing today gate break, it is hard > > to maintain devstack-py2-by-default. because projects are dropping the py2 support and devstack py2 by default cause the > > problem[3]. Today it is from nova side and It can happen due to any projects dropping py2 or I should say it can happen every day as > > py2 drop patches get merged. > > ** I am ok to make Devstack py3 by default today which is this patch - https://review.opendev.org/#/c/649097/ > > ** Action for projects who want to keep testing py2 job till m-1 or whenever they plan to drop py2: Explicitly disable > > the py3 in their py2 jobs (USE_PYTHON3: False). > > Devstack patch moving to py3 by default is approved and waiting for base patches to merge first. I am monitoring that > to be merged by night. > > One thing it will break for sure is grenade jobs which we have seen in the same patch also. After auditing all the projects > for grenade py2 jobs, patches to move those jobs to py3 is up[1]. grenade jobs in openstack-zuul-jobs are mainly for stable branches > so I kept them on py2 by explicitly disable the py3. later while dropping the py2, I will move those jobs to projects side with py3 version. 
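For reference, a minimal sketch of the USE_PYTHON3 toggle mentioned above, as it might look in a devstack-based zuulv3 job (the job names and parent here are illustrative only, not taken from any particular repo; legacy jobs use DEVSTACK_GATE_USE_PYTHON3 instead):

```
# Sketch only: pin an existing devstack-based job to py2 until the
# project is ready to drop it, by overriding the devstack local.conf var.
- job:
    name: example-project-dsvm-py2   # hypothetical job name
    parent: devstack                 # or whatever devstack-based parent the job already uses
    vars:
      devstack_localrc:
        USE_PYTHON3: False

# Once migrated, flip the same variable (or simply drop it, since py3 is
# becoming the devstack default):
- job:
    name: example-project-dsvm
    parent: devstack
    vars:
      devstack_localrc:
        USE_PYTHON3: True
```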
> > If you find your project gate broken for the grenade job, you need to merge the patches which are already up[1].
> > If any other job is broken, then:
> > Option 1: migrate the failing py2 jobs to py3 with 'USE_PYTHON3: True' in zuulv3 jobs and 'DEVSTACK_GATE_USE_PYTHON3=True' in legacy jobs.
> > Option 2: if the migration to py3 takes time for any reason, restore those jobs to py2 by explicitly disabling py3 via the above variables.
> >
> > [1] https://review.opendev.org/#/q/topic:drop-py27-support-devstack-default-py3+(status:open+OR+status:merged)

The devstack patch is about to merge; below is the status of the grenade job fixes:

* heat - https://review.opendev.org/#/c/695088/
  - One test is failing on the grenade job consistently. Not sure how that is related to py3.
* barbican - https://review.opendev.org/#/c/695052/
  - Some strange behaviour is happening here: the barbican grenade job always runs the stable/train run.yaml, which is stopping this job from moving to py3.
* Octavia - https://review.opendev.org/#/c/693486/3
  - This needs to be updated to mention the py3 version in run.yaml.
* All the other patches are passing the gate, so merge them asap before the devstack patch merges and blocks their gate.

-gmann

> > -gmann
> >
> > * I have pushed the py2 drop patches on almost all the OpenStack services[4] which migrate py2 jobs to py3 and remove the py2
> > requirement but do not mention the min python version in setup.cfg (which can be done by a followup if projects want to
> > do that). I will suggest merging them asap to avoid any gate break due to cross-project dependency.
> >
> > [1] https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html
> > [2] half-revert means only the change in requirements.txt and setup.cfg - https://review.opendev.org/#/c/695007/
> > [3] https://bugs.launchpad.net/nova/+bug/1853166
> > [4] https://review.opendev.org/#/q/topic:drop-py27-support+(status:open+OR+status:merged)
> >
> > -gmann
> >
> > ---- On Wed, 30 Oct 2019 12:03:33 -0500 Ghanshyam Mann wrote ----
> > > ---- On Wed, 30 Oct 2019 06:59:19 -0500 Sean Mooney wrote ----
> > > > On Tue, 2019-10-29 at 19:40 -0500, Matthew Thode wrote:
> > > > > On 19-10-29 14:53:11, Ghanshyam Mann wrote:
> > > > > > ---- On Thu, 24 Oct 2019 14:32:03 -0500 Ghanshyam Mann wrote ----
> > > > > > > Hello Everyone,
> > > > > > >
> > > > > > > We had a good amount of discussion on the final plan and schedule in today's TC office hour[1].
> > > > > > >
> > > > > > > I captured the agreement on each point in the etherpad (you can see the AGREE:). Also summarizing
> > > > > > > the discussions here. The important point is: if your project is planning to keep py2.7 support, then do not delay
> > > > > > > telling us. Reply on this ML thread or add your project in the etherpad.
> > > > > > >
> > > > > > > - Projects can start dropping the py2.7 support. Common libs and testing tools need to wait until milestone-2.
> > > > > > > ** pep8 job to be included in openstack-python3-ussuri-jobs-* templates - https://review.opendev.org/#/c/688997/
> > > > > > > ** You can drop the openstack-python-jobs template and start using the ussuri template once the 688997 patch is merged.
> > > > > > > ** Cross-project dependencies (if any) can be synced up among dependent projects.
> > > > > > >
> > > > > > > - I will add this plan and schedule as a community goal. The goal is more about what all things to do and when.
> > > > > > > ** If any project keeps the support, then that has to be communicated explicitly to its consumers.
> > > > > > > > > > > > > > - Schedule: > > > > > > > The schedule is aligned with the Ussuri cycle milestone[2]. I will add the plan in the release schedule also. > > > > > > > Phase-1: Dec 09 - Dec 13 R-22 Ussuri-1 milestone > > > > > > > ** Project to start dropping the py2 support along with all the py2 CI jobs. > > > > > > > Phase-2: Feb 10 - Feb 14 R-13 Ussuri-2 milestone > > > > > > > ** This includes Oslo, QA tools (or any other testing tools), common lib (os-brick), Client library. > > > > > > > ** This will give enough time to projects to drop the py2 support. > > > > > > > Phase-3: Apr 06 - Apr 10 R-5 Ussuri-3 milestone > > > > > > > ** Final audit on Phase-1 and Phase-2 plan and make sure everything is done without breaking anything. > > > > > > > This is enough time to measure such break or anything extra to do before ussuri final release. > > > > > > > > > > > > > > Other discussions points and agreement: > > > > > > > - Projects want to keep python 2 support and need oslo, QA or any other dependent projects/lib support: > > > > > > > ** swift. AI: gmann to reach out to swift team about the plan and exact required things from its dependency > > > > > > > (the common lib/testing tool). > > > > > > > > > > > > I chated with timburke on IRC about things required by swift to keep the py2.7 support[1]. Below are > > > > > > client lib/middleware swift required for py2 testing. > > > > > > @timburke, feel free to update if any missing point. > > > > > > > > > > > > - devstack. able to keep running swift on py2 and rest all services can be on py3 > > > > > > - keystonemiddleware and its dependency > > > > > > - keystoneclient and openstackclient (dep of keystonemiddleware) > > > > > > - castellan and barbicanclient > > > > > > > > > > > > > > > > > > As those lib/middleware going to drop the py2.7 support in phase-2, we need to cap them for swift. > > > > > > I think capping them for python2.7 in upper constraint file would not affect any other users but Matthew Thode can > > > > > > explain better how that will work from the requirement constraint perspective. > > > > > > > > > > > > [1] > > > > > > http://eavesdrop.openstack.org/irclogs/%23openstack-swift/%23openstack-swift.2019-10-28.log.html#t2019-10-28T16:37:33 > > > > > > > > > > > > -gmann > > > > > > > > > > > > > > > > ya, there are examples already for libs that have dropped py2 support. > > > > > What you need to do is update global requirements to be something like > > > > > the following. > > > > > > > > > > sphinx!=1.6.6,!=1.6.7,<2.0.0;python_version=='2.7' # BSD > > > > > sphinx!=1.6.6,!=1.6.7,!=2.1.0;python_version>='3.4' # BSD > > > > > > > > > > or > > > > > > > > > > keyring<19.0.0;python_version=='2.7' # MIT/PSF > > > > > keyring;python_version>='3.4' # MIT/PSF > > > > on a related note os-vif is blocked form running tempest jobs under python 3 > > > > until https://review.opendev.org/#/c/681029/ is merged due to > > > > https://zuul.opendev.org/t/openstack/build/4ff60d6bd2f24782abeb12cc7bdb8013/log/controller/logs/screen-q-agt.txt.gz#308-318 > > > > > > > > i think this issue will affect any job that install proejcts that use privsep using the required-proejcts section of the > > > > zuul job definition. adding a project to required-proejcts sechtion adds it to the LIBS_FROM_GIT varible in devstack. > > > > this inturn istalls it twice due to https://review.opendev.org/#/c/418135/ . 
the side effect of this is that the > > > > privsep helper script gets installed under python2 and the neutron ageint in this case gets install under python 3 so > > > > when it trys to spawn the privsep deamon and invoke commands it typically expodes due to dependcy issues or in this case > > > > because it failed to drop privileges correctly. > > > > > > > > so as part of phase 1 we need to merge https://review.opendev.org/#/c/681029/ so that lib project that use required- > > > > projects to run with master of project that comsume it and support depends-on can move to python 3 tempest jobs. > > > > > > Thanks for raising this. I agree on not falling back to py2 in Ussuri, I approved 681029. > > > > > > -gmann > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From radoslaw.piliszek at gmail.com Wed Nov 20 08:28:18 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 20 Nov 2019 09:28:18 +0100 Subject: [gnocchi][telemetry][ceilometer][cloudkitty][monasca] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: Sure, Lingxian. I have never negated that. To make sure I was not misunderstood, I will rephrase myself. Any kind of work requires effort which consists of human-spent time. If we are well-aware that MongoDB is not the best solution, the same time might be better spent integrating something else. Still, from my point of view, having Monasca as the backend is enough of a solution to the main issue in this thread which is Gnocchi not being maintained any longer. -yoctozepto wt., 19 lis 2019 o 21:59 Lingxian Kong napisał(a): > On Wed, Nov 20, 2019 at 9:39 AM Radosław Piliszek < > radoslaw.piliszek at gmail.com> wrote: > >> I second Matthias' stance - MongoDB does not look like the future-proof >> tool for the job, it's not really well suited for time-series data. >> I feel that would just be "wasted" work to bring it back, unless it is >> really like snapping fingers... >> Not to mention supporting it in the long run. >> Obviously not me making decisions. :-) >> >> The other part, bringing API back, seems nice as there are ppl using >> Ceilometer with Monasca this way. >> >> I am glad to hear that Monasca is finally getting official support as a >> publisher in Ceilometer. >> This should actually solve the main issue of lack of a reliable publisher >> - Monasca already handles >> modern persistent time-series databases. >> >> I added [monasca] since we have the topic of Monasca as one of >> Ceilometer's new publishers. Feel free to remove. >> > > As open source community, we don't force anyone to use some specific > software or tooling, we support as more options as possible for real use > cases but leave the choice to the users. > > - > Best regards, > Lingxian Kong > Catalyst Cloud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From naohiro.sameshima at global.ntt Wed Nov 20 08:45:43 2019 From: naohiro.sameshima at global.ntt (=?utf-8?B?TmFvaGlybyBTYW1lc2hpbWHvvIjprqvls7Yg55u05rSL77yJKEdyb3VwKQ==?=) Date: Wed, 20 Nov 2019 08:45:43 +0000 Subject: [glance] glance_store tests failed In-Reply-To: <3033d0aa-da0d-02fb-2aa0-6658a7fdaae1@gmail.com> References: <3033d0aa-da0d-02fb-2aa0-6658a7fdaae1@gmail.com> Message-ID: <61F4FA08-8F2F-4D7F-8A73-A824054865E0@global.ntt> Hi Abhishek and Brian, Thank you for the reply. 
I use Python3.7.4 with pyenv in macOS Mojave 10.14.6. Curiously, when I run tests in docker container (image: Python3.7.4), all tests passed. I think something is wrong with my environment, so I'll look into the cause of failure. Thanks. Naohiro Sameshima > That's the correct way to run the tests. I'm not sure why you're > getting those failures -- I did a fresh checkout and all tests passed > using Python 3.7.1. > I don't have anything useful to say other than start over with a clean > environment (which you've probably already done a few times). Maybe ask > in #openstack-glance. Perhaps you have an atypical environment? --------------------------------------------------------------------------- > Hi Naohiro, > There might be something wrong with your environment, I have run these tests on master branch in my environment and those are passing successfully. > I have also submitted test patch [1] on gerrit, where it has passed the tests. > [1] https://review.opendev.org/694903 > Thanks & Best Regards, > Abhishek Kekane This email and all contents are subject to the following disclaimer: https://hello.global.ntt/en-us/email-disclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-philippe at evrard.me Wed Nov 20 09:20:25 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Wed, 20 Nov 2019 10:20:25 +0100 Subject: [tc] Recent updates and action items Message-ID: <4e0fa644566efc53d9715f321b6aa17a8a0a1447.camel@evrard.me> Hello TC members, There are still business opportunities merging for 2019. The process for 2020 need to start soon. We have action items from the Summit and PTG [1] to take care. Please review etherpads and check if any part actions that you already been assigned or whould like to help. Here are a few action items already if you want to help: - ttx: will be working on the technical vision reflection, as we arrive at the end of the road (2019). - gmann: will be working on the multiple goals he has, including python versions. A little help would be welcomed I guess :) - mnaser: will be working on telemetry and an oslo project to expose metrics for prometheus. (If I didn't get that one wrong) - jungleboyj: will write a blog post about the analysis of the Foundation user survey. We should read it when it's ready and prepare the next one - mugsie: is writing a new proposal for release naming based on what we discussed at the summit - We all should think about stable policy and what to do about it, based on the feedback from TheJulia/smcginnis/release+requirements team. Maybe diablo_rojo can follow this? - everyone: we should promote the first contact sig to organisations that might need help on getting things achieved in OpenStack. - zaneb: will propose documentation of the process on how TC members should vote on conflicting resolutions when there is no predetermined consensus - mnaser: will be the point of contact for the infra matters of static hosting - ttx: will investigate a merge tc/uc - evrardjp: will start the new ideas concept that was floated before the summit and got traction during the summit. PowerVMStackers is now a SIG [2]. However, mnaser suggested to move to a multi-arch SIG. Mohammed, do you want to get started on this? V release naming process is now started [3]. 
[1] https://etherpad.openstack.org/p/PVG-TC-PTG [2] https://review.opendev.org/#/c/680438/ [3] https://review.opendev.org/#/c/693266/ Regards, Jean-Philippe & Rico From kchamart at redhat.com Wed Nov 20 09:24:04 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Wed, 20 Nov 2019 10:24:04 +0100 Subject: On next minimum libvirt / QEMU versions for "V" release In-Reply-To: <20191119182343.GA32458@paraplu> References: <20191118181106.GD7032@paraplu> <789413eb-3fde-0283-9ddb-c356879c749d@suse.com> <20191119182343.GA32458@paraplu> Message-ID: <20191120092404.GB32458@paraplu> [Now with Iain's email address fixed.] On Tue, Nov 19, 2019 at 07:23:43PM +0100, Kashyap Chamarthy wrote: > On Tue, Nov 19, 2019 at 10:40:48AM +0100, Andreas Jaeger wrote: > > On 18/11/2019 19.11, Kashyap Chamarthy wrote: > > [...] > > > > Action Items for Linux Distros > > > ------------------------------ > > > > > > (a) Oracle Linux: Please update your libvirt/QEMU versions for Oracle > > > Linux 8? > > > > > > I couldn't find anything related to libvirt/QEMU here: > > > https://yum.oracle.com/oracle-linux-8.html. (My educated guess is: > > > the versions roughly match what's in CentOS/RHEL.) > > > > > > (b) openSUSE and SLES: Same request as above. > > > > > > Andreas Jaegaer said on #openstack-infra that the proposed versions > > > for 'V' release should be fine for SLES. (And by extension open > > > SUSE, I assume.) > > > > Yes, those look fine for SLES and openSUSE, > > Great; thanks for confirming. > > > Andreas > > > > > - - - > > > > > > Assuming Oracle Linux and SLES confirm, please let us know if there are > > > any objections if we pick NEXT_MIN_* versions for the OpenStack "V" > > > release to be libvirt: 5.0.0 and QEMU: 4.0.0. > > Cced Iain MacDonnell from Oracle for the above. (As he responded last > year on this topic :-)) > > [...] > > -- > /kashyap -- /kashyap From jean-philippe at evrard.me Wed Nov 20 09:25:53 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Wed, 20 Nov 2019 10:25:53 +0100 Subject: [all][tc] What happened in OpenStack Governance last month Message-ID: <507b57b4816cf78b832914fad594f41cfe46923f.camel@evrard.me> Hello everyone, I hope you're all back from the summit to a normal life (nod to all the people at conferences, like KubeCon or devops days)! Here are a few things you might want to know about... First, for Python2.7 and Python3: We now removed references to Python 2.7 in the PTI (Project Testing Interface) [1]. In Ussuri we no longer require projects to test for python 2.7, so remove it from the list of requirements in the PTI. You can find current PTI for python in [2]. For py2 to py3 migration timing, there is now a ML [4] and etherpad[6] to discuss about specific schedule (now out in [6]), please join our discussions! We need each teams to pay attention on their python dependency and check if that schedule works. The release schedule will be update accordingly. We recently merged updates to the drop py2 goal too [9]. Then, for changes in projects, SIGs or adjacent communities: - If you didn't know: Airship is now confirmed as official project under OSF [3]. Congrats Airship team! - In OpenStack projects, we now have new barbican-ui, and manila- ganesha charms repositories. Microversion parse was adopted by the oslo team [7], and the API site was un-retired for redirect reasons. - Large scale SIG is under plan [8]. Please join, if you're interested on it. - PowerVM is now a SIG instead of a project. In the future, we might want to have a Multi-Arch SIG. 
- The Technical Writing SIG is born, replacing the openstack docs project.
- Designate was promoted as a 2019 "business opportunity" for investment. We are now thinking about 2020.

Thank you all for reading until this line! ;)

[1] https://review.opendev.org/#/c/686582
[2] https://governance.openstack.org/tc/reference/pti/python.html
[3] http://lists.openstack.org/pipermail/foundation/2019-October/002804.html
[4] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010142.html
[5] http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2019-10-24.log.html
[6] https://etherpad.openstack.org/p/drop-python2-support
[7] https://review.opendev.org/#/c/689754/
[8] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010757.html
[9] https://review.opendev.org/#/c/694161/

Regards,
Jean-Philippe & Rico

From info at dantalion.nl Wed Nov 20 09:37:34 2019
From: info at dantalion.nl (info at dantalion.nl)
Date: Wed, 20 Nov 2019 10:37:34 +0100
Subject: [Watcher] irc meeting at 8:00 UTC today
In-Reply-To: <2b075dd4aa4f95b-00022.Richmail.00034386067683784207@139.com>
References: <2b075dd4aa4f95b-00022.Richmail.00034386067683784207@139.com>
Message-ID: <9c2c7791-e9c2-a624-0c59-a6c461950d33@dantalion.nl>

Hello everyone,

I apologize for not being able to attend the Watcher meeting this morning; unfortunately I was unexpectedly stuck in transit longer than usual. I expect to be there as usual next time.

Hope everyone who attended enjoyed the OpenStack summit and have a good day.

Kind regards,
Corne Lukken (Dantali0n)

On 11/20/19 4:09 AM, Canwei Li wrote:
> Hi, the Watcher team meeting is in the #openstack-meeting-alt channel.
> The agenda is available on https://wiki.openstack.org/wiki/Watcher_Meeting_Agenda
>
> Thanks!
> Canwei Li
>
> Sent from 139 Mail

From rico.lin.guanyu at gmail.com Wed Nov 20 10:03:03 2019
From: rico.lin.guanyu at gmail.com (Rico Lin)
Date: Wed, 20 Nov 2019 18:03:03 +0800
Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG
Message-ID: 

Dear all

At the Summit there was a forum session on ARM support [1] where many people showed interest in ARM support for OpenStack. Linaro has also shown interest in donating servers to OpenStack infra. It's time for the community to think about what we should do with those ARM servers once we have them in the community infrastructure.

One thing we should do as a community is to gather people for this topic. So I propose we create a Multi-arch SIG and aim to support the ARM architecture as a very first step. I had the idea to call it an ARM SIG before, but since there is likely a high overlap in knowledge between supporting ARM64 and other architectures, I propose we go for Multi-arch instead. This SIG will be a nice place to collect all the documents and gate jobs, and to track tasks.

If you're also interested in that group, please reply to this email, introduce yourself and tell us what you would like the group scope and objectives to be, and what you can contribute to the group.

[1] https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24355/running-open-infrastucture-on-arm64

-- 
May The Force of OpenStack Be With You,
*Rico Lin* irc: ricolin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From missile0407 at gmail.com Wed Nov 20 10:21:04 2019
From: missile0407 at gmail.com (Eddie Yen)
Date: Wed, 20 Nov 2019 18:21:04 +0800
Subject: [kolla][gnocchi] Issue on Gnocchi metricd when using Ceph as backend but with cache tier.
Message-ID: Hi everyone, I'm not sure if it's good to ask at this place but still want to make sure that some one has a same issue as me. Currently we're using Kolla Openstack (R version), and Gnocchi version is 4.3.2 (ubuntu-source). The issue is, we always get below error trace in gnocchi-metricd after launched VM or uploaded images that let Ceilometer to create metrics in few environments. https://pastebin.com/fPZSFWaE After few investigations, we found that this only happens on the environment which enabled cache tier in Ceph. And also found the gnocchi cache pool only got 14 Bytes data inside, and no data inside the OSD pool one. We tried remove the cache tier from gnocchi pool then restart the gnocchi services. Then everything become normal. Can saw Gnocchi work normally to write data into Ceph pool. Does anyone had a same problem? Eddie. -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Wed Nov 20 10:29:41 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 20 Nov 2019 23:29:41 +1300 Subject: [gnocchi][telemetry][ceilometer][cloudkitty][monasca] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: On Wed, Nov 20, 2019 at 9:28 PM Radosław Piliszek < radoslaw.piliszek at gmail.com> wrote: > Sure, Lingxian. I have never negated that. > To make sure I was not misunderstood, I will rephrase myself. > Any kind of work requires effort which consists of human-spent time. > If we are well-aware that MongoDB is not the best solution, the same time > might be better spent integrating something else. > Still, from my point of view, having Monasca as the backend is enough of a > solution to the main issue in this thread which is Gnocchi not being > maintained any longer. > Yeah, I'm pretty sure we are on the same page. Monasca is probably the best solution for you, but we (catalyst cloud) is still on the Ceilometer+MongoDB boat, and they work very well with us. - Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From aj at suse.com Wed Nov 20 10:30:06 2019 From: aj at suse.com (Andreas Jaeger) Date: Wed, 20 Nov 2019 11:30:06 +0100 Subject: [all][tc] Planning for dropping the Python2 support in OpenStack In-Reply-To: <16e87db6527.12abd4d2b50644.2222213273226617419@ghanshyammann.com> References: <16dd0a42b8d.e847dd3e124645.6364180516762707559@ghanshyammann.com> <16dfe4467a4.db6f72ec168733.7542022367023887408@ghanshyammann.com> <16dff41292e.11b7e81b1177136.7669214833037569841@ghanshyammann.com> <16e19144cf0.f6b07849311271.7773306777497055114@ghanshyammann.com> <20191030004035.rsuegdsij2eezps3@mthode.org> <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> <16e866f0345.b945094345130.7689728518186402120@ghanshyammann.com> <16e87db6527.12abd4d2b50644.2222213273226617419@ghanshyammann.com> Message-ID: <90b63305-a379-93d6-c1ae-693d20c3fe24@suse.com> On 20/11/2019 09.08, Ghanshyam Mann wrote: > [...] 
> * barbican- https://review.opendev.org/#/c/695052/ > - Some strange behaviour is happening in this, barbican grenade job run stable/train run.yaml always which is stoping this job to move to py3 AFAIU, the stable jobs have a branch matcher, see https://opendev.org/openstack/barbican/src/branch/stable/train/.zuul.yaml#L138 - and thus apply. You need https://review.opendev.org/689458 and friends merged for all branches to fix this, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From james.denton at rackspace.com Wed Nov 20 12:15:58 2019 From: james.denton at rackspace.com (James Denton) Date: Wed, 20 Nov 2019 12:15:58 +0000 Subject: [neutron][ovn] networking-ovn-metadata-agent and neutron agent liveness In-Reply-To: References: Message-ID: <58B86CC6-6E25-4255-A150-5B16ED2FDD44@rackspace.com> Hi Chris – I recall having the same issue when first implementing OVN into OpenStack-Ansible, and currently have the OVN metadata agent running as root[1]. I’m curious to see how others solved the issue as well. Thanks for bringing this up. [1] https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/vars/main.yml#L495-L496 James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com From: Chris Apsey Reply-To: Chris Apsey Date: Wednesday, November 20, 2019 at 12:00 AM To: "openstack-discuss at lists.openstack.org" Subject: [neutron][ovn] networking-ovn-metadata-agent and neutron agent liveness CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! All, Currently experimenting with networking-ovn (rdo/train packages on centos7) and I've managed to cobble together a functional deployment with two exceptions: metadata agents and agent liveness. Ref: the metadata issues, it appears that the local compute node ovsdb server listens on a unix socket at /var/run/openvswitch/db.sock as openvswitch:hugetlbfs 0750. Since networking-ovn-metadata-agent runs as neutron, it's not able to interact with the local ovs database and gets stuck in a restart loop and complains about the inaccessible database socket. If I edit the systemd unit file and let the agent run as root, it functions as expected. This obviously isn't a real solution, but indicates to me a possible packaging bug? Not sure what the correct mix of permissions is, or if the local database should be listening on tcp:localhost:6640 as well and that's how the metadata agent should connect. The docs are sparse in this area, but I would imagine that something like the metadata-agent should 'just work' out of the box without having to change systemd unit files or mess with unix socket permissions. Thoughts? Secondly, ```openstack network agent list``` shows that all agents (ovn-controller) are all dead, all the time. However, if I display a single agent ```openstack network agent show $foo```, it shows as live. I looked around and saw some discussions about getting networking-ovn to deal with this better, but as of now the agents are reported as dead consistently unless they are explicitly polled, at least on centos 7. I haven't noticed any real impact, but the testing I'm doing is small scale. 
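For anyone needing the same stopgap, the run-as-root workaround described above can be applied as a systemd drop-in instead of editing the packaged unit file directly. This is a sketch only: the unit name assumes the RDO packaging mentioned earlier and may differ elsewhere, and the proper fix remains correct permissions on the local ovsdb socket (or a tcp:localhost:6640 listener) as discussed.

```
# /etc/systemd/system/networking-ovn-metadata-agent.service.d/override.conf
# Interim workaround only -- makes the agent run as root instead of neutron
# so it can reach /var/run/openvswitch/db.sock.
[Service]
User=root
```

After creating the drop-in, run `systemctl daemon-reload` and restart the agent.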
Other than those two issues, networking-ovn is great, and based on the discussions around possibly deprecating linuxbridge as an in-tree driver, it would make a great 'default' networking configuration option upstream, given the docs get cleaned up. Thanks in advance, r Chris Apsey -------------- next part -------------- An HTML attachment was scrubbed... URL: From roam at ringlet.net Wed Nov 20 12:23:18 2019 From: roam at ringlet.net (Peter Pentchev) Date: Wed, 20 Nov 2019 14:23:18 +0200 Subject: [glance] glance_store tests failed In-Reply-To: <61F4FA08-8F2F-4D7F-8A73-A824054865E0@global.ntt> References: <3033d0aa-da0d-02fb-2aa0-6658a7fdaae1@gmail.com> <61F4FA08-8F2F-4D7F-8A73-A824054865E0@global.ntt> Message-ID: <20191120122318.GA361265@straylight.m.ringlet.net> On Wed, Nov 20, 2019 at 08:45:43AM +0000, Naohiro Sameshima(鮫島 直洋)(Group) wrote: [actually Abhishek wrote:] > > That's the correct way to run the tests. I'm not sure why you're > > getting those failures -- I did a fresh checkout and all tests passed > > using Python 3.7.1. > > > I don't have anything useful to say other than start over with a clean > > environment (which you've probably already done a few times). Maybe ask > > in #openstack-glance. Perhaps you have an atypical environment? > > > Hi Naohiro, > > > There might be something wrong with your environment, I have run these tests on master branch in my environment and those are passing successfully. > > I have also submitted test patch [1] on gerrit, where it has passed the tests. > > > [1] https://review.opendev.org/694903 > > Hi Abhishek and Brian, > > Thank you for the reply. > > I use Python3.7.4 with pyenv in macOS Mojave 10.14.6. > > Curiously, when I run tests in docker container (image: Python3.7.4), all tests passed. > > I think something is wrong with my environment, so I'll look into the cause of failure. Just a shot in the dark (haven't looked at the source yet), but since I saw /tmp mentioned in the error message, I wonder if it might have something to do with the /private/tmp and per-user temporary directories on most MacOS X installations... G'luck, Peter -- Peter Pentchev roam@{ringlet.net,debian.org,FreeBSD.org} pp at storpool.com PGP key: http://people.FreeBSD.org/~roam/roam.key.asc Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From mriedemos at gmail.com Wed Nov 20 14:21:29 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 20 Nov 2019 08:21:29 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> Message-ID: <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> On 11/20/2019 1:18 AM, Zane Bitter wrote: > Because the core stable team is necessarily not as familiar with the > review/backport history of contributors in every project as the > individual project stable team is with contributors in each project. 
This is assuming that each project has a stable core team already, which a lot don't, that's why we get a lot of "hi I'm the PTL du jour on project X now please make me stable core even though I've never reviewed any stable branch changes before". With Tony more removed these days and I myself not wanting to vet every one of these "add me to the stable core team" requests, I'm more or less OK with the proposal so that it removes me as a bottleneck. That might mean people merge things on stable branches for their projects that don't follow the guidelines but so be it. If it's a problem hopefully they'll hear about it from their consumers, but if the project is in such maintenance mode anyway that they can break the stable guidelines, then they might not have many external consumers to complain anyway. Either way I don't need to be involved. So sure, +1 from me on the proposal given nova can still do what it's already been doing with a specific stable maint core team ACL in Gerrit. -- Thanks, Matt From mriedemos at gmail.com Wed Nov 20 14:22:44 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 20 Nov 2019 08:22:44 -0600 Subject: [release][stable][telemetry]Please add Rong Zhu to ceilometer-stable-maint group In-Reply-To: References: Message-ID: <84019aaa-2c60-dcf4-a1b5-77a9186d1f2c@gmail.com> On 11/18/2019 5:46 AM, Rong Zhu wrote: > I am the current Telemetry PTL, could you please add me to > ceilometer-stable-maint group. And please also add Lingxian Kong to this > group. Done. Shall I also remove everyone else from this list? https://review.opendev.org/#/admin/groups/533,members I don't think any of those people work directly on OpenStack anymore. -- Thanks, Matt From akekane at redhat.com Wed Nov 20 14:37:47 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Wed, 20 Nov 2019 20:07:47 +0530 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> Message-ID: Hi Matt, For your kind information, I never asked to be to the core of stable-maintainer. Someone has recommended my name for it. So if you find it hard ir against the stable core policy, kindly remove me from this list. I will earn it with my efforts. Thank you for your kind support. Abhishek Kekane On Wed, 20 Nov 2019 at 8:00 PM, Matt Riedemann wrote: > On 11/20/2019 1:18 AM, Zane Bitter wrote: > > Because the core stable team is necessarily not as familiar with the > > review/backport history of contributors in every project as the > > individual project stable team is with contributors in each project. > > This is assuming that each project has a stable core team already, which > a lot don't, that's why we get a lot of "hi I'm the PTL du jour on > project X now please make me stable core even though I've never reviewed > any stable branch changes before". > > With Tony more removed these days and I myself not wanting to vet every > one of these "add me to the stable core team" requests, I'm more or less > OK with the proposal so that it removes me as a bottleneck. That might > mean people merge things on stable branches for their projects that > don't follow the guidelines but so be it. 
If it's a problem hopefully > they'll hear about it from their consumers, but if the project is in > such maintenance mode anyway that they can break the stable guidelines, > then they might not have many external consumers to complain anyway. > Either way I don't need to be involved. > > So sure, +1 from me on the proposal given nova can still do what it's > already been doing with a specific stable maint core team ACL in Gerrit. > > -- > > Thanks, > > Matt > > -- Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Wed Nov 20 14:46:32 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 20 Nov 2019 09:46:32 -0500 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> Message-ID: On Wed, Nov 20, 2019 at 9:43 AM Abhishek Kekane wrote: > > Hi Matt, > > For your kind information, I never asked to be to the core of stable-maintainer. Someone has recommended my name for it. So if you find it hard ir against the stable core policy, kindly remove me from this list. I will earn it with my efforts. Just for context, I don't think Matt was actually targeting you, it's actually something that has been raised a few times in the past with many other projects (i.e. Trove is one another recent memory). I'm happy to see you on the stable team for Glance. . :) > Thank you for your kind support. > > Abhishek Kekane > > On Wed, 20 Nov 2019 at 8:00 PM, Matt Riedemann wrote: >> >> On 11/20/2019 1:18 AM, Zane Bitter wrote: >> > Because the core stable team is necessarily not as familiar with the >> > review/backport history of contributors in every project as the >> > individual project stable team is with contributors in each project. >> >> This is assuming that each project has a stable core team already, which >> a lot don't, that's why we get a lot of "hi I'm the PTL du jour on >> project X now please make me stable core even though I've never reviewed >> any stable branch changes before". >> >> With Tony more removed these days and I myself not wanting to vet every >> one of these "add me to the stable core team" requests, I'm more or less >> OK with the proposal so that it removes me as a bottleneck. That might >> mean people merge things on stable branches for their projects that >> don't follow the guidelines but so be it. If it's a problem hopefully >> they'll hear about it from their consumers, but if the project is in >> such maintenance mode anyway that they can break the stable guidelines, >> then they might not have many external consumers to complain anyway. >> Either way I don't need to be involved. >> >> So sure, +1 from me on the proposal given nova can still do what it's >> already been doing with a specific stable maint core team ACL in Gerrit. >> >> -- >> >> Thanks, >> >> Matt >> > -- > Thanks & Best Regards, > > Abhishek Kekane -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
https://vexxhost.com From openstack at nemebean.com Wed Nov 20 14:49:23 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 20 Nov 2019 08:49:23 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> Message-ID: <1043e7be-b5ce-d537-41b7-2e07106722bc@nemebean.com> On 11/20/19 8:21 AM, Matt Riedemann wrote: > That might mean people merge things on stable branches for their > projects that don't follow the guidelines but so be it. Another interesting point that was made in Shanghai was that the initial review is only the first level where bad backports can be caught. Even if something merges, it still has to be proposed for release, at which point the release liaison or PTL should be looking at it, and then once the release is proposed the release team is going to look at the changes included. So there is a safety net if a reviewer makes a mistake. From akekane at redhat.com Wed Nov 20 14:58:55 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Wed, 20 Nov 2019 20:28:55 +0530 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> Message-ID: Hi Mohammed, My sincere apologies, but if you are certain that it is against policy and not able to buy it then why one should be added to the team and even if its done then why bothering about it. Once again, sincere apologies. Abhishek On Wed, 20 Nov 2019 at 8:16 PM, Mohammed Naser wrote: > On Wed, Nov 20, 2019 at 9:43 AM Abhishek Kekane > wrote: > > > > Hi Matt, > > > > For your kind information, I never asked to be to the core of > stable-maintainer. Someone has recommended my name for it. So if you find > it hard ir against the stable core policy, kindly remove me from this list. > I will earn it with my efforts. > > Just for context, I don't think Matt was actually targeting you, it's > actually something that has been raised a few times in the past with > many other projects (i.e. Trove is one another recent memory). I'm > happy to see you on the stable team for Glance. . :) > > > Thank you for your kind support. > > > > Abhishek Kekane > > > > On Wed, 20 Nov 2019 at 8:00 PM, Matt Riedemann > wrote: > >> > >> On 11/20/2019 1:18 AM, Zane Bitter wrote: > >> > Because the core stable team is necessarily not as familiar with the > >> > review/backport history of contributors in every project as the > >> > individual project stable team is with contributors in each project. > >> > >> This is assuming that each project has a stable core team already, which > >> a lot don't, that's why we get a lot of "hi I'm the PTL du jour on > >> project X now please make me stable core even though I've never reviewed > >> any stable branch changes before". 
> >> > >> With Tony more removed these days and I myself not wanting to vet every > >> one of these "add me to the stable core team" requests, I'm more or less > >> OK with the proposal so that it removes me as a bottleneck. That might > >> mean people merge things on stable branches for their projects that > >> don't follow the guidelines but so be it. If it's a problem hopefully > >> they'll hear about it from their consumers, but if the project is in > >> such maintenance mode anyway that they can break the stable guidelines, > >> then they might not have many external consumers to complain anyway. > >> Either way I don't need to be involved. > >> > >> So sure, +1 from me on the proposal given nova can still do what it's > >> already been doing with a specific stable maint core team ACL in Gerrit. > >> > >> -- > >> > >> Thanks, > >> > >> Matt > >> > > -- > > Thanks & Best Regards, > > > > Abhishek Kekane > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. https://vexxhost.com > > -- Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Nov 20 15:01:38 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 20 Nov 2019 15:01:38 +0000 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <1043e7be-b5ce-d537-41b7-2e07106722bc@nemebean.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> <1043e7be-b5ce-d537-41b7-2e07106722bc@nemebean.com> Message-ID: <20191120150138.kcmqjp4yzl33hx3l@yuggoth.org> On 2019-11-20 08:49:23 -0600 (-0600), Ben Nemec wrote: [...] > Even if something merges, it still has to be proposed for release, > at which point the release liaison or PTL should be looking at it, > and then once the release is proposed the release team is going to > look at the changes included. So there is a safety net if a > reviewer makes a mistake. True in principle, but we've basically always treated stable branches as a place from which downstream consumers can consume patches, and the stable point releases on them are more of a formality. I may simply not be connected with the right segments of our community, but I haven't heard anyone say they specifically wait to consume stable branch point releases vs just taking the branch content at a random point in time or selectively picking relevant patches out of it to incorporate into a packaged version... and even the theoretical stable point release reviewer safety net vaporizes for branches which pass into extended maintenance mode. However, the above should not be taken as an objection on my part for the plan. I agree the real safety net here is the users, and the lessons a reviewer learns after helping a panicked user of their software work around a regression or behavior change which should never have been allowed to merge on a stable branch in the first place. Failure is the best teacher. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jichenjc at cn.ibm.com Wed Nov 20 15:08:43 2019 From: jichenjc at cn.ibm.com (Chen CH Ji) Date: Wed, 20 Nov 2019 15:08:43 +0000 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Wed Nov 20 15:35:05 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 20 Nov 2019 10:35:05 -0500 Subject: [tc] Recent updates and action items In-Reply-To: <4e0fa644566efc53d9715f321b6aa17a8a0a1447.camel@evrard.me> References: <4e0fa644566efc53d9715f321b6aa17a8a0a1447.camel@evrard.me> Message-ID: On Wed, Nov 20, 2019 at 4:24 AM Jean-Philippe Evrard wrote: > > Hello TC members, > > There are still business opportunities merging for 2019. The process > for 2020 need to start soon. > > We have action items from the Summit and PTG [1] to take care. Please > review etherpads and check if any part actions that you already been > assigned or whould like to help. Here are a few action items already if > you want to help: > - ttx: will be working on the technical vision reflection, as we > arrive at the end of the road (2019). > - gmann: will be working on the multiple goals he has, including > python versions. A little help would be welcomed I guess :) > - mnaser: will be working on telemetry and an oslo project to > expose metrics for prometheus. (If I didn't get that one wrong) > - jungleboyj: will write a blog post about the analysis of the > Foundation user survey. We should read it when it's ready and prepare > the next one > - mugsie: is writing a new proposal for release naming based on > what we discussed at the summit > - We all should think about stable policy and what to do about it, > based on the feedback from TheJulia/smcginnis/release+requirements > team. Maybe diablo_rojo can follow this? > - everyone: we should promote the first contact sig to > organisations that might need help on getting things achieved in > OpenStack. > - zaneb: will propose documentation of the process on how TC > members should vote on conflicting resolutions when there is no > predetermined consensus > - mnaser: will be the point of contact for the infra matters of > static hosting > - ttx: will investigate a merge tc/uc > - evrardjp: will start the new ideas concept that was floated > before the summit and got traction during the summit. > > > PowerVMStackers is now a SIG [2]. However, mnaser suggested to move to > a multi-arch SIG. Mohammed, do you want to get started on this? I don't recall saying this -- but I think Rico started on that work > V release naming process is now started [3]. > > [1] https://etherpad.openstack.org/p/PVG-TC-PTG > [2] https://review.opendev.org/#/c/680438/ > [3] https://review.opendev.org/#/c/693266/ > > Regards, > Jean-Philippe & Rico > > From zigo at debian.org Wed Nov 20 15:35:11 2019 From: zigo at debian.org (Thomas Goirand) Date: Wed, 20 Nov 2019 16:35:11 +0100 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: Message-ID: On 11/19/19 10:03 AM, Lingxian Kong wrote: > We (ceilometer team) will probably add Ceilometer API and mongodb > support back, considering the current Gnocchi project situation. > However, Gnocchi will still be supported as a publisher in Ceilometer. > > - > Best regards, > Lingxian Kong > Catalyst Cloud Hi, Please don't do Mongodb, that's non-free these days. 
On 11/19/19 10:51 AM, Tobias Urdin wrote: > There were (and still is? Though unofficial out-of-tree) storage > backends for Ceilometer that publishes to InfluxDB. Same problem: InfluxDB is open-core, with only non-redundant non-clustered solution being fully open. Cheers, Thomas Goirand (zigo) From zigo at debian.org Wed Nov 20 15:36:48 2019 From: zigo at debian.org (Thomas Goirand) Date: Wed, 20 Nov 2019 16:36:48 +0100 Subject: [gnocchi][telemetry][ceilometer][cloudkitty][monasca] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: On 11/19/19 9:59 PM, Lingxian Kong wrote: > On Wed, Nov 20, 2019 at 9:39 AM Radosław Piliszek > > wrote: > > I second Matthias' stance - MongoDB does not look like the > future-proof tool for the job, it's not really well suited for > time-series data. > I feel that would just be "wasted" work to bring it back, unless it > is really like snapping fingers... > Not to mention supporting it in the long run. > Obviously not me making decisions. :-) > > The other part, bringing API back, seems nice as there are ppl using > Ceilometer with Monasca this way. > > I am glad to hear that Monasca is finally getting official support > as a publisher in Ceilometer. > This should actually solve the main issue of lack of a reliable > publisher - Monasca already handles > modern persistent time-series databases. > > I added [monasca] since we have the topic of Monasca as one of > Ceilometer's new publishers. Feel free to remove. > > > As open source community, we don't force anyone to use some specific > software or tooling, we support as more options as possible for real use > cases but leave the choice to the users. But we don't need options which wont scale... I'd prefer one working driver rather than 10 not working. Thomas From dangtrinhnt at gmail.com Wed Nov 20 15:38:04 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Thu, 21 Nov 2019 00:38:04 +0900 Subject: [release][stable][telemetry]Please add Rong Zhu to ceilometer-stable-maint group In-Reply-To: <84019aaa-2c60-dcf4-a1b5-77a9186d1f2c@gmail.com> References: <84019aaa-2c60-dcf4-a1b5-77a9186d1f2c@gmail.com> Message-ID: Hi Matt, You can remove me. Though I'm still working on other projects, I dont think I have time for telemetry. Thanks. On Wed, Nov 20, 2019, 23:26 Matt Riedemann wrote: > On 11/18/2019 5:46 AM, Rong Zhu wrote: > > I am the current Telemetry PTL, could you please add me to > > ceilometer-stable-maint group. And please also add Lingxian Kong to this > > group. > > Done. Shall I also remove everyone else from this list? > > https://review.opendev.org/#/admin/groups/533,members > > I don't think any of those people work directly on OpenStack anymore. > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gr at ham.ie Wed Nov 20 15:49:29 2019 From: gr at ham.ie (Graham Hayes) Date: Wed, 20 Nov 2019 15:49:29 +0000 Subject: [tc] Recent updates and action items In-Reply-To: <4e0fa644566efc53d9715f321b6aa17a8a0a1447.camel@evrard.me> References: <4e0fa644566efc53d9715f321b6aa17a8a0a1447.camel@evrard.me> Message-ID: <04a68b4d-581d-bab8-d448-9334a7b3938e@ham.ie> On 20/11/2019 09:20, Jean-Philippe Evrard wrote: > Hello TC members, > > There are still business opportunities merging for 2019. The process > for 2020 need to start soon. 
> > We have action items from the Summit and PTG [1] to take care. Please > review etherpads and check if any part actions that you already been > assigned or whould like to help. Here are a few action items already if > you want to help: > - ttx: will be working on the technical vision reflection, as we > arrive at the end of the road (2019). > - gmann: will be working on the multiple goals he has, including > python versions. A little help would be welcomed I guess :) > - mnaser: will be working on telemetry and an oslo project to > expose metrics for prometheus. (If I didn't get that one wrong) > - jungleboyj: will write a blog post about the analysis of the > Foundation user survey. We should read it when it's ready and prepare > the next one > - mugsie: is writing a new proposal for release naming based on > what we discussed at the summit This is done - https://review.opendev.org/#/c/695071/ I encourage anyone with an interest in naming to leave comments there, or reply to this thread. > - We all should think about stable policy and what to do about it, > based on the feedback from TheJulia/smcginnis/release+requirements > team. Maybe diablo_rojo can follow this? > - everyone: we should promote the first contact sig to > organisations that might need help on getting things achieved in > OpenStack. > - zaneb: will propose documentation of the process on how TC > members should vote on conflicting resolutions when there is no > predetermined consensus > - mnaser: will be the point of contact for the infra matters of > static hosting > - ttx: will investigate a merge tc/uc > - evrardjp: will start the new ideas concept that was floated > before the summit and got traction during the summit. > > > PowerVMStackers is now a SIG [2]. However, mnaser suggested to move to > a multi-arch SIG. Mohammed, do you want to get started on this? > > V release naming process is now started [3]. > > [1] https://etherpad.openstack.org/p/PVG-TC-PTG > [2] https://review.opendev.org/#/c/680438/ > [3] https://review.opendev.org/#/c/693266/ > > Regards, > Jean-Philippe & Rico > > From rico.lin.guanyu at gmail.com Wed Nov 20 15:54:42 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Wed, 20 Nov 2019 23:54:42 +0800 Subject: [auto-scaling][self-healing] Discussion to merge two SIG to one Message-ID: Dear all As we discussed in PTG about merge two SIG to one. I would like to continue the discussion on ML. In PTG, Eric proposes the idea to merge two SIG due to the high overlapping of domains and tasks. I think this is a great idea since, over the last 6 months, most of the discussions in both SIG are overlapped. So I'm onboard with this idea. Here's how I think we can continue this idea: 1. Create new SIG (maybe 'Automation SIG'? feel free to propose name which can cover both interest.) 2. Redirect docs and wiki to new SIG. And rework on index so there will be no confusion 3. Move repos from both SIGs to new SIG 4. Mark auto-scaling SIG and self-healing SIG as inactive. 5. remove auto-scaling SIG and self-healing SIG after a reasonable waiting time Let us know what you think about this. Otherwise, we definitely expect this to happen soon. -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Wed Nov 20 16:15:10 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 20 Nov 2019 10:15:10 -0600 Subject: [all][tc] Planning for dropping the Python2 support in OpenStack In-Reply-To: <90b63305-a379-93d6-c1ae-693d20c3fe24@suse.com> References: <16dd0a42b8d.e847dd3e124645.6364180516762707559@ghanshyammann.com> <16dfe4467a4.db6f72ec168733.7542022367023887408@ghanshyammann.com> <16dff41292e.11b7e81b1177136.7669214833037569841@ghanshyammann.com> <16e19144cf0.f6b07849311271.7773306777497055114@ghanshyammann.com> <20191030004035.rsuegdsij2eezps3@mthode.org> <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> <16e866f0345.b945094345130.7689728518186402120@ghanshyammann.com> <16e87db6527.12abd4d2b50644.2222213273226617419@ghanshyammann.com> <90b63305-a379-93d6-c1ae-693d20c3fe24@suse.com> Message-ID: <16e89989e5f.f81e780780731.5842142892865621719@ghanshyammann.com> ---- On Wed, 20 Nov 2019 04:30:06 -0600 Andreas Jaeger wrote ---- > On 20/11/2019 09.08, Ghanshyam Mann wrote: > > [...] > > * barbican- https://review.opendev.org/#/c/695052/ > > - Some strange behaviour is happening in this, barbican grenade job run stable/train run.yaml always which is stoping this job to move to py3 > > AFAIU, the stable jobs have a branch matcher, see > https://opendev.org/openstack/barbican/src/branch/stable/train/.zuul.yaml#L138 > - and thus apply. > > You need https://review.opendev.org/689458 and friends merged for all > branches to fix this, Thanks Andreas, that was something i was suspecting but could not understand how zuul picked the job inventory from stable/train every time and not from master where the master is branchless so should be eligible for the master gate. One more thing, only run.yaml is taken from stable/train and rest all like pre.yaml etc is from the master branch. This is another the mystery I would like to learn :). -gmann > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 > > From deepa.kr at fingent.com Wed Nov 20 12:35:31 2019 From: deepa.kr at fingent.com (Deepa) Date: Wed, 20 Nov 2019 18:05:31 +0530 Subject: Freezer Project Update In-Reply-To: References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> Message-ID: <005401d59f9f$025a81c0$070f8540$@fingent.com> Hello Amjad/James We tried installing Freezer .Freezer-scheduler and Freezer-agent in a VM which need to be backed up and Freezer-api and freezer-webui on controller node.The version of Openstack is Train. Unfortunately nothing worked out ☹ Getting below error on VM (client to be backed up) when I run freezer-agent --action info Critical Error: Authorization Failure. Authorization Failed: Not Found (HTTP 404) (Request-ID: req-0c71d8b4-ef1a-4c8d-8d12-26df763f5085) And getting error in Dashboard when we enabled Freezer-api and freezer-webui in dashboard During handling of the above exception ([*] Error 401: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}), another exception occurred: Doubt the issue is with keystone versioning v2/v3.It will be great if you can share or tell freeze.env file for Freezer-agent (For client VM) and freezer-api.conf file parameters. 
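For reference, the environment we currently export on the client VM before running freezer-agent looks roughly like the below (hostname, project and credentials are placeholders, and this assumes freezer-agent picks up the standard OS_* variables):

  # we are not sure whether auth_url needs the /v3 suffix here; the
  # "Could not find versioned identity endpoints" / HTTP 404 error suggests it might
  export OS_AUTH_URL=http://controller:5000
  export OS_IDENTITY_API_VERSION=3
  export OS_PROJECT_DOMAIN_NAME=Default
  export OS_USER_DOMAIN_NAME=Default
  export OS_PROJECT_NAME=admin
  export OS_USERNAME=admin
  export OS_PASSWORD=secret
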
Also what should be admin.rc file for freezer-webui. freezer-scheduler --config-file /etc/freezer/scheduler.conf start Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Not Found (HTTP 404) (Request-ID: req-2adb597d-7ad5-45a8-9888-6b552c8e55cc) Any guidance is highly appreciated . Thanks a lot Regards, Deepa K R From: James Page Sent: Tuesday, November 19, 2019 10:19 PM To: Amjad Kotobi Cc: Deepa ; OpenStack Development Mailing List (not for usage questions) Subject: Re: Freezer Project Update Hello On Fri, Nov 15, 2019 at 7:43 PM Amjad Kotobi > wrote: Hi, This project is pretty much in production state, from last summit it got active again from developer ends, we are using it for backup solution too. Great to hear that Freezer is getting some increased developer focus! Documentation side isn’t that bright, very soon gonna get updated, anyhow you are able to install as standalone project in instance, I did it manually, didn’t use any provision tools. Let me know for specific part of deployment that is not clear. Amjad On 14. Nov 2019, at 06:53, Deepa > wrote: Hello Team Good Day I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see Freezer Project. But couldn’t find any charms for it in juju charms. Also there isn’t a clear documentation on how to install freezer . https://docs.openstack.org/releasenotes/freezer/train.html. No proper release notes in the latest version as well. Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. Freezer is not currently on the plan for OpenStack Charms for Ussuri. Better install documentation and support from Linux distros would be a good first step in the right direction. Cheers James -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Nov 20 16:54:23 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 20 Nov 2019 10:54:23 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> Message-ID: <16e89bc83a4.128a8586282411.2326681975156905993@ghanshyammann.com> ---- On Wed, 20 Nov 2019 01:18:39 -0600 Zane Bitter wrote ---- > On 18/11/19 9:18 pm, Ghanshyam Mann wrote: > > ---- On Mon, 18 Nov 2019 19:17:04 -0600 Zane Bitter wrote ---- > > > On 18/11/19 5:35 pm, Ben Nemec wrote: > > > > > > > > > > > > On 11/18/19 4:08 PM, Matt Riedemann wrote: > > > >> On 11/18/2019 3:40 PM, Mohammed Naser wrote: > > > >>> The proposal that I had was that in mind would be for us to let teams > > > >>> self manage their own stable branches. I think we've reached a point > > > >>> where we can trust most of our community to be familiar with the > > > >>> stable branch policy (and let teams decide for themselves what they > > > >>> believe is best for the success of their own projects). 
> > > >> > > > >> So for a project like nova that has a separate nova-core [1] and > > > >> nova-stable-maint team [2] where some from [2] aren't in [1], what > > > >> does this mean? Drop [2] and just rely on [1]? That won't work for > > > >> those in nova-core that aren't familiar enough with the stable branch > > > >> guidelines or simply don't care to review stable branch changes, and > > > >> won't work for those that are in nova-stable-maint but not nova-core. > > > > > > > > I believe the proposal is to allow the Nova team to manage > > > > nova-stable-maint in the same way they do nova-core, not to force anyone > > > > to drop their stable-maint team entirely. > > > > > > I think the proposal was actually for each *-stable-maint team to manage > > > itself. This would avoid the situation where e.g. the TC appoints a > > > brand-new PTL and suddenly they get to make themselves a stable core, as > > > in that case the team would still have to be bootstrapped by the > > > stable-maint team. But it would allow those who are both closest to the > > > project and confirmed to be familiar with the stable guidelines to make > > > decisions about who else is ready to join that group. > > > > > > I am still finding difficult to understand the change and how it will solve the current problem. > > > > The current problem is: > > * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) > > which is nothing but we have fewer contributors who understand the stable policies. > > > > * The stable policies are not the problem so we will stick with current stable policies across all the projects. Stable > > policies have to be maintained at single place for consistency in backports across projects. > > > > If we are moving the stable maintenance team ownership from current stable-maintenance team to project side then, > > how it will solve the issue, does it enable more contributors to understand the stable policy and extend the team? > > Yes. > > > if yes, then why it cannot happen with current model? > > Because the core stable team is necessarily not as familiar with the > review/backport history of contributors in every project as the > individual project stable team is with contributors in each project. > > > If the project team or PTL making its core member get > > more familiar with the stable policy and add as a stable core team then why it cannot happen with the current model. > > > > For example, if I am PTL or core of any project and finding hard to get my backport merged then I or my project team core > > should review more stable branch patches and propose them in stable team core. > > I have tried that with only very limited success. > > > If we move the stable team ownership to the projects side then I think PTL is going to do the same. Ask the team members > > to understand the stable policies and do more review and then add them in stable core team. If any member know the stable > > policies then directly add. > > You make it sound like that's not a good thing? This is a good thing but I am saying why it cannot happen in the current model? If I am PTL of x project then and find hard to merge my backport I can ask for stable team volunteer from my core/active contributors and then they learn/onboard to stable team. 
> > > I feel that the current problem cannot be solved by moving the ownership of the team, we need to encourage more and more > > developers to become stable core in existing model especially from projects who find difficulties in merging their backport. > > In my experience at least there's no shortage of people willing to do > the work, but there is a severe shortage of people willing to do the > work of climbing over the bar to get permission to do the work. > > The position espoused by Tony and Matt at least is that those shouldn't > be different things, and in principle that's correct, but in practice > they are. Humans are weird. humm, I feel same can exist in new model also. Not all existing core going to become the stable core for that project. PTL or someone has to filter out the list and add the new members to list where same issue can happen. I feel if stable policies are going to be the same then we should not change the model which is working fine from user perspective. -gmann > > > One more thing, do we have data that how much time as avg it take to merge the backport and what all projects facing the backport merge > > issue ? > > > > -gmann > > > > > > > > - ZB > > > > > > >> > > > >> [1] https://review.opendev.org/#/admin/groups/25,members > > > >> [2] https://review.opendev.org/#/admin/groups/540,members > > > >> > > > > > > > > > > > > > > > > > From gmann at ghanshyammann.com Wed Nov 20 17:08:40 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 20 Nov 2019 11:08:40 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> Message-ID: <16e89c9969c.11900a9e282934.1452316154561550482@ghanshyammann.com> ---- On Wed, 20 Nov 2019 08:21:29 -0600 Matt Riedemann wrote ---- > On 11/20/2019 1:18 AM, Zane Bitter wrote: > > Because the core stable team is necessarily not as familiar with the > > review/backport history of contributors in every project as the > > individual project stable team is with contributors in each project. > > This is assuming that each project has a stable core team already, which > a lot don't, that's why we get a lot of "hi I'm the PTL du jour on > project X now please make me stable core even though I've never reviewed > any stable branch changes before". > > With Tony more removed these days and I myself not wanting to vet every > one of these "add me to the stable core team" requests, I'm more or less > OK with the proposal so that it removes me as a bottleneck. That might > mean people merge things on stable branches for their projects that > don't follow the guidelines but so be it. If it's a problem hopefully > they'll hear about it from their consumers, but if the project is in > such maintenance mode anyway that they can break the stable guidelines, > then they might not have many external consumers to complain anyway. > Either way I don't need to be involved. This is the main problem going to be and I am worried about it. We had the great stable policies with a dedicated team maintaining it well. 'great' and 'well' I am writing from the user perspective where they get applicable backport which does not break their within-in-release upgrade. 
If a backport is delayed, that is fine for them compared to getting a backport that breaks them. OpenStack has a very well known problem of inconsistent APIs. The inconsistency hurts from the usage, interop and debugging perspectives. Each project defines APIs in its own way (new APIs, changes, discoverability etc.), and we say that is fine because it is what that project wants to do, but it is the user who faces the problem. We have not solved this problem yet, due to various reasons. IMO, this problem could have been solved or minimized if we had a "single mandatory way to write and maintain your API, and a central team to verify that" from the *start*, instead of each project deciding its own way with only recommended guidelines. The same case applies to stable backports: we do have a "single mandatory way to backport the changes and a central team to verify that", and due to that, OpenStack is stable on the backports side. Why are we changing that? Stable backporting is more about maintaining quality and not breaking things than about merging a large amount of backports *fast*. -gmann > > So sure, +1 from me on the proposal given nova can still do what it's > already been doing with a specific stable maint core team ACL in Gerrit. > > -- > > Thanks, > > Matt > > From gmann at ghanshyammann.com Wed Nov 20 17:19:03 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 20 Nov 2019 11:19:03 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <80b9c92c-be69-7c96-291a-702a7a8c6498@openstack.org> Message-ID: <16e89d31a7f.10b5b7e1383285.2554681040254942322@ghanshyammann.com> ---- On Tue, 19 Nov 2019 11:58:12 -0600 Mohammed Naser wrote ---- > On Tue, Nov 19, 2019 at 6:16 AM Thierry Carrez wrote: > > > > Ghanshyam Mann wrote: > > > [...] > > > I am still finding difficult to understand the change and how it will solve the current problem. > > > > > > The current problem is: > > > * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) > > > which is nothing but we have fewer contributors who understand the stable policies. > > > > > > * The stable policies are not the problem so we will stick with current stable policies across all the projects. > > > Stable policies have to be maintained at single place for consistency in backports across projects. > > > [...] > > I don't think that this the problem this change wants to solve. > > > > Currently the stable-core team is perceived as a bottleneck to getting > > more people into project-specific stable teams, or keeping those teams > > membership up to date. As a result stable maintenance is still seen in > > some teams as an alien thing, rather than an integral team duty. > > > > I suspect that by getting out of the badge-granting game, stable-core > > could focus more on stable policy definition and education, and review > > how well or bad each team does on the stable front. Because reviewing > > backports for stable branch suitability is just one part of doing stable > > branch right -- the other is to actively backport relevant patches. > > > > Personally, the main reason I support this change is that we have too > > much "ask for permission" things in OpenStack today, something that was > > driven by a code-review-for-everything culture.
So the more we can > > remove the need to ask for permission to do some work, the better. > > For context, I thought I'd gather my thoughts to explain the idea best and > woke up to this well summarized email by Thierry. I agree with this and the > intention is indeed what Thierry is mentioning here. I can understand your point, and in some areas I agree, but not in the case of the stable policies :). IMO, "ask for permission" is a good thing when the stability of the software is at stake. In proprietary software development there is plenty of "ask for permission" from various teams (QA, requirements verification, audit, applying fixes to production), and those gates exist mainly to maintain the quality and stability of the software. If we hand all of those decisions to the individual developer, you can imagine the result. I know we have to trust developers in open source to cover these areas in their own code, but that would be easier if OpenStack were a single software project. Because OpenStack is ~50 projects, our main challenge is to keep them consistent via a central team that enforces and verifies the key areas, such as stability. It is not easy to sustain a central team, especially while OpenStack is facing a shrinking contributor base, but as long as the stable policy process is not completely broken I think we should not change it. -gmann > > > -- > > Thierry Carrez (ttx) > > > > From Albert.Braden at synopsys.com Wed Nov 20 19:02:06 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 20 Nov 2019 19:02:06 +0000 Subject: All VMs fail when --max exceeds available resources Message-ID: I was experimenting in our dev cluster with CPU pinning and filters. After I was done I ran Ansible to put everything back like it was, but the scheduler is broken in two ways and I can't find the problem in my config. The first symptom is that if I use --max to create more VMs than the hypervisors can support, all of them go to ERROR. Before I changed things, --max would fill the hypervisors and only the extra VMs would go to ERROR. I'll email separately about the other one; this is already getting long. When I look at the logs, I see the "Starting to schedule" line with the list of instance UUIDs, and then "Attempting to claim resources" log entries.
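For reference, the kind of request that triggers this looks roughly like the following (image, flavor and network names here are placeholders, not our real ones):

  # ask the scheduler for up to 5 instances in a single request
  openstack server create --image bionic-cloudimg --flavor m1.large \
      --network dev-net --min 1 --max 5 maxtest

The log excerpts below are from one of these attempts.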
Hosts are selected; they are starting with 0 instances: us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:56.880 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] [instance: ef1f7493-f792-4c3b-bf50-8e68b3d553ac] Selected host: (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 120696MB disk: 986112MB io_ops: 0 instances: 0 _consume_selected_host /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:346 Scheduler starting with the correct number of HV: us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:56.888 1409571 DEBUG nova.filters [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Starting with 3 host(s) get_filtered_objects /usr/lib/python2.7/dist-packages/nova/filters.py:70 I see the HV being weighed and the number of instances on each HV increasing from 0: us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:57.106 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filtered [(us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: 85880MB disk: 880640MB io_ops: 1 instances: 2, (us01odc-dev1-hv001, us01odc-dev1-hv001.internal.synopsys.com) ram: 89976MB disk: 932864MB io_ops: 0 instances: 1, (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 89976MB disk: 929792MB io_ops: 1 instances: 1] _get_sorted_hosts /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:435 us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:57.107 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Weighed [WeighedHost [host: (us01odc-dev1-hv001, us01odc-dev1-hv001.internal.synopsys.com) ram: 89976MB disk: 932864MB io_ops: 0 instances: 1, weight: 2.40576402895], WeighedHost [host: (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 89976MB disk: 929792MB io_ops: 1 instances: 1, weight: 1.11693447844], WeighedHost [host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: 85880MB disk: 880640MB io_ops: 1 instances: 2, weight: 0.961725168625]] _get_sorted_hosts /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:454 The number of instances goes beyond the HV capacity: us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:57.602 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Weighed [WeighedHost [host: (us01odc-dev1-hv001, us01odc-dev1-hv001.internal.synopsys.com) ram: 59256MB disk: 876544MB io_ops: 1 instances: 2, weight: 1.17395901159], WeighedHost [host: (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 59256MB disk: 873472MB io_ops: 2 instances: 2, weight: 0.435549629144], WeighedHost [host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: 55160MB disk: 824320MB io_ops: 2 instances: 3, weight: 0.292945361349]] _get_sorted_hosts 
/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:454 Hosts are still being selected, but they are way over capacity already: us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:58.696 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] [instance: 579dd4e2-d5d5-445f-905f-84cfd93146f6] Selected host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: -6280MB disk: 711680MB io_ops: 4 instances: 5 _consume_selected_host /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:346 Then we start to see the warnings: us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:58.817 1409571 WARNING nova.scheduler.client.report [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Unable to submit allocation for instance 5de08185-1ef2-4c92-8a19-5f09ec27be71 (409 {"errors": [{"status": 409, "request_id": "req-6cdd0b7a-bdbd-486f-9792-20840ea4a72e", "code": "placement.undefined_code", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider 'f20fa03d-18f4-486b-9b40-ceaaf52dabf8'. The requested amount would exceed the capacity. ", "title": "Conflict"}]}) And then they all fail and are deleted: us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.057 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Unable to successfully claim against any host. 
_schedule /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:242 us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.058 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Cleaning up allocations for [u'ef1f7493-f792-4c3b-bf50-8e68b3d553ac', u'80afcfd7-ce25-4fc4-8b0d-b581a31b87bd', u'22cf9509-39e8-456a-b32c-e950cc597266', u'd6fc0b66-cd44-410d-96c9-590c22f1e21b', u'713a12fb-8bbf-4eac-ae02-68cb007fa34e', u'30c70c4d-1447-4481-bdc5-816835180ac6', u'be955f43-ac56-4531-ada8-16ce966211c7', u'b91820f6-775e-4abb-b28e-5b9b065819c2', u'83472954-2197-4e65-b7e7-1fe892f28458', u'd862bd74-494b-4c29-94ad-b80fc6c113e8', u'80a69dba-8011-403f-87fb-c39ef17ba467', u'c440d8d1-4da2-4c27-af9f-dd5afa19d083', u'5b301d69-a4f1-4393-8d67-4286c9873490', u'579dd4e2-d5d5-445f-905f-84cfd93146f6'] _cleanup_allocations /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:299 us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.124 1409571 INFO nova.scheduler.client.report [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Deleted allocation for instance ef1f7493-f792-4c3b-bf50-8e68b3d553ac us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.190 1409571 INFO nova.scheduler.client.report [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Deleted allocation for instance 80afcfd7-ce25-4fc4-8b0d-b581a31b87bd And the errors: us01odc-dev1-ctrl1:/var/log/nova/nova-conductor.log:2019-11-19 11:40:00.201 1801903 WARNING nova.scheduler.utils [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available. us01odc-dev1-ctrl1:/var/log/nova/nova-conductor.log:2019-11-19 11:40:00.205 1801903 WARNING nova.scheduler.utils [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] [instance: 5b301d69-a4f1-4393-8d67-4286c9873490] Setting instance to ERROR state.: NoValidHost_Remote: No valid host was found. There are not enough hosts available. -------------- next part -------------- An HTML attachment was scrubbed... 
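For what it's worth, when I want to see how much room is actually left before one of these bulk creates, I have been poking at placement with the osc-placement plugin, roughly like this (the resource amounts are illustrative for our flavor, and the provider UUID is the one named in the MEMORY_MB conflict above):

  # how many hosts could take one instance of this flavor right now
  openstack allocation candidate list --resource VCPU=4 --resource MEMORY_MB=32768 --resource DISK_GB=80

  # current usage vs. inventory on one compute's resource provider
  openstack resource provider usage show f20fa03d-18f4-486b-9b40-ceaaf52dabf8
  openstack resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8
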
URL: From Albert.Braden at synopsys.com Wed Nov 20 19:02:33 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 20 Nov 2019 19:02:33 +0000 Subject: All VMs fail when --max exceeds available resources Message-ID: The other symptom is that the scheduler will send single VMs to a full hypervisor and overload it even though we have cpu_allocation_ratio and ram_allocation_ratio set to 1: root at us01odc-dev1-ctrl1:~# os hypervisor list --long +----+------------------------------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+ | ID | Hypervisor Hostname | Hypervisor Type | Host IP | State | vCPUs Used | vCPUs | Memory MB Used | Memory MB | +----+------------------------------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+ | 1 | us01odc-dev1-hv003.internal.synopsys.com | QEMU | 10.195.116.16 | up | 42 | 16 | 161792 | 128888 | | 3 | us01odc-dev1-hv002.internal.synopsys.com | QEMU | 10.195.116.15 | up | 43 | 16 | 165888 | 128888 | | 4 | us01odc-dev1-hv001.internal.synopsys.com | QEMU | 10.195.116.14 | up | 38 | 16 | 161792 | 128888 | +----+------------------------------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+ In the logs I see the scheduler returning 1 host: /var/log/nova/nova-scheduler.log:2019-11-19 16:38:20.930 895454 DEBUG nova.filters [req-0703f1f8-a52a-4fb6-a402-226dd25e9988 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filter NUMATopologyFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.7/dist-packages/nova/filters.py:104 It weighs the host and reports negative RAM: /var/log/nova/nova-scheduler.log:2019-11-19 16:38:20.930 895454 DEBUG nova.scheduler.filter_scheduler [req-0703f1f8-a52a-4fb6-a402-226dd25e9988 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Weighed [WeighedHost [host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: -6280MB disk: 683008MB io_ops: 1 instances: 5, weight: 0.0]] _get_sorted_hosts /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:454 Then it selects that host: /var/log/nova/nova-scheduler.log:2019-11-19 16:38:20.931 895454 DEBUG nova.scheduler.utils [req-0703f1f8-a52a-4fb6-a402-226dd25e9988 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Attempting to claim resources in the placement API for instance 834dd112-26f8-424c-a92b-23423baa185a claim_resources /usr/lib/python2.7/dist-packages/nova/scheduler/utils.py:935 /var/log/nova/nova-scheduler.log:2019-11-19 16:38:21.567 895454 DEBUG nova.scheduler.filter_scheduler [req-0703f1f8-a52a-4fb6-a402-226dd25e9988 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] [instance: 834dd112-26f8-424c-a92b-23423baa185a] Selected host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: -6280MB disk: 683008MB io_ops: 1 instances: 5 _consume_selected_host /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:346 The VM builds successfully and goes to ACTIVE. What should I be looking for here? Obviously I broke the scheduler, but my nova config is the same as the working cluster. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at nemebean.com Wed Nov 20 19:46:20 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 20 Nov 2019 13:46:20 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <16e89c9969c.11900a9e282934.1452316154561550482@ghanshyammann.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> <16e89c9969c.11900a9e282934.1452316154561550482@ghanshyammann.com> Message-ID: On 11/20/19 11:08 AM, Ghanshyam Mann wrote: > ---- On Wed, 20 Nov 2019 08:21:29 -0600 Matt Riedemann wrote ---- > > On 11/20/2019 1:18 AM, Zane Bitter wrote: > > > Because the core stable team is necessarily not as familiar with the > > > review/backport history of contributors in every project as the > > > individual project stable team is with contributors in each project. > > > > This is assuming that each project has a stable core team already, which > > a lot don't, that's why we get a lot of "hi I'm the PTL du jour on > > project X now please make me stable core even though I've never reviewed > > any stable branch changes before". > > > > With Tony more removed these days and I myself not wanting to vet every > > one of these "add me to the stable core team" requests, I'm more or less > > OK with the proposal so that it removes me as a bottleneck. That might > > mean people merge things on stable branches for their projects that > > don't follow the guidelines but so be it. If it's a problem hopefully > > they'll hear about it from their consumers, but if the project is in > > such maintenance mode anyway that they can break the stable guidelines, > > then they might not have many external consumers to complain anyway. > > Either way I don't need to be involved. > > This is the main problem going to be and I am worried about it. We had the great stable policies > with a dedicated team maintaining it well. 'great' and 'well' I am writing from the user perspective > where they get applicable backport which does not break their within-in-release upgrade. > If backport is delayed it is fine for them as compared to breaking backport. Is it? Important backports being delayed due to lack of stable reviewers still leaves consumers with a broken piece of software. I also think this is a bit of a false dichotomy. Nobody is suggesting that we start approving backports willy-nilly, just that we lower the barrier to entry for people who want to help with stable branch reviews. I would hope we can trust our teams to be responsible with their stable-maint list and not just start handing out +2 to anyone with a pulse. If not, I think we have a bigger problem, but let's cross that bridge if and when we come to it. > > > OpenStack has a very well known problem of inconsistent APIs. Inconsistency is from > usage, interop, debug perspective. All projects define its own definition of APIs (new API, > changes, discoverability etc) which we say it is fine because that project wants to do that way > but the user faces the problem. We have not solved this problem yet due to various reasons. > > IMO, this problem could have solved or minimized if we had "Single mandatory way to write > and maintain your API and central team to very that" from *starting* instead of project > decide its own way or recommended guidelines only. 
> > The same case is for stable backports, we have a "Single mandatory way to backport the changes > and central team to verify that". And due to that, OpenStack is more stable on the backports business. > Why we are changing that? This does not change the stable policy, and the existence of project-specific stable-maint teams means there was never a single central team reviewing all stable backports. Even if there had been, it's pretty clear that model doesn't work given the staffing constraints we are facing in almost every area, and which are only going to get worse after this cycle. The proposal removes some inflexible process that currently prevents contributors from becoming stable maintainers, it does _not_ mean we stop expecting stable maintainers to understand the stable policy. > > Stable backport is more of maintaining the quality and no-break things than merging a large amount > of backports and *fast*. > > -gmann > > > > > So sure, +1 from me on the proposal given nova can still do what it's > > already been doing with a specific stable maint core team ACL in Gerrit. > > > > -- > > > > Thanks, > > > > Matt > > > > > > From mriedemos at gmail.com Wed Nov 20 20:10:13 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 20 Nov 2019 14:10:13 -0600 Subject: All VMs fail when --max exceeds available resources In-Reply-To: References: Message-ID: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> On 11/20/2019 1:02 PM, Albert Braden wrote: > The other symptom is that the scheduler will send single VMs to a full > hypervisor and overload it even though we have cpu_allocation_ratio and > ram_allocation_ratio set to 1: You're on Rocky correct? If allocation ratios are acting funky, you should read through this: https://docs.openstack.org/nova/rocky/admin/configuration/schedulers.html#bug-1804125 There were some changes in Stein to help with configuring nova to deal with allocation ratios per compute or via aggregate: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#allocation-ratios But what you'll likely need to do is manage the allocation ratios in aggregate on the resource providers in placement. Fortunately there is a CLI for doing that: https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-set e.g. openstack resource provider inventory set --resource VCPU:allocation_ratio=1.0 --aggregate --amend Anyway, see if that documented bug with allocation ratios is your issue first and then go through the workarounds. -- Thanks, Matt From rosmaita.fossdev at gmail.com Wed Nov 20 20:12:36 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 20 Nov 2019 15:12:36 -0500 Subject: [cinder] Ussuri Virtual PTG next week Message-ID: <496969c3-982f-8277-3a44-0d2c73d2f37a@gmail.com> As announced at today's Cinder weekly meeting, the Virtual PTG will be held next week on two days: - Monday 25 November 1500-1700 UTC - Wednesday 27 November 1500-1700 UTC The planning etherpad is: https://etherpad.openstack.org/p/cinder-ussuri-virtual-ptg-planning Feel free to add topics. They can be new topics or something you want to follow up on from the Shanghai PTG. Format will be the usual, that is, we'll start at the beginning, give each topic what it takes, and finish when we are done. We'll use BlueJeans video conferencing: https://bluejeans.com/3228528973 If you haven't used BlueJeans before, ping me in #openstack-cinder and I can do a quick meeting with you to make sure you can connect OK. 
cheers, brian From Albert.Braden at synopsys.com Wed Nov 20 21:21:25 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 20 Nov 2019 21:21:25 +0000 Subject: All VMs fail when --max exceeds available resources In-Reply-To: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> Message-ID: Yes, we are on Rocky. If I'm reading correctly the document says that setting allocation ratios by aggregate may not work after Ocata, but we are setting them in nova.conf on the controller. That setting does appear to have failed. The settings are 1: root at us01odc-dev1-ctrl1:~# grep allocation_ /etc/nova/nova.conf cpu_allocation_ratio = 1 ram_allocation_ratio = 1.0 But the inventory shows different values: root at us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8 +----------------+------------------+----------+----------+-----------+----------+--------+ | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | +----------------+------------------+----------+----------+-----------+----------+--------+ | VCPU | 16.0 | 16 | 2 | 1 | 1 | 16 | | MEMORY_MB | 1.5 | 128888 | 8192 | 1 | 1 | 128888 | | DISK_GB | 1.0 | 1208 | 246 | 1 | 1 | 1208 | +----------------+------------------+----------+----------+-----------+----------+--------+ I think the document is saying that we need to set them in nova.conf on each HV. I tried that and it seems to fix the allocation failure: root at us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8 +----------------+------------------+----------+----------+-----------+----------+--------+ | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | +----------------+------------------+----------+----------+-----------+----------+--------+ | VCPU | 1.0 | 16 | 2 | 1 | 1 | 16 | | MEMORY_MB | 1.0 | 128888 | 8192 | 1 | 1 | 128888 | | DISK_GB | 1.0 | 1208 | 246 | 1 | 1 | 1208 | +----------------+------------------+----------+----------+-----------+----------+--------+ This fixed the "allocation ratio" issue but I still see the --max issue. What could be causing that? -----Original Message----- From: Matt Riedemann Sent: Wednesday, November 20, 2019 12:10 PM To: openstack-discuss at lists.openstack.org Subject: Re: All VMs fail when --max exceeds available resources On 11/20/2019 1:02 PM, Albert Braden wrote: > The other symptom is that the scheduler will send single VMs to a full > hypervisor and overload it even though we have cpu_allocation_ratio and > ram_allocation_ratio set to 1: You're on Rocky correct? 
If allocation ratios are acting funky, you should read through this: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_nova_rocky_admin_configuration_schedulers.html-23bug-2D1804125&d=DwIC-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=gMZHbn10OjlP7p38T4nIjXKKFJRwV8b1vbxdP_PSGlg&s=hLHu3N1jilahIhKN7TpazkEVjFSQX-YwtVvTos-h9BY&e= There were some changes in Stein to help with configuring nova to deal with allocation ratios per compute or via aggregate: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_nova_latest_admin_configuration_schedulers.html-23allocation-2Dratios&d=DwIC-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=gMZHbn10OjlP7p38T4nIjXKKFJRwV8b1vbxdP_PSGlg&s=Gb4j3hz2t9M_BDmhIMvg2BQiXcg5CEYAdMlj-PFZygQ&e= But what you'll likely need to do is manage the allocation ratios in aggregate on the resource providers in placement. Fortunately there is a CLI for doing that: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_osc-2Dplacement_latest_cli_index.html-23resource-2Dprovider-2Dinventory-2Dset&d=DwIC-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=gMZHbn10OjlP7p38T4nIjXKKFJRwV8b1vbxdP_PSGlg&s=tsaZIozBHkiBjNvbZGRvyuRMQOKe23zN2ouP3uOi8ag&e= e.g. openstack resource provider inventory set --resource VCPU:allocation_ratio=1.0 --aggregate --amend Anyway, see if that documented bug with allocation ratios is your issue first and then go through the workarounds. -- Thanks, Matt From mriedemos at gmail.com Wed Nov 20 22:00:29 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 20 Nov 2019 16:00:29 -0600 Subject: All VMs fail when --max exceeds available resources In-Reply-To: References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> Message-ID: <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> On 11/20/2019 3:21 PM, Albert Braden wrote: > I think the document is saying that we need to set them in nova.conf on each HV. I tried that and it seems to fix the allocation failure: > > root at us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8 > +----------------+------------------+----------+----------+-----------+----------+--------+ > | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | > +----------------+------------------+----------+----------+-----------+----------+--------+ > | VCPU | 1.0 | 16 | 2 | 1 | 1 | 16 | > | MEMORY_MB | 1.0 | 128888 | 8192 | 1 | 1 | 128888 | > | DISK_GB | 1.0 | 1208 | 246 | 1 | 1 | 1208 | > +----------------+------------------+----------+----------+-----------+----------+--------+ Yup, the config on the controller doesn't apply to the computes or placement because the computes are what report the inventory to placement so you have to configure the allocation ratios there, or starting in stein via (resource provider) aggregate. > > This fixed the "allocation ratio" issue but I still see the --max issue. What could be causing that? That's something else yeah? I didn't quite dig into that email and the allocation ratio thing popped up to me since it's been a long standing known painful issue/behavior change since Ocata. 
One question though, I read your original email as essentially "(1) I did x and got some failures, then (2) I changed something and now everything fails", but are you running from a clean environment in both test scenarios because if you have VMs on the computes when you're doing (2) then that's going to change the scheduling results in (2), i.e. the computes will have less capacity since there are resources allocated on them in placement. -- Thanks, Matt From Albert.Braden at synopsys.com Wed Nov 20 23:16:54 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 20 Nov 2019 23:16:54 +0000 Subject: All VMs fail when --max exceeds available resources In-Reply-To: <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> Message-ID: The expected result (that I was seeing last week) is that, if my cluster has capacity for 4 VMs and I use --max 5, 4 will go active and 1 will go to error. This week all 5 are going to error. I can still build 4 VMs of that flavor, one at a time, or use --max 4, but if I use --max 5, then all 5 will fail. If I use smaller VMs, the --max numbers get bigger but I still see the same symptom. The --max thing is pretty useful and we use it a lot; it allows us to use up the cluster without knowing exactly how much space we have. -----Original Message----- From: Matt Riedemann Sent: Wednesday, November 20, 2019 2:00 PM To: openstack-discuss at lists.openstack.org Subject: Re: All VMs fail when --max exceeds available resources On 11/20/2019 3:21 PM, Albert Braden wrote: > I think the document is saying that we need to set them in nova.conf on each HV. I tried that and it seems to fix the allocation failure: > > root at us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8 > +----------------+------------------+----------+----------+-----------+----------+--------+ > | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | > +----------------+------------------+----------+----------+-----------+----------+--------+ > | VCPU | 1.0 | 16 | 2 | 1 | 1 | 16 | > | MEMORY_MB | 1.0 | 128888 | 8192 | 1 | 1 | 128888 | > | DISK_GB | 1.0 | 1208 | 246 | 1 | 1 | 1208 | > +----------------+------------------+----------+----------+-----------+----------+--------+ Yup, the config on the controller doesn't apply to the computes or placement because the computes are what report the inventory to placement so you have to configure the allocation ratios there, or starting in stein via (resource provider) aggregate. > > This fixed the "allocation ratio" issue but I still see the --max issue. What could be causing that? That's something else yeah? I didn't quite dig into that email and the allocation ratio thing popped up to me since it's been a long standing known painful issue/behavior change since Ocata. One question though, I read your original email as essentially "(1) I did x and got some failures, then (2) I changed something and now everything fails", but are you running from a clean environment in both test scenarios because if you have VMs on the computes when you're doing (2) then that's going to change the scheduling results in (2), i.e. the computes will have less capacity since there are resources allocated on them in placement. 
-- Thanks, Matt From mriedemos at gmail.com Wed Nov 20 23:51:51 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 20 Nov 2019 17:51:51 -0600 Subject: All VMs fail when --max exceeds available resources In-Reply-To: References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> Message-ID: <4aca7dfc-86f8-722c-7d41-d8f610fe2564@gmail.com> On 11/20/2019 5:16 PM, Albert Braden wrote: > The expected result (that I was seeing last week) is that, if my cluster has capacity for 4 VMs and I use --max 5, 4 will go active and 1 will go to error. This week all 5 are going to error. I can still build 4 VMs of that flavor, one at a time, or use --max 4, but if I use --max 5, then all 5 will fail. If I use smaller VMs, the --max numbers get bigger but I still see the same symptom. > > The --max thing is pretty useful and we use it a lot; it allows us to use up the cluster without knowing exactly how much space we have. OK so I think you're hitting this with the NoValidHost error: https://github.com/openstack/nova/blob/18.0.0/nova/conductor/manager.py#L1209 And that's putting all of the instances into ERROR status even though 4 out of the 5 did successfully allocate resources in the scheduler. The scheduler would have rolled back the allocations here if it couldn't fit everything: https://github.com/openstack/nova/blob/18.0.0/nova/scheduler/filter_scheduler.py#L276 Which release did you say that the --max 5 scenario worked where 4 would be successfully built but the remaining one would go to ERROR status? I'm just trying to figure out where/when the regression in behavior occurred. -- Thanks, Matt From mriedemos at gmail.com Thu Nov 21 00:02:11 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 20 Nov 2019 18:02:11 -0600 Subject: All VMs fail when --max exceeds available resources In-Reply-To: <4aca7dfc-86f8-722c-7d41-d8f610fe2564@gmail.com> References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> <4aca7dfc-86f8-722c-7d41-d8f610fe2564@gmail.com> Message-ID: On 11/20/2019 5:51 PM, Matt Riedemann wrote: > Which release did you say that the --max 5 scenario worked where 4 would > be successfully built but the remaining one would go to ERROR status? > I'm just trying to figure out where/when the regression in behavior > occurred. Reading back on your original email, I guess it's the same release (Rocky). I can't really understand how you got the first scenario where you used --max 5 and 4 were built but one failed and was put into ERROR status, especially if the environment and server create request is made the same way. Given the links in my previous email, I would expect them all to go to ERROR status when the scheduler raises NoValidHost. And yeah that's likely a regression, but the inconsistent behavior is what is weird to me. -- Thanks, Matt From melwittt at gmail.com Thu Nov 21 00:04:59 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 20 Nov 2019 16:04:59 -0800 Subject: All VMs fail when --max exceeds available resources In-Reply-To: References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> Message-ID: On 11/20/19 15:16, Albert Braden wrote: > The expected result (that I was seeing last week) is that, if my cluster has capacity for 4 VMs and I use --max 5, 4 will go active and 1 will go to error. This week all 5 are going to error. 
I can still build 4 VMs of that flavor, one at a time, or use --max 4, but if I use --max 5, then all 5 will fail. If I use smaller VMs, the --max numbers get bigger but I still see the same symptom. The behavior you're describing is an old issue described here: https://bugs.launchpad.net/nova/+bug/1458122 I don't understand how it's possible that you saw the 4 active 1 in error behavior last week. The behavior has been "error all" since 2015, at least. Unless there's some kind of race condition bug happening, maybe. Did you consistently see it fulfill less than --max last week or was it just once? Changing the behavior would be an API change so it would need a spec and new microversion, I think. It's been an undesirable behavior for a long time but it seemingly hasn't been enough of a pain point for someone to sign up and do the work. -melanie > The --max thing is pretty useful and we use it a lot; it allows us to use up the cluster without knowing exactly how much space we have. > > -----Original Message----- > From: Matt Riedemann > Sent: Wednesday, November 20, 2019 2:00 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: All VMs fail when --max exceeds available resources > > On 11/20/2019 3:21 PM, Albert Braden wrote: >> I think the document is saying that we need to set them in nova.conf on each HV. I tried that and it seems to fix the allocation failure: >> >> root at us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8 >> +----------------+------------------+----------+----------+-----------+----------+--------+ >> | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | >> +----------------+------------------+----------+----------+-----------+----------+--------+ >> | VCPU | 1.0 | 16 | 2 | 1 | 1 | 16 | >> | MEMORY_MB | 1.0 | 128888 | 8192 | 1 | 1 | 128888 | >> | DISK_GB | 1.0 | 1208 | 246 | 1 | 1 | 1208 | >> +----------------+------------------+----------+----------+-----------+----------+--------+ > > Yup, the config on the controller doesn't apply to the computes or > placement because the computes are what report the inventory to > placement so you have to configure the allocation ratios there, or > starting in stein via (resource provider) aggregate. > >> >> This fixed the "allocation ratio" issue but I still see the --max issue. What could be causing that? > > That's something else yeah? I didn't quite dig into that email and the > allocation ratio thing popped up to me since it's been a long standing > known painful issue/behavior change since Ocata. > > One question though, I read your original email as essentially "(1) I > did x and got some failures, then (2) I changed something and now > everything fails", but are you running from a clean environment in both > test scenarios because if you have VMs on the computes when you're doing > (2) then that's going to change the scheduling results in (2), i.e. the > computes will have less capacity since there are resources allocated on > them in placement. 
> From iwienand at redhat.com Thu Nov 21 00:15:09 2019 From: iwienand at redhat.com (Ian Wienand) Date: Thu, 21 Nov 2019 11:15:09 +1100 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: Message-ID: <20191121001509.GB976114@fedora19.localdomain> On Wed, Nov 20, 2019 at 06:03:03PM +0800, Rico Lin wrote: > If you're also interested in that group, please reply to this email, > introduce yourself and tell us what you would like the group scope and > objectives to be, and what you can contribute to the group. I have worked with some upstream people such as hrw to get diskimage-builder working with EFI and ARM64, and setup the infra to build ARM64 nodes on Linaro's donated resources. Although infra is all an open book in terms of configuration, etc. and anyone can contribute, I'll be happy to help with issues or mentor anyone on using the gate resources we have available. Cheers, -i From gmann at ghanshyammann.com Thu Nov 21 01:04:40 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 20 Nov 2019 19:04:40 -0600 Subject: [all][tc] Planning for dropping the Python2 support in OpenStack In-Reply-To: <16e89989e5f.f81e780780731.5842142892865621719@ghanshyammann.com> References: <16dd0a42b8d.e847dd3e124645.6364180516762707559@ghanshyammann.com> <16dfe4467a4.db6f72ec168733.7542022367023887408@ghanshyammann.com> <16dff41292e.11b7e81b1177136.7669214833037569841@ghanshyammann.com> <16e19144cf0.f6b07849311271.7773306777497055114@ghanshyammann.com> <20191030004035.rsuegdsij2eezps3@mthode.org> <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> <16e866f0345.b945094345130.7689728518186402120@ghanshyammann.com> <16e87db6527.12abd4d2b50644.2222213273226617419@ghanshyammann.com> <90b63305-a379-93d6-c1ae-693d20c3fe24@suse.com> <16e89989e5f.f81e780780731.5842142892865621719@ghanshyammann.com> Message-ID: <16e8b7d6409.10e2cd0d091410.8889822353143523681@ghanshyammann.com> ---- On Wed, 20 Nov 2019 10:15:10 -0600 Ghanshyam Mann wrote ---- > ---- On Wed, 20 Nov 2019 04:30:06 -0600 Andreas Jaeger wrote ---- > > On 20/11/2019 09.08, Ghanshyam Mann wrote: > > > [...] > > > * barbican- https://review.opendev.org/#/c/695052/ > > > - Some strange behaviour is happening in this, barbican grenade job run stable/train run.yaml always which is stoping this job to move to py3 Devstck patch is merged now and it defaults to py3. below are the pending fix for grenade jobs which will fail now. Request the corresponding project to merge them. - https://review.opendev.org/#/q/topic:drop-py27-support-devstack-default-py3+status:open If any other job starts failing then below are the options to fix them: Option1: migrate the failing py2 jobs to py3 with 'USE_PYTHON3: True in zuulv3 jobs' and 'DEVSTACK_GATE_USE_PYTHON3=True in legacy jobs'. Options2: If the migration to py3 takes time due to any reason then restore those jobs to py2 by explicitly disabled the py3 via above variables. -gmann > > > > AFAIU, the stable jobs have a branch matcher, see > > https://opendev.org/openstack/barbican/src/branch/stable/train/.zuul.yaml#L138 > > - and thus apply. > > > > You need https://review.opendev.org/689458 and friends merged for all > > branches to fix this, > > Thanks Andreas, that was something i was suspecting but could not understand how zuul picked the job inventory > from stable/train every time and not from master where the master is branchless so should be eligible for the master gate. 
> > One more thing, only run.yaml is taken from stable/train and rest all like pre.yaml etc is from the master branch. This is another > the mystery I would like to learn :). > > -gmann > > > > > Andreas > > -- > > Andreas Jaeger aj at suse.com Twitter: jaegerandi > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 > > > > > > > From rosmaita.fossdev at gmail.com Thu Nov 21 01:55:26 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 20 Nov 2019 20:55:26 -0500 Subject: [cinder] meeting time change poll results Message-ID: <91bc9c9c-8e5a-0e5b-7f32-9e0dffd744de@gmail.com> Here are the results of the meeting time change poll: 16 responses - current time (1600 UTC): 19% unlikely to attend, 25% love it - one hour earlier (1500 UTC): 13% unlikely to attend, 19% love it - two hours earlier (1400 UTC): 0% unlikely to attend, 56% love it Thus, we will reschedule the Cinder weekly meeting to occur each Wednesday at 1400 UTC beginning with the meeting on 4 December 2019. Our current meeting channel is occupied at that time. I've submitted a patch to hold the meeting in #openstack-meeting-4: https://review.opendev.org/695339 I'll send out another email when that patch has been approved and the new meeting location has been verified. cheers, brian From bitskrieg at bitskrieg.net Thu Nov 21 02:10:02 2019 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Thu, 21 Nov 2019 02:10:02 +0000 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: Message-ID: This will be a long-winded response, so bare with me... I brought up a semi-related topic on the mailing list last year [1], namely that nova should ingest the hw_architecture field that is available in glance images and intelligently schedule it on compute nodes - not only in cases where you have actual hardware, but also using just qemu emulation when no 'real' hardware is available. My area of focus over the past few years hasn't been the traditional enterprise use case (e.g. doing what aws, azure, etc. can do but on a private cloud) but rather on security research. The vast majority of highly vulnerable systems (industrial control, healthcare, transportation, etc.), whose compromise would have globally felt impacts rarely run on traditional x86 platforms and instead use esoteric, long-dead architectures that many times only exist in qemu (if at all). In addition, some pieces of equipment can only be studied and replicated in FPGAs and other similar devices. I think that a long-ignored potential area of strength for OpenStack is the market of automated security research (e.g. fuzzing), particularly when it comes to nontraditional architectures and pieces of equipment. The return on investment to Amazon, Microsoft, Google, and others for making esoteric architectures function with the same level of fidelity and quality that x86 (and in some cases now, arm) function just isn't there. The good part is, security researchers don't care about 95% of the things that major public clouds do for you - they want to load up their binaries, blast them with data, and watch them crash. Traditionally, they do this using small-scale qemu emulation, with all of the labor-intensive requirements behind it (configuring userlands, figuring out how to get qemu networking to work they way they need it to, etc.) 
They want to spend their time hacking, not configuring servers, and yet that's what they spend a lot of their time doing.

When I think about what cloud architectures and concepts have done in terms of revolutionizing how applications are written, deployed, and delivered to customers, I envision a world where those same principles allow security researchers to do more with their limited time. Someone who knows enough about CRIS to reverse engineer firmware on microcontrollers made 20 years ago should not be fighting with server configurations, but they also should not be limited to small-scale research due to a lack of orchestration options.

Proprietary solutions for these problems exist, but they are extraordinarily expensive and generally target a very specific thing.

I see the future of security research as a time where researchers can upload a binary and a fuzzing test plan into horizon, have their binary loaded into already-configured glance images in the appropriate architecture (to include adding in GPUs for parallel processing, or FPGAs if needed), distribute the fuzzing job to the appropriate number of hosts automatically, and have the results published back to the researcher. I see opportunities for using nova, magnum, sahara, murano, heat, barbican, mistral, cinder, and other projects for being a part of this solution.

But... in order for any of that to be possible, the scope of OpenStack needs to be expanded to support integration and emulation of these weird devices that exist in the world around us. Multi-arch support in nova is the first step in that direction (and it would appear some changes in libvirt are needed too).

I'm getting out of the Army and changing jobs soon, but my area of focus and passion won't be changing. This stuff matters, and I think that OpenStack could be *the* standard for large-scale security research, if the community wants it to be.

My .02...

r

Chris Apsey

[1] http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html

------- Original Message -------
On Wednesday, November 20, 2019 5:03 AM, Rico Lin wrote:

> Dear all
> In summit, there's a forum for ARM support [1] in Summit which many people show they're interested in ARM support for OpenStack.
> And since also we have Linaro shows interest in donating servers to OpenStack infra. It's time for community to think about what we should deal with those ARM servers once we have them in community infrastructure.
>
> One thing we should do as a community is to gather people for this topic. So I propose we create a Multi-arch SIG and aim to support ARM architecture as very first step.
> I had the idea to call it ARM SIG before, but since there might be high overlap knowledge between support ARM 64 and other architectures. I propose we go for Multi-arch instead.
>
> This SIG will be a nice place to collect all the documents, gate jobs, and to trace tasks.
>
> If you're also interested in that group, please reply to this email, introduce yourself and tell us what you would like the group scope and objectives to be, and what you can contribute to the group.
>
> [1] https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24355/running-open-infrastucture-on-arm64
>
> --
>
> May The Force of OpenStack Be With You,
> Rico Lin
> irc: ricolin

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From Tushar.Patil at nttdata.com Thu Nov 21 09:30:02 2019 From: Tushar.Patil at nttdata.com (Patil, Tushar) Date: Thu, 21 Nov 2019 09:30:02 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> <1573717497.26082.4@est.tech>, Message-ID: >> For me from the sharing disk provider feature perspective the placement >> aggregate that is needed for the sharing to work, and any kind of nova >> host aggregate (either synced to placement or not) is independent. The >> placement aggregate is a must for the feature. On top of that if the >> operator wants to create a nova host aggregate as well and sync it to >> placement then at the end there will be two, independent placement >> aggregates. One to express the sharing relationship and one to express >> a host aggregate from nova. These two aggregate will not be the same as >> the first one will have the sharing provider in it while the second one >> doesn't. > I tend to agree with the simplicity of this as well. I have updated the specs as per the agreements at Shanghai PTG. I still do see some thorny issues especially the way disk_gb information will be returned in the new micro-version os-hypervisor API. Please check "other end user impact" and "Response of os-hypervisors statistics" sections from the specs. Please review the specs [1] and give your suggestions/feedback. [1] : https://review.opendev.org/#/c/650188/8 Thanks, tpatil ________________________________________ From: Matt Riedemann Sent: Thursday, November 14, 2019 11:02 PM To: openstack-discuss at lists.openstack.org Subject: Re: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship On 11/14/2019 1:45 AM, Balázs Gibizer wrote: > For me from the sharing disk provider feature perspective the placement > aggregate that is needed for the sharing to work, and any kind of nova > host aggregate (either synced to placement or not) is independent. The > placement aggregate is a must for the feature. On top of that if the > operator wants to create a nova host aggregate as well and sync it to > placement then at the end there will be two, independent placement > aggregates. One to express the sharing relationship and one to express > a host aggregate from nova. These two aggregate will not be the same as > the first one will have the sharing provider in it while the second one > doesn't. I tend to agree with the simplicity of this as well. -- Thanks, Matt Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. 
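For readers who have not wired this up before, a minimal sketch of what the placement side of a sharing DISK_GB provider looks like with the osc-placement plugin. All UUIDs, names and sizes here are made up, and the exact flags vary a bit between osc-placement releases (newer releases also require a --generation argument on some of these commands):

    # create a provider for the shared storage and give it DISK_GB inventory
    openstack resource provider create shared-nfs-1
    openstack resource provider inventory set <shared-rp-uuid> --resource DISK_GB=10240

    # mark it as a sharing provider (note: trait set replaces the full trait list)
    openstack resource provider trait set <shared-rp-uuid> --trait MISC_SHARES_VIA_AGGREGATE

    # put it and each compute node provider into the same placement aggregate
    openstack resource provider aggregate set <shared-rp-uuid> --aggregate <agg-uuid>
    openstack resource provider aggregate set <compute-rp-uuid> --aggregate <agg-uuid>

Once that relationship exists, allocation candidates for a request's DISK_GB can come from the sharing provider rather than from the compute node's own disk inventory, which is the relationship the spec above is building on.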
From smooney at redhat.com Thu Nov 21 12:04:24 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 21 Nov 2019 12:04:24 +0000 Subject: All VMs fail when --max exceeds available resources In-Reply-To: References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> Message-ID: <6cc9b4eab1995ea2dc3f1deabc822c7910308f3b.camel@redhat.com>

On Wed, 2019-11-20 at 16:04 -0800, melanie witt wrote:
> On 11/20/19 15:16, Albert Braden wrote:
> > The expected result (that I was seeing last week) is that, if my cluster has capacity for 4 VMs and I use --max 5, 4 will go active and 1 will go to error. This week all 5 are going to error. I can still build 4 VMs of that flavor, one at a time, or use --max 4, but if I use --max 5, then all 5 will fail. If I use smaller VMs, the --max numbers get bigger but I still see the same symptom.
>
> The behavior you're describing is an old issue described here:
>
> https://bugs.launchpad.net/nova/+bug/1458122
>
> I don't understand how it's possible that you saw the 4 active 1 in error behavior last week. The behavior has been "error all" since 2015, at least. Unless there's some kind of race condition bug happening, maybe. Did you consistently see it fulfill less than --max last week or was it just once?

for what it's worth i have definitely seen the behavior where you use max and only some go active and some go to error. i can't recall if it's post rocky, but i agree with the bug in that they should only all go to error if the min value was not met. max should not error out.

> Changing the behavior would be an API change so it would need a spec and new microversion, I think. It's been an undesirable behavior for a long time but it seemingly hasn't been enough of a pain point for someone to sign up and do the work.

well what would be the api change? i previously thought that the behavior was some would go active and some would not. if that is not the current behaviour it was changed without a spec and that is a regression. i think the behavior might change if the max value exceeds the batch size. we group the requests in sets of 10? by default. if all the vms in a batch go active and later vms in a different set fail, the first vms will remain active. i can't remember which config option controls that but there is one. it's max concurrent builds or something like that.

> -melanie
>
> > The --max thing is pretty useful and we use it a lot; it allows us to use up the cluster without knowing exactly how much space we have.
> >
> > -----Original Message-----
> > From: Matt Riedemann
> > Sent: Wednesday, November 20, 2019 2:00 PM
> > To: openstack-discuss at lists.openstack.org
> > Subject: Re: All VMs fail when --max exceeds available resources
> >
> > On 11/20/2019 3:21 PM, Albert Braden wrote:
> > > I think the document is saying that we need to set them in nova.conf on each HV.
I tried that and it seems to fix > > > the allocation failure: > > > > > > root at us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8 > > > +----------------+------------------+----------+----------+-----------+----------+--------+ > > > > resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | > > > > > > +----------------+------------------+----------+----------+-----------+----------+--------+ > > > > VCPU | 1.0 | 16 | 2 | 1 | 1 | 16 | > > > > MEMORY_MB | 1.0 | 128888 | 8192 | 1 | 1 | 128888 | > > > > DISK_GB | 1.0 | 1208 | 246 | 1 | 1 | 1208 | > > > > > > +----------------+------------------+----------+----------+-----------+----------+--------+ > > > > Yup, the config on the controller doesn't apply to the computes or > > placement because the computes are what report the inventory to > > placement so you have to configure the allocation ratios there, or > > starting in stein via (resource provider) aggregate. > > > > > > > > This fixed the "allocation ratio" issue but I still see the --max issue. What could be causing that? > > > > That's something else yeah? I didn't quite dig into that email and the > > allocation ratio thing popped up to me since it's been a long standing > > known painful issue/behavior change since Ocata. > > > > One question though, I read your original email as essentially "(1) I > > did x and got some failures, then (2) I changed something and now > > everything fails", but are you running from a clean environment in both > > test scenarios because if you have VMs on the computes when you're doing > > (2) then that's going to change the scheduling results in (2), i.e. the > > computes will have less capacity since there are resources allocated on > > them in placement. > > > > From zigo at debian.org Thu Nov 21 12:29:01 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 21 Nov 2019 13:29:01 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> Message-ID: <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> On 11/19/19 9:55 PM, Lingxian Kong wrote: > I don't think the license change will affect the cloud that only uses > MongoDB as internal service backend storage unless I'm missing > something. What you are probably missing, is that none of the downstream distribution will continue to package MongoDB. That, for sure, will have an effect on what people will use. If an operator decides to still continue to use these back-end, probably it's going to be using outside of the distro packages, which leads to many problems, including: - inferior quality packages. - availability of the repositories (ie: not enough mirror, just the one of upstream). - impossibility to redistribute the packages (for example: on an ISO image, or in a repository), and it may even be forbidden to publish a public repository with it. - probably, folks contributing to config management project will be reluctant to offer compatibility for these back-ends. With my Debian OpenStack package maintainer hat on: I will certainly ignore any backend that would be using MongoDB or InfluxDB, as these cannot be used without non-debian packages. 
I also will do my best to convince everyone that using non-free software is a bad idea, and that these company who broke the free software license promise cannot be trusted anymore. So definitively, the change of license is problematic in many ways. Cheers, Thomas Goirand (zigo) From kevinzs2048 at gmail.com Thu Nov 21 13:39:01 2019 From: kevinzs2048 at gmail.com (Shuai Zhao) Date: Thu, 21 Nov 2019 21:39:01 +0800 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: Message-ID: Thanks Rico. We(Linaro) will be very honor to help upstream setup the OpenStack CI on Arm64 to make OpenStack functional supporting. Also thanks OpenStack Infra team for the help. Actually we have done Kolla porting and release Kolla images for Arm64 since Rocky version last year, also we have used those images to deploy a Arm64 production class cloud(OpenStack Rocky + Ceph Lumious). It is Linaro Developer Cloud, it is free to offer cloud instance to Upstream and welcome registration, https://www.linaro.cloud/. Again, thanks everyone for the help. We will continue to work for Arm64 supporting with upstream. On Thu, Nov 21, 2019 at 10:14 AM Chris Apsey wrote: > This will be a long-winded response, so bare with me... > > I brought up a semi-related topic on the mailing list last year [1], > namely that nova should ingest the hw_architecture field that is available > in glance images and intelligently schedule it on compute nodes - not only > in cases where you have actual hardware, but also using just qemu emulation > when no 'real' hardware is available. > > My area of focus over the past few years hasn't been the traditional > enterprise use case (e.g. doing what aws, azure, etc. can do but on a > private cloud) but rather on security research. The vast majority of > highly vulnerable systems (industrial control, healthcare, transportation, > etc.), whose compromise would have globally felt impacts rarely run on > traditional x86 platforms and instead use esoteric, long-dead architectures > that many times only exist in qemu (if at all). In addition, some pieces > of equipment can only be studied and replicated in FPGAs and other similar > devices. > > I think that a long-ignored potential area of strength for OpenStack is > the market of automated security research (e.g. fuzzing), particularly when > it comes to nontraditional architectures and pieces of equipment. The > return on investment to Amazon, Microsoft, Google, and others for making > esoteric architectures function with the same level of fidelity and quality > that x86 (and in some cases now, arm) function just isn't there. The good > part is, security researchers don't care about 95% of the things that major > public clouds do for you - they want to load up their binaries, blast them > with data, and watch them crash. Traditionally, they do this using > small-scale qemu emulation, with all of the labor-intensive requirements > behind it (configuring userlands, figuring out how to get qemu networking > to work they way they need it to, etc.) They want to spend their time > hacking, not configuring servers, and yet that's what they spend a lot of > their time doing. > > When I think about what cloud architectures and concepts have done in > terms of revolutionizing how applications are written, deployed, and > delivered to customers, I envision a world where those same principles > allow security researchers to do more with their limited time. 
Someone who > knows enough about CRIS to reverse engineer firmware on microcontrollers > made 20 years ago should not be fighting with server configurations, but > they also should not be limited to small-scale research due to a lack of > orchestration options. > > Proprietary solutions for these problems exist, but they are > extraordinarily expensive and generally target a very specific thing. > > I see the future of security research as a time where researchers can > upload a binary and a fuzzing test plan into horizon, have their binary > loaded into already-configured glance images in the appropriate > architecture (to include adding in GPUs for parallel processing, or FPGAs > if needed), distribute the fuzzing job to the appropriate number of hosts > automatically, and have the results published back to the researcher. I > see opportunities for using nova, magnum, sahara, murano, heat, barbican, > mistral, cinder, and other projects for being a part of this solution. > > But.. in order for any of that to be possible, the scope of OpenStack > needs to be expanded to support integration and emulation of these weird > devices that exist in the world around us. Multi-arch support in nova is > the first step in that direction (and it would appear some changes in > libvirt are needed too). > > I'm getting out of the Army and changing jobs soon soon, but my area of > focus and passion won't be changing. This stuff matters, and I think that > OpenStack could be *the* standard for large-scale security research, if the > community wants it to be. > > My .02... > > r > > Chris Apsey > > > [1] > http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > On Wednesday, November 20, 2019 5:03 AM, Rico Lin < > rico.lin.guanyu at gmail.com> wrote: > > Dear all > In summit, there's a forum for ARM support [1] in Summit which many people > show they're interested in ARM support for OpenStack. > And since also we have Linaro shows interest in donating servers to > OpenStack infra. It's time for community to think about what we should deal > with those ARM servers once we have them in community infrastructure. > > One thing we should do as a community is to gather people for this topic. > So I propose we create a Multi-arch SIG and aim to support ARM architecture > as very first step. > I had the idea to call it ARM SIG before, but since there might be high > overlap knowledge between support ARM 64 and other architectures. I propose > we go for Multi-arch instead. > > This SIG will be a nice place to collect all the documents, gate jobs, and > to trace tasks. > > If you're also interested in that group, please reply to this email, > introduce yourself and tell us what you would like the group scope and > objectives to be, and what you can contribute to the group. > > > [1] > https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24355/running-open-infrastucture-on-arm64 > > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ekuvaja at redhat.com Thu Nov 21 13:46:06 2019 From: ekuvaja at redhat.com (Erno Kuvaja) Date: Thu, 21 Nov 2019 13:46:06 +0000 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> <16e89c9969c.11900a9e282934.1452316154561550482@ghanshyammann.com> Message-ID: On Wed, Nov 20, 2019 at 7:50 PM Ben Nemec wrote: > > > On 11/20/19 11:08 AM, Ghanshyam Mann wrote: > > ---- On Wed, 20 Nov 2019 08:21:29 -0600 Matt Riedemann < > mriedemos at gmail.com> wrote ---- > > > On 11/20/2019 1:18 AM, Zane Bitter wrote: > > > > Because the core stable team is necessarily not as familiar with > the > > > > review/backport history of contributors in every project as the > > > > individual project stable team is with contributors in each > project. > > > > > > This is assuming that each project has a stable core team already, > which > > > a lot don't, that's why we get a lot of "hi I'm the PTL du jour on > > > project X now please make me stable core even though I've never > reviewed > > > any stable branch changes before". > > > > > > With Tony more removed these days and I myself not wanting to vet > every > > > one of these "add me to the stable core team" requests, I'm more or > less > > > OK with the proposal so that it removes me as a bottleneck. That > might > > > mean people merge things on stable branches for their projects that > > > don't follow the guidelines but so be it. If it's a problem hopefully > > > they'll hear about it from their consumers, but if the project is in > > > such maintenance mode anyway that they can break the stable > guidelines, > > > then they might not have many external consumers to complain anyway. > > > Either way I don't need to be involved. > > > > This is the main problem going to be and I am worried about it. We had > the great stable policies > > with a dedicated team maintaining it well. 'great' and 'well' I am > writing from the user perspective > > where they get applicable backport which does not break their > within-in-release upgrade. > > If backport is delayed it is fine for them as compared to breaking > backport. > > Is it? Important backports being delayed due to lack of stable reviewers > still leaves consumers with a broken piece of software. > > I also think this is a bit of a false dichotomy. Nobody is suggesting > that we start approving backports willy-nilly, just that we lower the > barrier to entry for people who want to help with stable branch reviews. > I would hope we can trust our teams to be responsible with their > stable-maint list and not just start handing out +2 to anyone with a > pulse. If not, I think we have a bigger problem, but let's cross that > bridge if and when we come to it. > > > > > > > OpenStack has a very well known problem of inconsistent APIs. > Inconsistency is from > > usage, interop, debug perspective. All projects define its own > definition of APIs (new API, > > changes, discoverability etc) which we say it is fine because that > project wants to do that way > > but the user faces the problem. We have not solved this problem yet due > to various reasons. 
> > > > IMO, this problem could have solved or minimized if we had "Single > mandatory way to write > > and maintain your API and central team to very that" from *starting* > instead of project > > decide its own way or recommended guidelines only. > > > > The same case is for stable backports, we have a "Single mandatory way > to backport the changes > > and central team to verify that". And due to that, OpenStack is more > stable on the backports business. > > Why we are changing that? > > This does not change the stable policy, and the existence of > project-specific stable-maint teams means there was never a single > central team reviewing all stable backports. Even if there had been, > it's pretty clear that model doesn't work given the staffing constraints > we are facing in almost every area, and which are only going to get > worse after this cycle. > > The proposal removes some inflexible process that currently prevents > contributors from becoming stable maintainers, it does _not_ mean we > stop expecting stable maintainers to understand the stable policy. > > > > > Stable backport is more of maintaining the quality and no-break things > than merging a large amount > > of backports and *fast*. > > > > -gmann > > > > > > > > So sure, +1 from me on the proposal given nova can still do what it's > > > already been doing with a specific stable maint core team ACL in > Gerrit. > > > > > > -- > > > > > > Thanks, > > > > > > Matt > > > > > > > > > > > > Sounds like we're back in a spot where Stable Maintenance team probably shouldn't be a thing then and I think this time around I do agree. Perhaps we should just discontinue the Stable Maintenance group all together (not just as part of stable reviewers and moderators of in project stable maintenance groups) and subject our Stable Branch policies and guidelines directly under governance. Obviously unless we're planning to put the bolt on TCs head as well being unscalable group most don't care about. (We might even get some folks who care about stable branches and releases to be interested about TC.) If nothing else, the good part of this discussion is the message that we have amazing flood of people being interested to join the stable maintenance to overwhelm the current Stable Maint team with requests. So lets remove the clearly toxic vetting process and get these people to work! - jokke_ -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Nov 21 13:57:56 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 21 Nov 2019 13:57:56 +0000 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> Message-ID: On Thu, 2019-11-21 at 13:29 +0100, Thomas Goirand wrote: > On 11/19/19 9:55 PM, Lingxian Kong wrote: > > I don't think the license change will affect the cloud that only uses > > MongoDB as internal service backend storage unless I'm missing > > something. > > What you are probably missing, is that none of the downstream > distribution will continue to package MongoDB. That, for sure, will have > an effect on what people will use. 
> > If an operator decides to still continue to use these back-end, probably > it's going to be using outside of the distro packages, which leads to > many problems, including: > - inferior quality packages. > - availability of the repositories (ie: not enough mirror, just the one > of upstream). > - impossibility to redistribute the packages (for example: on an ISO > image, or in a repository), and it may even be forbidden to publish a > public repository with it. > - probably, folks contributing to config management project will be > reluctant to offer compatibility for these back-ends. > > With my Debian OpenStack package maintainer hat on: I will certainly > ignore any backend that would be using MongoDB or InfluxDB, as these > cannot be used without non-debian packages. influxdb is mit licensed so im not sure why you would not be able to package or redistribute it in debian. https://github.com/influxdata/influxdb/blob/master/LICENSE mongodb is a different story but we shoudl not paint influx with the same brush and it should be a valid alternitive to Gnocchi as it is a time serise database. > I also will do my best to > convince everyone that using non-free software is a bad idea, and that > these company who broke the free software license promise cannot be > trusted anymore. if that is your goal then you should advocate for influxDB then since its under an even more liberial license then gnocchi was. > > So definitively, the change of license is problematic in many ways. > > Cheers, > > Thomas Goirand (zigo) > From rosmaita.fossdev at gmail.com Thu Nov 21 14:13:30 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 21 Nov 2019 09:13:30 -0500 Subject: [cinder] weekly meeting time and location change Message-ID: <227ed6e3-6dc3-6a87-8d11-c44358baf570@gmail.com> Beginning with the 4 December 2019 meeting, the Cinder weekly meeting will be held as follows: Day: Wednesday Time: 1400 UTC Location: #openstack-meeting-4 Agenda: https://wiki.openstack.org/wiki/CinderMeetings Updated ICS file: http://eavesdrop.openstack.org/calendars/cinder-team-meeting.ics Note that the *time* and *IRC chat room* have changed. The day, agenda, meeting log locations, etc., remain the same. Thanks to Liang Fang for initiating this change, which will bring our meeting into a more reasonable time frame for contributors in Asia time zones. cheers, brian From mriedemos at gmail.com Thu Nov 21 14:45:58 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 21 Nov 2019 08:45:58 -0600 Subject: All VMs fail when --max exceeds available resources In-Reply-To: <6cc9b4eab1995ea2dc3f1deabc822c7910308f3b.camel@redhat.com> References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> <6cc9b4eab1995ea2dc3f1deabc822c7910308f3b.camel@redhat.com> Message-ID: <3acd61d6-62c3-2f3d-0a14-ad2f5ef3b591@gmail.com> On 11/21/2019 6:04 AM, Sean Mooney wrote: > i think the behavior might change if the max vaule exceeds the batch size. we group the resues in set of 10? by default > if all the vms in a batch go active and latter vms in a different set fail the first vms will remain active. > i cant remember which config option contolse that but there is one. its max concurent build or somethign like that. That batch size option is per-compute. For what Albert was hitting it failed with NoValidHost in the scheduler so the compute isn't involved. 
What you're describing is likely legacy behavior where the scheduler said, "yup sure putting 20 instances on a few computes is probably OK" and then they raced to do the RT claim on the compute and failed late and went to ERROR while some went ACTIVE. That window was closed for vcpu/ram/disk claims in Pike when the scheduler started using placement to create atomic resource allocation claims. So if someone can reproduce this issue with --max and some go active while some go error in the same request post-pike I'd be surprised. Doing that in *concurrent* requests I could understand since the scheduler could be a bit split brain there but placement still would not be. -- Thanks, Matt From openstack at nemebean.com Thu Nov 21 14:51:33 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 21 Nov 2019 08:51:33 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> <16e89c9969c.11900a9e282934.1452316154561550482@ghanshyammann.com> Message-ID: On 11/21/19 7:46 AM, Erno Kuvaja wrote: > On Wed, Nov 20, 2019 at 7:50 PM Ben Nemec > wrote: > > > > On 11/20/19 11:08 AM, Ghanshyam Mann wrote: > >   ---- On Wed, 20 Nov 2019 08:21:29 -0600 Matt Riedemann > > wrote ---- > >   > On 11/20/2019 1:18 AM, Zane Bitter wrote: > >   > > Because the core stable team is necessarily not as familiar > with the > >   > > review/backport history of contributors in every project as the > >   > > individual project stable team is with contributors in each > project. > >   > > >   > This is assuming that each project has a stable core team > already, which > >   > a lot don't, that's why we get a lot of "hi I'm the PTL du > jour on > >   > project X now please make me stable core even though I've > never reviewed > >   > any stable branch changes before". > >   > > >   > With Tony more removed these days and I myself not wanting to > vet every > >   > one of these "add me to the stable core team" requests, I'm > more or less > >   > OK with the proposal so that it removes me as a bottleneck. > That might > >   > mean people merge things on stable branches for their > projects that > >   > don't follow the guidelines but so be it. If it's a problem > hopefully > >   > they'll hear about it from their consumers, but if the > project is in > >   > such maintenance mode anyway that they can break the stable > guidelines, > >   > then they might not have many external consumers to complain > anyway. > >   > Either way I don't need to be involved. > > > > This is the main problem going to be and I am worried about it. > We had the great stable policies > > with a dedicated team maintaining it well. 'great' and 'well' I > am writing from the user perspective > > where they get applicable backport which does not break their > within-in-release upgrade. > > If backport is delayed it is fine for them as compared to > breaking backport. > > Is it? Important backports being delayed due to lack of stable > reviewers > still leaves consumers with a broken piece of software. > > I also think this is a bit of a false dichotomy. Nobody is suggesting > that we start approving backports willy-nilly, just that we lower the > barrier to entry for people who want to help with stable branch > reviews. 
> I would hope we can trust our teams to be responsible with their > stable-maint list and not just start handing out +2 to anyone with a > pulse. If not, I think we have a bigger problem, but let's cross that > bridge if and when we come to it. > > > > > > > OpenStack has a very well known problem of inconsistent APIs. > Inconsistency is from > > usage, interop, debug perspective. All projects define its own > definition of APIs (new API, > > changes, discoverability etc) which we say it is fine because > that project wants to do that way > > but the user faces the problem.  We have not solved this problem > yet due to various reasons. > > > > IMO, this problem could have solved or minimized if we had > "Single mandatory way to write > > and maintain your API and central team to very that" from > *starting* instead of project > > decide its own way or recommended guidelines only. > > > > The same case is for stable backports, we have a "Single > mandatory way to backport the changes > > and central team to verify that". And due to that, OpenStack is > more stable on the backports business. > > Why we are changing that? > > This does not change the stable policy, and the existence of > project-specific stable-maint teams means there was never a single > central team reviewing all stable backports. Even if there had been, > it's pretty clear that model doesn't work given the staffing > constraints > we are facing in almost every area, and which are only going to get > worse after this cycle. > > The proposal removes some inflexible process that currently prevents > contributors from becoming stable maintainers, it does _not_ mean we > stop expecting stable maintainers to understand the stable policy. > > > > > Stable backport is more of maintaining the quality and no-break > things than merging a large amount > > of backports and *fast*. > > > > -gmann > > > >   > > >   > So sure, +1 from me on the proposal given nova can still do > what it's > >   > already been doing with a specific stable maint core team ACL > in Gerrit. > >   > > >   > -- > >   > > >   > Thanks, > >   > > >   > Matt > >   > > >   > > > > > > > > Sounds like we're back in a spot where Stable Maintenance team probably > shouldn't be a thing then and I think this time around I do agree. > > Perhaps we should just discontinue the Stable Maintenance group all > together (not just as part of stable reviewers and moderators of in > project stable maintenance groups) and subject our Stable Branch > policies and guidelines directly under governance. Obviously unless > we're planning to put the bolt on TCs head as well being unscalable > group most don't care about. (We might even get some folks who care > about stable branches and releases to be interested about TC.) I don't have a strong opinion about this, other than that it's important to have someone people can go to for questions about stable backports. If that's a stable team or some other entity is irrelevant to me, but unusual situations do come up and the stable team was extremely helpful with the one I asked about recently[0]. 0: http://lists.openstack.org/pipermail/openstack-discuss/2019-October/009984.html > > If nothing else, the good part of this discussion is the message that we > have amazing flood of people being interested to join the stable > maintenance to overwhelm the current Stable Maint team with requests. So > lets remove the clearly toxic vetting process and get these people to work! 
> > - jokke_ From fungi at yuggoth.org Thu Nov 21 15:19:52 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 21 Nov 2019 15:19:52 +0000 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> <16e89c9969c.11900a9e282934.1452316154561550482@ghanshyammann.com> Message-ID: <20191121145450.cpxizqdcgyusneqc@yuggoth.org> On 2019-11-21 13:46:06 +0000 (+0000), Erno Kuvaja wrote: [...] > Sounds like we're back in a spot where Stable Maintenance team > probably shouldn't be a thing then and I think this time around I > do agree. [...] Conveniently, it hasn't been a thing for 1.5 years now, ever since https://review.openstack.org/584206 merged to officially disband and replace it with a SIG. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Thu Nov 21 15:29:01 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 21 Nov 2019 15:29:01 +0000 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> Message-ID: <20191121152901.xzyv6o6q6jstc36u@yuggoth.org> On 2019-11-21 13:57:56 +0000 (+0000), Sean Mooney wrote: > On Thu, 2019-11-21 at 13:29 +0100, Thomas Goirand wrote: [...] > > With my Debian OpenStack package maintainer hat on: I will > > certainly ignore any backend that would be using MongoDB or > > InfluxDB, as these cannot be used without non-debian packages. > > influxdb is mit licensed so im not sure why you would not be able > to package or redistribute it in debian. [...] And indeed, it's been packaged in Debian for years, and still is: https://packages.debian.org/influxdb The main concern about it is the open-core development model where "advanced" features like stability and redundancy require you to purchase their proprietary enterprise version instead of the less full-featured (but freely-licensed) community version: https://www.influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From witold.bedyk at suse.com Thu Nov 21 16:07:29 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Thu, 21 Nov 2019 17:07:29 +0100 Subject: [monasca] New team meeting time poll In-Reply-To: References: Message-ID: Hello team, the new meeting time is: Tuesdays, 13.00 UTC as usual in #openstack-monasca @ freenode See you there Witek http://eavesdrop.openstack.org/#Monasca_Team_Meeting On 11/14/19 3:03 PM, Witek Bedyk wrote: > Hello everyone, > > We would like to find the new time slot for the Monasca Team Meeting > which suites you best. Please fill in the times which work for you in > that poll [1] until next Wednesday. 
> > Thanks > Witek > > > [1] https://doodle.com/poll/ey6brvmbsubkxpp9 > From zigo at debian.org Thu Nov 21 16:25:46 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 21 Nov 2019 17:25:46 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> Message-ID: On 11/21/19 2:57 PM, Sean Mooney wrote: > On Thu, 2019-11-21 at 13:29 +0100, Thomas Goirand wrote: >> On 11/19/19 9:55 PM, Lingxian Kong wrote: >>> I don't think the license change will affect the cloud that only uses >>> MongoDB as internal service backend storage unless I'm missing >>> something. >> >> What you are probably missing, is that none of the downstream >> distribution will continue to package MongoDB. That, for sure, will have >> an effect on what people will use. >> >> If an operator decides to still continue to use these back-end, probably >> it's going to be using outside of the distro packages, which leads to >> many problems, including: >> - inferior quality packages. >> - availability of the repositories (ie: not enough mirror, just the one >> of upstream). >> - impossibility to redistribute the packages (for example: on an ISO >> image, or in a repository), and it may even be forbidden to publish a >> public repository with it. >> - probably, folks contributing to config management project will be >> reluctant to offer compatibility for these back-ends. >> >> With my Debian OpenStack package maintainer hat on: I will certainly >> ignore any backend that would be using MongoDB or InfluxDB, as these >> cannot be used without non-debian packages. > influxdb is mit licensed so im not sure why you would not be able to package > or redistribute it in debian. > https://github.com/influxdata/influxdb/blob/master/LICENSE > mongodb is a different story but we shoudl not paint influx with the same brush > and it should be a valid alternitive to Gnocchi as it is a time serise database. Unless I'm mistaking, that's the non-clusterable version only. Cheers, Thomas Goirand (zigo) From marcin.juszkiewicz at linaro.org Thu Nov 21 16:31:41 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Thu, 21 Nov 2019 17:31:41 +0100 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: Message-ID: W dniu 20.11.2019 o 11:03, Rico Lin pisze: > If you're also interested in that group, please reply to this email, > introduce yourself and tell us what you would like the group scope and > objectives to be, and what you can contribute to the group. I work on OpenStack for a while as Red Hat assignee to Linaro. Most of time I spent in Kolla projects (core dev for over years). Had fingers in DIB, Loci, Nova and some drive-by patching in other projects. I work on AArch64 (arm64) since 2012. Added AArch64 and Power (ppc64le) support into Kolla and maintain AArch64 support. 1-2 times per year I also check how we are on Power. Most of my work is around building stuff. 
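To make the scheduling hook discussed earlier in this thread concrete, this is roughly what tagging an image with the hw_architecture property Chris referred to looks like. Image and flavor names here are made up, the scheduler only acts on the property if the ImagePropertiesFilter is enabled, and whether anything actually gets emulated depends on what the compute nodes report, which is the gap Chris described:

    # upload an arm64 image and record its architecture
    openstack image create --disk-format qcow2 --file debian10-arm64.qcow2 \
        --property hw_architecture=aarch64 debian10-arm64

    # with ImagePropertiesFilter enabled the scheduler should only pick
    # compute nodes that report aarch64 among their supported instances
    openstack server create --image debian10-arm64 --flavor m1.small arm64-test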
From mriedemos at gmail.com Thu Nov 21 16:42:44 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 21 Nov 2019 10:42:44 -0600 Subject: [nova][cinder][ops] question/confirmation of legacy vol attachment migration In-Reply-To: <20191017102419.pa3qqlqgrlp2b7qx@localhost> References: <37e953ee-f3c8-9797-446f-f3e3db9dcad6@gmail.com> <20191010100050.hn546tikeihaho7e@localhost> <20191017102419.pa3qqlqgrlp2b7qx@localhost> Message-ID: On 10/17/2019 5:24 AM, Gorka Eguileor wrote: > I stand by my initial recommendation, being able to update the existing > attachment to add the connection information from Nova. OK, thanks for the input and thoughtfulness on this. I've abandoned my change since I'm not going to be pushing this boulder anymore but left notes in the change in case someone else wants to pick it up some day. Note to nova cores: this means we'll have legacy volume attachment compat code around forever. -- Thanks, Matt From melwittt at gmail.com Thu Nov 21 16:52:29 2019 From: melwittt at gmail.com (melanie witt) Date: Thu, 21 Nov 2019 08:52:29 -0800 Subject: All VMs fail when --max exceeds available resources In-Reply-To: <6cc9b4eab1995ea2dc3f1deabc822c7910308f3b.camel@redhat.com> References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> <6cc9b4eab1995ea2dc3f1deabc822c7910308f3b.camel@redhat.com> Message-ID: On 11/21/19 04:04, Sean Mooney wrote: > On Wed, 2019-11-20 at 16:04 -0800, melanie witt wrote: >> Changing the behavior would be an API change so it would need a spec and >> new microversion, I think. It's been an undesirable behavior for a long >> time but it seemingly hasn't been enough of a pain point for someone to >> sign up and do the work. > well what would be the api change i previous though that the behavior was some would go active and some would not > if that is not the current behaviour it was change without a spec and that is a regressions. If it's a regression, sure. But the bug [1] was opened on 2015-05-22 which was Liberty and I'm not aware that the behavior has ever been different prior to Liberty (save for the parallel requests/race condition case). I don't think it's a regression. That said, if everyone else is cool with changing it without a spec, that's fine with me. Either way, someone would have to spend the time and do the work. -melanie [1] https://bugs.launchpad.net/nova/+bug/1458122 From mriedemos at gmail.com Thu Nov 21 17:08:08 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 21 Nov 2019 11:08:08 -0600 Subject: [nova][gate] Thoughts on working around bug 1853453? Message-ID: I've been noticing these shelve/unshelve guest ssh fail due to dhcp lease issues quite a bit recently and wrote a bug and e-r query for it this morning: http://status.openstack.org/elastic-recheck/#1853453 The problem seems to stem from when these shelve tests run on multinode jobs and we shelve on one host and unshelve on another. I have a patch up to nova to force config drive in the nova-next job where this hits the most: https://review.opendev.org/#/c/695431 But that's just kind of a stab in the dark to take the metadata API out of the picture for cloud-init. If that doesn't help, and we don't know what is causing this or have ideas to debug it, we might need to consider making a change to shelve/unshelve testing in tempest such that we try to unshelve on original host. 
Now I realize that is unfortunate since the whole point of shelve offloading and unshelving is that you can land on another host and things are good, but if these tests continue to be a high failure rate in multinode jobs we probably need to consider workarounds if no one is going to dig into the failures and figure out what is going wrong. Thoughts? -- Thanks, Matt From thierry at openstack.org Thu Nov 21 17:13:02 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 21 Nov 2019 18:13:02 +0100 Subject: [dev] Upgrading flake8 to support f-strings Message-ID: Hi everyone, TL;DR: is it time for us to update flake8 and break the world? Long version: Ussuri-supported Python runtimes will be python 3.6 and 3.7, which opens the marvelous world of f-strings to us. For those not familiar with f-strings, I recommend reading: https://realpython.com/python-f-strings/ In a recent project[1] I tried to use them, only to get pep8 job failures[2]. The old flake8 version we are using in hacking is exploding trying to parse f-strings, with a cryptic "AttributeError: 'FlakesChecker' object has no attribute 'JOINEDSTR'" error. Of course that was long-fixed in pyflakes (>=1.4.0) but that version is only used starting in flake8>=3.3.0, and hacking is capping flake8<2.7.0. We do cap those aggressively for a reason: bumping that cap triggers pep8 job failures everywhere as flake8 decides to pay attention to slightly different things, and we lose a lot of time collectively chasing down those syntax glitches everywhere. Which is why we haven't really bumped those since 2016... But proper support of f-strings might be a good reason to update? Thoughts? [1] https://review.opendev.org/#/c/695457/1 [2] https://zuul.opendev.org/t/openstack/build/bd97e5397b184176aac94251cc0a9220 -- Thierry Carrez (ttx) From cboylan at sapwetik.org Thu Nov 21 17:14:05 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 21 Nov 2019 09:14:05 -0800 Subject: [nova][gate] Thoughts on working around bug 1853453? In-Reply-To: References: Message-ID: <672bd7d6-acda-4eb3-8657-99ada940cd55@www.fastmail.com> On Thu, Nov 21, 2019, at 9:08 AM, Matt Riedemann wrote: > I've been noticing these shelve/unshelve guest ssh fail due to dhcp > lease issues quite a bit recently and wrote a bug and e-r query for it > this morning: > > http://status.openstack.org/elastic-recheck/#1853453 > > The problem seems to stem from when these shelve tests run on multinode > jobs and we shelve on one host and unshelve on another. > > I have a patch up to nova to force config drive in the nova-next job > where this hits the most: > > https://review.opendev.org/#/c/695431 > > But that's just kind of a stab in the dark to take the metadata API out > of the picture for cloud-init. > > If that doesn't help, and we don't know what is causing this or have > ideas to debug it, we might need to consider making a change to > shelve/unshelve testing in tempest such that we try to unshelve on > original host. Now I realize that is unfortunate since the whole point > of shelve offloading and unshelving is that you can land on another host > and things are good, but if these tests continue to be a high failure > rate in multinode jobs we probably need to consider workarounds if no > one is going to dig into the failures and figure out what is going wrong. > > Thoughts? I have no evidence for this, but is it possible that the dhcp anti spoofing rules that neutron installs on the firewall prevent the dhcp packets from flowing (like spice!) on the new compute node? 
I want to say there is a tcpdump "service" we can enable in devstack jobs with a ruleset that we could use to examine this. Basically dump the dhcp traffic and see if it goes through. Clark From fungi at yuggoth.org Thu Nov 21 17:25:28 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 21 Nov 2019 17:25:28 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: Message-ID: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> On 2019-11-21 18:13:02 +0100 (+0100), Thierry Carrez wrote: [...] > We do cap those aggressively for a reason: bumping that cap > triggers pep8 job failures everywhere as flake8 decides to pay > attention to slightly different things, and we lose a lot of time > collectively chasing down those syntax glitches everywhere. > > Which is why we haven't really bumped those since 2016... But > proper support of f-strings might be a good reason to update? [...] Well, the QA team used to increase the recommendation to the latest version of the various checkers in use at the start of every new development cycle. It's apparently been a few cycles now since anyone remembered to do it (or cared). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sfinucan at redhat.com Thu Nov 21 17:54:32 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Thu, 21 Nov 2019 17:54:32 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> Message-ID: <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> On Thu, 2019-11-21 at 17:25 +0000, Jeremy Stanley wrote: > On 2019-11-21 18:13:02 +0100 (+0100), Thierry Carrez wrote: > [...] > > We do cap those aggressively for a reason: bumping that cap > > triggers pep8 job failures everywhere as flake8 decides to pay > > attention to slightly different things, and we lose a lot of time > > collectively chasing down those syntax glitches everywhere. > > > > Which is why we haven't really bumped those since 2016... But > > proper support of f-strings might be a good reason to update? > [...] > > Well, the QA team used to increase the recommendation to the latest > version of the various checkers in use at the start of every new > development cycle. It's apparently been a few cycles now since > anyone remembered to do it (or cared). I remembered and tried to update hacking to use a new version of flake8 some time back [1]. Unfortunately, flake8 3.x is a total rewrite and I haven't found a way to port things across. I've tried at least a number of different approaches, all to no avail. I even went and asked the PyCQA folks for help [2] but unfortunately it looks like what we were doing isn't possible anymore. I'm flat out of ideas on that so someone other than me is going to have to take this migration upon themselves or we're going to have to drop hacking so we can use a new flake8. 
Stephen [1] https://review.opendev.org/#/q/status:open+project:openstack/hacking+branch:master+topic:bump-flake8 [2] https://gitlab.com/pycqa/flake8/issues/545 From fungi at yuggoth.org Thu Nov 21 18:15:57 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 21 Nov 2019 18:15:57 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> Message-ID: <20191121181557.apixsfva7vbufgc3@yuggoth.org> On 2019-11-21 17:54:32 +0000 (+0000), Stephen Finucane wrote: [...] > Unfortunately, flake8 3.x is a total rewrite and I haven't found a > way to port things across. [...] > I'm flat out of ideas on that so someone other than me is going to > have to take this migration upon themselves or we're going to have > to drop hacking so we can use a new flake8. [...] Oof, yes I guess it's high time to discuss this (sorry if there was a prior ML thread about it which I missed). So I guess the options I can see are: A. keep running woefully outdated flake8 and friends (isn't working) B. overhaul hacking to work as a file-level analyzer plug-in C. improve flake8 to support string-level analyzer plug-ins D. separate hacking back out so it's no longer a flake8 plug-in E. stop running hacking entirely and rely on other flake8 plug-ins Anything else? For sake of simplicity I'd favor option E. In our present reality where most folks already have far too much work on their respective plates, having one less project to maintain makes some measure of sense. Does hacking currently save teams more than enough effort to balance out the amount of effort involved in keeping it working with newer software? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mnaser at vexxhost.com Thu Nov 21 18:26:31 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 21 Nov 2019 13:26:31 -0500 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: <20191121181557.apixsfva7vbufgc3@yuggoth.org> References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: On Thu, Nov 21, 2019 at 1:20 PM Jeremy Stanley wrote: > > On 2019-11-21 17:54:32 +0000 (+0000), Stephen Finucane wrote: > [...] > > Unfortunately, flake8 3.x is a total rewrite and I haven't found a > > way to port things across. > [...] > > I'm flat out of ideas on that so someone other than me is going to > > have to take this migration upon themselves or we're going to have > > to drop hacking so we can use a new flake8. > [...] > > Oof, yes I guess it's high time to discuss this (sorry if there was > a prior ML thread about it which I missed). So I guess the options > I can see are: > > A. keep running woefully outdated flake8 and friends (isn't working) > > B. overhaul hacking to work as a file-level analyzer plug-in > > C. improve flake8 to support string-level analyzer plug-ins > > D. separate hacking back out so it's no longer a flake8 plug-in > > E. stop running hacking entirely and rely on other flake8 plug-ins While I don't have all the context to the work required, that does seem like that's the best option long term IMHO. > Anything else? For sake of simplicity I'd favor option E. 
In our > present reality where most folks already have far too much work on > their respective plates, having one less project to maintain makes > some measure of sense. Does hacking currently save teams more than > enough effort to balance out the amount of effort involved in > keeping it working with newer software? > -- > Jeremy Stanley From stig.openstack at telfer.org Thu Nov 21 18:31:54 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Thu, 21 Nov 2019 18:31:54 +0000 Subject: [neutron][scientific-sig] SIG help with Linuxbridge ML2 maintenance? Message-ID: Hi all - Following this discussion [1] around the Linuxbridge ML2 driver, I’m aware that a number of members of the Scientific SIG use this driver and appreciate its performance and simplicity. Would anyone from the Neutron project involved in this issue be interested in joining a Scientific SIG meeting to discuss how SIG members can help with keeping this driver maintained? Our next meeting is Tuesday 26th November at 2100 UTC. If that’s possible, please let me know and we’ll put it on the agenda. Many thanks, Stig [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010761.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From Albert.Braden at synopsys.com Thu Nov 21 18:32:06 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Thu, 21 Nov 2019 18:32:06 +0000 Subject: All VMs fail when --max exceeds available resources In-Reply-To: References: <14fd42a9-19d2-3811-80cf-22aca04519c0@gmail.com> <6731d279-70b9-6a5d-886e-5fe9a517a21b@gmail.com> Message-ID: My co-worker and I both thought that we had seen the 4/5-active behavior last week, but now we can't duplicate it. So maybe we were confused. I think that is a standard condition among OpenStack operators! -----Original Message----- From: melanie witt Sent: Wednesday, November 20, 2019 4:05 PM To: openstack-discuss at lists.openstack.org Subject: Re: All VMs fail when --max exceeds available resources On 11/20/19 15:16, Albert Braden wrote: > The expected result (that I was seeing last week) is that, if my cluster has capacity for 4 VMs and I use --max 5, 4 will go active and 1 will go to error. This week all 5 are going to error. I can still build 4 VMs of that flavor, one at a time, or use --max 4, but if I use --max 5, then all 5 will fail. If I use smaller VMs, the --max numbers get bigger but I still see the same symptom. The behavior you're describing is an old issue described here: https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_nova_-2Bbug_1458122&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=TSs8buOdmVE0QnkzlNF2pR_2osW1wdg5PBtapOOVFXs&s=nn72IvZ5lQjOkSVX0aMux32HxcBBjxrJ15SCvWOYfts&e= I don't understand how it's possible that you saw the 4 active 1 in error behavior last week. The behavior has been "error all" since 2015, at least. Unless there's some kind of race condition bug happening, maybe. Did you consistently see it fulfill less than --max last week or was it just once? Changing the behavior would be an API change so it would need a spec and new microversion, I think. It's been an undesirable behavior for a long time but it seemingly hasn't been enough of a pain point for someone to sign up and do the work. -melanie > The --max thing is pretty useful and we use it a lot; it allows us to use up the cluster without knowing exactly how much space we have. 
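(For context, the behaviour being discussed is the multi-create form of the boot request, something like

    openstack server create --image cirros --flavor m1.large \
        --min 1 --max 5 albert-test

where the image, flavor and server names here are just placeholders; per the bug above, if only four of the five requested instances can be scheduled, every instance in the request is put into ERROR rather than keeping the four that fit.)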
> > -----Original Message----- > From: Matt Riedemann > Sent: Wednesday, November 20, 2019 2:00 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: All VMs fail when --max exceeds available resources > > On 11/20/2019 3:21 PM, Albert Braden wrote: >> I think the document is saying that we need to set them in nova.conf on each HV. I tried that and it seems to fix the allocation failure: >> >> root at us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8 >> +----------------+------------------+----------+----------+-----------+----------+--------+ >> | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | >> +----------------+------------------+----------+----------+-----------+----------+--------+ >> | VCPU | 1.0 | 16 | 2 | 1 | 1 | 16 | >> | MEMORY_MB | 1.0 | 128888 | 8192 | 1 | 1 | 128888 | >> | DISK_GB | 1.0 | 1208 | 246 | 1 | 1 | 1208 | >> +----------------+------------------+----------+----------+-----------+----------+--------+ > > Yup, the config on the controller doesn't apply to the computes or > placement because the computes are what report the inventory to > placement so you have to configure the allocation ratios there, or > starting in stein via (resource provider) aggregate. > >> >> This fixed the "allocation ratio" issue but I still see the --max issue. What could be causing that? > > That's something else yeah? I didn't quite dig into that email and the > allocation ratio thing popped up to me since it's been a long standing > known painful issue/behavior change since Ocata. > > One question though, I read your original email as essentially "(1) I > did x and got some failures, then (2) I changed something and now > everything fails", but are you running from a clean environment in both > test scenarios because if you have VMs on the computes when you're doing > (2) then that's going to change the scheduling results in (2), i.e. the > computes will have less capacity since there are resources allocated on > them in placement. > From openstack at nemebean.com Thu Nov 21 19:43:01 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 21 Nov 2019 13:43:01 -0600 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: <20191121181557.apixsfva7vbufgc3@yuggoth.org> References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: On 11/21/19 12:15 PM, Jeremy Stanley wrote: > On 2019-11-21 17:54:32 +0000 (+0000), Stephen Finucane wrote: > [...] >> Unfortunately, flake8 3.x is a total rewrite and I haven't found a >> way to port things across. > [...] >> I'm flat out of ideas on that so someone other than me is going to >> have to take this migration upon themselves or we're going to have >> to drop hacking so we can use a new flake8. > [...] > > Oof, yes I guess it's high time to discuss this (sorry if there was > a prior ML thread about it which I missed). So I guess the options > I can see are: > > A. keep running woefully outdated flake8 and friends (isn't working) > > B. overhaul hacking to work as a file-level analyzer plug-in > > C. improve flake8 to support string-level analyzer plug-ins > > D. separate hacking back out so it's no longer a flake8 plug-in > > E. stop running hacking entirely and rely on other flake8 plug-ins > > Anything else? For sake of simplicity I'd favor option E. 
In our > present reality where most folks already have far too much work on > their respective plates, having one less project to maintain makes > some measure of sense. Does hacking currently save teams more than > enough effort to balance out the amount of effort involved in > keeping it working with newer software? > That would be unfortunate since I know some teams have extensive custom hacking rules to help out their reviewers[0]. That said, I'm not signing up to figure out how to make hacking work with modern flake8 and if the project is broken with no one to fix it then it's all academic. :-/ 0: https://github.com/openstack/nova/blob/master/nova/hacking/checks.py From Tim.Bell at cern.ch Thu Nov 21 19:48:37 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Thu, 21 Nov 2019 19:48:37 +0000 Subject: [neutron][scientific-sig] SIG help with Linuxbridge ML2 maintenance? In-Reply-To: References: Message-ID: <025F4573-A5AB-4090-AE85-2A1EC627EAEF@cern.ch> Stig, Thanks for raising the point. This has raised the question of functional equivalence where one driver has less functional than others. From our experience at CERN, this is not an issue as long as the user community is not demanding the full functional equivalence. Some drivers are chosen by deployments because they solve the requirements for that cloud rather than because they cover everything. In fact, a focussed minimum subset functionality may be very attractive if that’s all a deployment needs, has a lower skills and operations bar and is sustainable. Sustainability should not be assessed by activity. No reason to fix/improve something that works as long as it keeps up with the target functionality. Tim On 21 Nov 2019, at 19:31, Stig Telfer > wrote: Hi all - Following this discussion [1] around the Linuxbridge ML2 driver, I’m aware that a number of members of the Scientific SIG use this driver and appreciate its performance and simplicity. Would anyone from the Neutron project involved in this issue be interested in joining a Scientific SIG meeting to discuss how SIG members can help with keeping this driver maintained? Our next meeting is Tuesday 26th November at 2100 UTC. If that’s possible, please let me know and we’ll put it on the agenda. Many thanks, Stig [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010761.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Nov 21 20:16:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 21 Nov 2019 14:16:20 -0600 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: Message-ID: On 11/21/2019 11:13 AM, Thierry Carrez wrote: > which opens the marvelous world of f-strings to us Get off my lawn. /me shakes fist -- Thanks, Matt From fungi at yuggoth.org Thu Nov 21 20:53:44 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 21 Nov 2019 20:53:44 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: <20191121205344.7yfe4zi7e6pfgmbr@yuggoth.org> On 2019-11-21 13:43:01 -0600 (-0600), Ben Nemec wrote: [...] > I know some teams have extensive custom hacking rules to help out > their reviewers[0]. That said, I'm not signing up to figure out > how to make hacking work with modern flake8 and if the project is > broken with no one to fix it then it's all academic. 
:-/ > > 0: https://github.com/openstack/nova/blob/master/nova/hacking/checks.py Yeah, my line of thinking there was, "do the teams who rely heavily on hacking checks have the bandwidth to maintain the tool (or at least solve the current problem)?" -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From nate.johnston at redhat.com Thu Nov 21 20:53:56 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Thu, 21 Nov 2019 15:53:56 -0500 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <20191121152901.xzyv6o6q6jstc36u@yuggoth.org> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> <20191121152901.xzyv6o6q6jstc36u@yuggoth.org> Message-ID: <20191121205356.fdwb6oaeoppmk6up@firewall> On Thu, Nov 21, 2019 at 03:29:01PM +0000, Jeremy Stanley wrote: > On 2019-11-21 13:57:56 +0000 (+0000), Sean Mooney wrote: > > On Thu, 2019-11-21 at 13:29 +0100, Thomas Goirand wrote: > [...] > > > With my Debian OpenStack package maintainer hat on: I will > > > certainly ignore any backend that would be using MongoDB or > > > InfluxDB, as these cannot be used without non-debian packages. > > > > influxdb is mit licensed so im not sure why you would not be able > > to package or redistribute it in debian. > [...] > > And indeed, it's been packaged in Debian for years, and still is: > > https://packages.debian.org/influxdb > > The main concern about it is the open-core development model where > "advanced" features like stability and redundancy require you to > purchase their proprietary enterprise version instead of the less > full-featured (but freely-licensed) community version: > > https://www.influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/ I know several very large sites (ingesting billions of records per day) that run community InfluxDB and they get HA by putting influx-proxy [1] in front of it. I've evaluated it for large scale uses before as well, and with influx-proxy I found no need for the clustering option. Nate [1] https://github.com/shell909090/influx-proxy From melwittt at gmail.com Thu Nov 21 22:30:54 2019 From: melwittt at gmail.com (melanie witt) Date: Thu, 21 Nov 2019 14:30:54 -0800 Subject: [nova][ops] heads up: change in behavior for resize and migrate server action APIs Message-ID: <3053c52d-c63c-277a-ee44-5816c99bb078@gmail.com> Hey all, Just wanted to give everyone a heads up about a recent change [1] in behavior for the resize [2] and migrate [3] server action APIs: https://docs.openstack.org/releasenotes/nova/unreleased.html#other-notes The scheduling step for resize and migrate has been changed from synchronous to asynchronous, so users will no longer receive a 400 error for NoValidHost. To discover whether the resize or migrate has failed during scheduling, users must use the server actions API [4], which is called "server events" in the openstackclient [5]. There is a WIP proposal [6] to add exception messages to events in the server actions API: https://review.opendev.org/694428 in order to communicate additional failure info to non-admin users. I think we'll want this as the server actions API becomes more of a source of information for the end user. 
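For end users, the practical way to spot such a failure after this change is the server event listing, roughly:

    openstack server event list <server>
    openstack server event show <server> <request-id>

(exact columns and visibility depend on the microversion in use); the WIP proposal above is about getting the actual fault message into that output for non-admin users.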
Feel free reply here or visit the review to add your thoughts. Cheers, -melanie [1] https://review.opendev.org/693937 [2] https://docs.openstack.org/api-ref/compute/#resize-server-resize-action [3] https://docs.openstack.org/api-ref/compute/#migrate-server-migrate-action [4] https://docs.openstack.org/api-ref/compute/#servers-actions-servers-os-instance-actions [5] https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server-event.html [6] background: http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010775.html From smooney at redhat.com Thu Nov 21 23:06:34 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 21 Nov 2019 23:06:34 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: On Thu, 2019-11-21 at 13:26 -0500, Mohammed Naser wrote: > On Thu, Nov 21, 2019 at 1:20 PM Jeremy Stanley wrote: > > > > On 2019-11-21 17:54:32 +0000 (+0000), Stephen Finucane wrote: > > [...] > > > Unfortunately, flake8 3.x is a total rewrite and I haven't found a > > > way to port things across. > > > > [...] > > > I'm flat out of ideas on that so someone other than me is going to > > > have to take this migration upon themselves or we're going to have > > > to drop hacking so we can use a new flake8. > > > > [...] > > > > Oof, yes I guess it's high time to discuss this (sorry if there was > > a prior ML thread about it which I missed). So I guess the options > > I can see are: > > > > A. keep running woefully outdated flake8 and friends (isn't working) > > > > B. overhaul hacking to work as a file-level analyzer plug-in > > > > C. improve flake8 to support string-level analyzer plug-ins > > > > D. separate hacking back out so it's no longer a flake8 plug-in > > > > E. stop running hacking entirely and rely on other flake8 plug-ins > > While I don't have all the context to the work required, that does seem > like that's the best option long term IMHO. i would prefer E as well. if we do need something like a hacking test that enforces no alias of privsep function for example the a plugin could be written for that 1 thing but in general i think we would be better off adopting exsitsing plugins our just using flake8 directly without plugins. i know we have some duplicate work checkers and other custom hacking test but i dont know if i have ever been hit by a hacking failure. i have pep8 issues all the time but never checks added by hacking. > > > Anything else? For sake of simplicity I'd favor option E. In our > > present reality where most folks already have far too much work on > > their respective plates, having one less project to maintain makes > > some measure of sense. Does hacking currently save teams more than > > enough effort to balance out the amount of effort involved in > > keeping it working with newer software? 
> > -- > > Jeremy Stanley > > From smooney at redhat.com Thu Nov 21 23:12:27 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 21 Nov 2019 23:12:27 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: On Thu, 2019-11-21 at 13:43 -0600, Ben Nemec wrote: > > On 11/21/19 12:15 PM, Jeremy Stanley wrote: > > On 2019-11-21 17:54:32 +0000 (+0000), Stephen Finucane wrote: > > [...] > > > Unfortunately, flake8 3.x is a total rewrite and I haven't found a > > > way to port things across. > > > > [...] > > > I'm flat out of ideas on that so someone other than me is going to > > > have to take this migration upon themselves or we're going to have > > > to drop hacking so we can use a new flake8. > > > > [...] > > > > Oof, yes I guess it's high time to discuss this (sorry if there was > > a prior ML thread about it which I missed). So I guess the options > > I can see are: > > > > A. keep running woefully outdated flake8 and friends (isn't working) > > > > B. overhaul hacking to work as a file-level analyzer plug-in > > > > C. improve flake8 to support string-level analyzer plug-ins > > > > D. separate hacking back out so it's no longer a flake8 plug-in > > > > E. stop running hacking entirely and rely on other flake8 plug-ins > > > > Anything else? For sake of simplicity I'd favor option E. In our > > present reality where most folks already have far too much work on > > their respective plates, having one less project to maintain makes > > some measure of sense. Does hacking currently save teams more than > > enough effort to balance out the amount of effort involved in > > keeping it working with newer software? > > > > That would be unfortunate since I know some teams have extensive custom > hacking rules to help out their reviewers[0]. That said, I'm not signing > up to figure out how to make hacking work with modern flake8 and if the > project is broken with no one to fix it then it's all academic. :-/ > > 0: https://github.com/openstack/nova/blob/master/nova/hacking/checks.py nova only has a couple but it might be intersting to convert those to precommit scripts. looking through them some of them do seam useful although other are just python 2 vs python 3 guidline that i hope will be less relevant now. > From dangtrinhnt at gmail.com Thu Nov 21 23:20:29 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Fri, 22 Nov 2019 08:20:29 +0900 Subject: [auto-scaling][self-healing] Discussion to merge two SIG to one In-Reply-To: References: Message-ID: Hi Rico, +1 That is a very good idea. Coincidentally, I'm working on some research projects that focus on autoscaling and self-healing at the same time. And the combined group would be a very good idea because I don't have to switch back and forth between the groups for discussion. Thanks, On Thu, Nov 21, 2019 at 12:57 AM Rico Lin wrote: > Dear all > > As we discussed in PTG about merge two SIG to one. > I would like to continue the discussion on ML. > > In PTG, Eric proposes the idea to merge two SIG due to the high > overlapping of domains and tasks. > > I think this is a great idea since, over the last 6 months, most of the > discussions in both SIG are overlapped. So I'm onboard with this idea. > > Here's how I think we can continue this idea: > > 1. Create new SIG (maybe 'Automation SIG'? feel free to propose name > which can cover both interest.) 
> 2. Redirect docs and wiki to new SIG. And rework on index so there > will be no confusion > 3. Move repos from both SIGs to new SIG > 4. Mark auto-scaling SIG and self-healing SIG as inactive. > 5. remove auto-scaling SIG and self-healing SIG after a > reasonable waiting time > > > Let us know what you think about this. Otherwise, we definitely expect > this to happen soon. > > > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Thu Nov 21 23:28:12 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 21 Nov 2019 17:28:12 -0600 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set Message-ID: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> Hi all. Sean mentioned a "downstream bug" in IRC today [1], which we discussed a little bit, but without gibi; and then Matt and I discussed it more later [2], but without gibi *or* Sean. Since I don't know if there's a bug report I can comment on, I wanted to summarize here for now so I don't forget. The Problem =========== Neutron needs to know the compute node resource provider on which to hang the child providers for QoS bandwidth. Today it assumes CONF.host is the name of that provider. That's wrong. The name of the provider is `hypervisor_hostname`, which for libvirt [3] happens to match the *default* value of CONF.host [4]. Per the bug Sean describes, if you override CONF.host, neutron won't find the compute node provider, and things break. The problem will be the same for any non-nova wishing to discover the compute node RP -- e.g. cyborg for purposes of creating child providers for accelerators. The Right Solution ================== Neutron (and any $service) should look up the compute node provider by its UUID. That's returned by the /os-hypervisors APIs after microversion 2.53, e.g. [5], but, Catch-22, you can currently only filter those results on hypervisor_hostname. So you would have to e.g. GET /os-hypervisors/detail and then walk the list looking for service.host matching CONF.host. That's way heavy for your CERNs. So the proposal moving forward is to add (in a new microversion) a ?service_host=XXX qparam to those APIs to let you filter down to just the one entry for your CONF.host. The UUID of that entry will also be the UUID of the compute node resource provider. (At that point you don't even need to ask Placement for that provider; you can just use that UUID directly in the APIs that create the child providers. Yay, you got your extra API call back.) Now, that's not backportable, and this problem exists in stable releases (at least those that support QoS bandwidth). So we should totally do it, but we also need... The Backportable Solution ========================= Neutron should use `gethostname()` rather than CONF.host to discover the compute node resource provider. I don't consider this a viable permanent solution because it is tightly coupled to knowing that hypervisor_hostname == `gethostname()`, which happens to be true for libvirt, but not necessarily for other drivers. We can get away with it for stable because we happen to know that we're only supporting QoS bandwidth via Placement for libvirt. Upgrade Concerns ================ Matt and I didn't nail down whether neutron and compute are allowed to be at different versions on a given host, or what those are allowed to be. 
But things should be sane if neutron (or any $service) logics like this in >=ussuri: if new_nova_microversion_available: do_the_os_hypervisors_thing() elif using_new_non_libvirt_feature: raise YouCantDoThisWithOldNova() else: do_the_gethostname_thing() Action Summary ============== If the above sounds reasonable, it would entail the following actions: - Neutron(/Cyborg?): backportable patch to s/CONF.host/socket.gethostname()/ - Nova: GET /os-hypervisors*?service_host=X in a new microversion. - Neutron/Cyborg: master-only patch to do the logic described in `Upgrade Concerns`_ (though for now without the `elif` branch). Thanks, efried [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-21.log.html#t2019-11-21T17:59:05 [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-21.log.html#t2019-11-21T21:53:29 [3] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab394d484626/nova/virt/libvirt/host.py#L955 [4] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab394d484626/nova/conf/netconf.py#L56 [5] https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-details-detail#id309 From dms at danplanet.com Thu Nov 21 23:32:33 2019 From: dms at danplanet.com (Dan Smith) Date: Thu, 21 Nov 2019 15:32:33 -0800 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: (Sean Mooney's message of "Thu, 21 Nov 2019 23:12:27 +0000") References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: > nova only has a couple but it might be intersting to convert those to precommit scripts. > looking through them some of them do seam useful although other are just python 2 vs python 3 > guidline that i hope will be less relevant now. I would so very much love if we did NOT do that. Precommit hooks are super annoying for writing up quick PoCs and DNM patches, which we do a lot. --Dan From mriedemos at gmail.com Fri Nov 22 00:53:43 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 21 Nov 2019 18:53:43 -0600 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> Message-ID: <6f926b2a-3580-8542-d17e-3ad7feef2149@gmail.com> On 11/21/2019 5:28 PM, Eric Fried wrote: > So the proposal moving forward is to add (in a new microversion) a > ?service_host=XXX qparam to those APIs to let you filter down to just > the one entry for your CONF.host. I couldn't find the nova bug for it (if there was one, but I could have sworn there was if even for docs) but at some point someone was asking for this kind of filtering by compute service host API change for ironic nodes as well because today you can only filter hypervisors by the hypervisor_hostname which for ironic nodes is the node uuid. But if you want to list all nodes managed by a given compute service, you need to filter on the compute service host which is in the response but not a filter parameter in the API and is what Eric is proposing here. Point being, this would be useful even without the necessity for this nested provider wonkadoo problem. 
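To illustrate the lookup a service is stuck with today (and what the proposed filter would replace), here is a rough sketch against the 2.53 compute API using a keystoneauth session; the authenticated session and the local CONF.host value are assumed, and error handling and paging are left out:

    # 'sess' is an authenticated keystoneauth1 Session and 'conf_host'
    # stands in for the local service's CONF.host value
    resp = sess.get(
        '/os-hypervisors/detail',
        endpoint_filter={'service_type': 'compute'},
        headers={'OpenStack-API-Version': 'compute 2.53'})

    rp_uuid = None
    for hyp in resp.json()['hypervisors']:
        if hyp['service']['host'] == conf_host:
            # at 2.53+ the hypervisor id is the compute node UUID, which
            # is also the root resource provider UUID in placement
            rp_uuid = hyp['id']
            break

With a ?service_host= filter the loop (and most of the response payload) would go away.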
-- Thanks, Matt From duc.openstack at gmail.com Fri Nov 22 01:23:11 2019 From: duc.openstack at gmail.com (Duc Truong) Date: Thu, 21 Nov 2019 17:23:11 -0800 Subject: [auto-scaling][self-healing] Discussion to merge two SIG to one In-Reply-To: References: Message-ID: +1 from me. On Thu, Nov 21, 2019 at 3:21 PM Trinh Nguyen wrote: > Hi Rico, > > +1 > That is a very good idea. Coincidentally, I'm working on some research > projects that focus on autoscaling and self-healing at the same time. And > the combined group would be a very good idea because I don't have to switch > back and forth between the groups for discussion. > > Thanks, > > > On Thu, Nov 21, 2019 at 12:57 AM Rico Lin > wrote: > >> Dear all >> >> As we discussed in PTG about merge two SIG to one. >> I would like to continue the discussion on ML. >> >> In PTG, Eric proposes the idea to merge two SIG due to the high >> overlapping of domains and tasks. >> >> I think this is a great idea since, over the last 6 months, most of the >> discussions in both SIG are overlapped. So I'm onboard with this idea. >> >> Here's how I think we can continue this idea: >> >> 1. Create new SIG (maybe 'Automation SIG'? feel free to propose name >> which can cover both interest.) >> 2. Redirect docs and wiki to new SIG. And rework on index so there >> will be no confusion >> 3. Move repos from both SIGs to new SIG >> 4. Mark auto-scaling SIG and self-healing SIG as inactive. >> 5. remove auto-scaling SIG and self-healing SIG after a >> reasonable waiting time >> >> >> Let us know what you think about this. Otherwise, we definitely expect >> this to happen soon. >> >> >> >> -- >> May The Force of OpenStack Be With You, >> >> *Rico Lin*irc: ricolin >> >> > > -- > *Trinh Nguyen* > *www.edlab.xyz * > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sundar.nadathur at intel.com Fri Nov 22 02:11:59 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Fri, 22 Nov 2019 02:11:59 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> Message-ID: > -----Original Message----- > From: Eric Fried > Sent: Thursday, November 21, 2019 3:28 PM > Action Summary > ============== > If the above sounds reasonable, it would entail the following actions: > - Neutron(/Cyborg?): backportable patch to > s/CONF.host/socket.gethostname()/ > - Nova: GET /os-hypervisors*?service_host=X in a new microversion. > - Neutron/Cyborg: master-only patch to do the logic described in `Upgrade > Concerns`_ (though for now without the `elif` branch). Cyborg does use CONF.host today. We can use socket.gethostname() instead. On a related node, the patch for ARQ binding [1] uses instance.host, which comes from CONF.host. I'll change it to instance.node, which comes from [2], which in turn comes from get_hostname() [3]. 
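As a minimal sketch of the backportable direction, which only holds while the driver is libvirt (libvirt reports the host's kernel hostname, i.e. what gethostname() returns):

    import socket

    # name used to look up the root resource provider; unlike the
    # overridable CONF.host this matches hypervisor_hostname for libvirt
    provider_name = socket.gethostname()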
[1] https://review.opendev.org/#/c/631244/46/nova/compute/manager.py at 2634 [2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9137 [3] https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484626/nova/virt/libvirt/driver.py#L9614 > Thanks, > efried Thanks & Regards, Sundar From tony at bakeyournoodle.com Fri Nov 22 04:05:18 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Fri, 22 Nov 2019 15:05:18 +1100 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: Message-ID: <20191122040518.GA10388@thor.bakeyournoodle.com> On Wed, Nov 20, 2019 at 06:03:03PM +0800, Rico Lin wrote: > Dear all > In summit, there's a forum for ARM support [1] in Summit which many people > show they're interested in ARM support for OpenStack. > And since also we have Linaro shows interest in donating servers to > OpenStack infra. It's time for community to think about what we should deal > with those ARM servers once we have them in community infrastructure. > > One thing we should do as a community is to gather people for this topic. > So I propose we create a Multi-arch SIG and aim to support ARM architecture > as very first step. > I had the idea to call it ARM SIG before, but since there might be high > overlap knowledge between support ARM 64 and other architectures. I propose > we go for Multi-arch instead. > > This SIG will be a nice place to collect all the documents, gate jobs, and > to trace tasks. > > If you're also interested in that group, please reply to this email, > introduce yourself and tell us what you would like the group scope and > objectives to be, and what you can contribute to the group. Pick me Pick me :) I've been around OpenStack for about 5 years now, the last couple have been focused on brining multi-arch support (albeit ppc64le) into tripleo building on the enablement work that others have done. I'm keen to work with the SIG, to build out the ARM support and at the same time ensure we don't make it hard for other architectures to do the same Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From tony at bakeyournoodle.com Fri Nov 22 04:06:30 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Fri, 22 Nov 2019 15:06:30 +1100 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: Message-ID: <20191122040630.GB10388@thor.bakeyournoodle.com> On Thu, Nov 21, 2019 at 02:10:02AM +0000, Chris Apsey wrote: > This will be a long-winded response, so bare with me... > > I brought up a semi-related topic on the mailing list last year [1], namely that nova should ingest the hw_architecture field that is available in glance images and intelligently schedule it on compute nodes - not only in cases where you have actual hardware, but also using just qemu emulation when no 'real' hardware is available. That's a fun idea and one we shoudl talk more about and try to work out what that'd look like for nova. Of course it assumes really good "TCG" support in qemu but we can probably do *something* Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From zhangbailin at inspur.com Fri Nov 22 06:58:28 2019 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Fri, 22 Nov 2019 06:58:28 +0000 Subject: =?gb2312?B?W2xpc3RzLm9wZW5zdGFjay5vcme0+reiXVJlOiBbZGV2XSBVcGdyYWRpbmcg?= =?gb2312?Q?flake8_to_support_f-strings?= In-Reply-To: References: <783ec4032f4070af16917e5f6c009026@sslemail.net> Message-ID: <828352d1cba44a3db0a2c3a843693922@inspur.com> > 主题: [lists.openstack.org代发]Re: [dev] Upgrading flake8 to support f-strings > > On Thu, 2019-11-21 at 13:26 -0500, Mohammed Naser wrote: > > On Thu, Nov 21, 2019 at 1:20 PM Jeremy Stanley > wrote: > > > > > > On 2019-11-21 17:54:32 +0000 (+0000), Stephen Finucane wrote: > > > [...] > > > > Unfortunately, flake8 3.x is a total rewrite and I haven't found a > > > > way to port things across. > > > > > > [...] > > > > I'm flat out of ideas on that so someone other than me is going to > > > > have to take this migration upon themselves or we're going to have > > > > to drop hacking so we can use a new flake8. > > > > > > [...] > > > > > > Oof, yes I guess it's high time to discuss this (sorry if there was > > > a prior ML thread about it which I missed). So I guess the options I > > > can see are: > > > > > > A. keep running woefully outdated flake8 and friends (isn't working) > > > > > > B. overhaul hacking to work as a file-level analyzer plug-in > > > > > > C. improve flake8 to support string-level analyzer plug-ins > > > > > > D. separate hacking back out so it's no longer a flake8 plug-in > > > > > > E. stop running hacking entirely and rely on other flake8 plug-ins > > > > While I don't have all the context to the work required, that does > > seem like that's the best option long term IMHO. > i would prefer E as well. if we do need something like a hacking test that > enforces no alias of privsep function for example the a plugin could be written > for that 1 thing but in general i think we would be better off adopting exsitsing > plugins our just using > flake8 directly without plugins. i know we have some duplicate work checkers > and other custom hacking test but i dont know if i have ever been hit by a > hacking failure. i have pep8 issues all the time but never checks added by > hacking. I also like E, but can we not update with flake8 every time, can we follow up with a stable version? In addition to the bug update, there will be a feature update, so is nova going to be updated? The maintenance of the plugin is also a big investment. > > > Anything else? For sake of simplicity I'd favor option E. In our > > > present reality where most folks already have far too much work on > > > their respective plates, having one less project to maintain makes > > > some measure of sense. Does hacking currently save teams more than > > > enough effort to balance out the amount of effort involved in > > > keeping it working with newer software? > > > -- > > > Jeremy Stanley > > > > > From rico.lin.guanyu at gmail.com Fri Nov 22 07:03:47 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Fri, 22 Nov 2019 15:03:47 +0800 Subject: [meta][K8s][API][Extended Maintenance][Operation Docs] Change SIG to Advisory status Message-ID: Dear SIG members and chairs To follow the discussion in Meta SIG PTG room [1]. 
I would like to propose to change the following SIGs to Advisory status [2] to represent the SIG stays around for provide help, make sure everything stays working and provide advice when needed. - K8s SIG - API SIG - Extended Maintenance SIG - Operation Docs SIG If you think your SIG should not belong to `advisory` statue. Please advise from the following statuses: - active: SIG reaches out for discussion and event, have plans for the current cycle, host meetings or send ML out regularly. - forming: SIG still setting up. - advisory: SIG stays around for help, make sure everything stays working and provide advice when needed. - complete: SIG completes its mission. If that sounds correct, I need at least one chair from each SIG +1 on [2], so we can make sure it's what SIGs agreed on. [1] https://etherpad.openstack.org/p/PVG-meta-sig [2] https://review.opendev.org/#/c/695625/ -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From aj at suse.com Fri Nov 22 08:21:52 2019 From: aj at suse.com (Andreas Jaeger) Date: Fri, 22 Nov 2019 09:21:52 +0100 Subject: [meta][K8s][API][Extended Maintenance][Operation Docs] Change SIG to Advisory status In-Reply-To: References: Message-ID: <638eb1eb-febc-3a53-9bb8-f81f610bdb14@suse.com> On 22/11/2019 08.03, Rico Lin wrote: > Dear SIG members and chairs > > To follow the discussion in Meta SIG PTG room [1]. I would like to > propose to change the following SIGs to Advisory status [2] to represent > the SIG stays around for provide help, make sure everything stays > working and provide advice when needed. > > * K8s SIG > * API SIG > * Extended Maintenance SIG > * Operation Docs SIG Two questions: 1) Are there even enough people around for these to be advisors? 2) What is going to happen with artifacts of Operations Docs SIG: https://review.opendev.org/#/q/project:openstack/arch-design - no real change since 1 year. https://review.opendev.org/#/q/project:openstack/operations-guide has a bit more activity. What will happen with these two documents if the Operation Docs SIG becomes advisory state? Should we retire the repos and delete the content? I don't know whether the other SIGS own any deliverables, where we need to discuss what to do with them, Andreas > [...] -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From zigo at debian.org Fri Nov 22 08:28:27 2019 From: zigo at debian.org (Thomas Goirand) Date: Fri, 22 Nov 2019 09:28:27 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <20191121152901.xzyv6o6q6jstc36u@yuggoth.org> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> <20191121152901.xzyv6o6q6jstc36u@yuggoth.org> Message-ID: On 11/21/19 4:29 PM, Jeremy Stanley wrote: > On 2019-11-21 13:57:56 +0000 (+0000), Sean Mooney wrote: >> On Thu, 2019-11-21 at 13:29 +0100, Thomas Goirand wrote: > [...] >>> With my Debian OpenStack package maintainer hat on: I will >>> certainly ignore any backend that would be using MongoDB or >>> InfluxDB, as these cannot be used without non-debian packages. 
>> >> influxdb is mit licensed so im not sure why you would not be able >> to package or redistribute it in debian. > [...] > > And indeed, it's been packaged in Debian for years, and still is: > > https://packages.debian.org/influxdb > > The main concern about it is the open-core development model where > "advanced" features like stability and redundancy require you to > purchase their proprietary enterprise version instead of the less > full-featured (but freely-licensed) community version: > > https://www.influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/ And we also should remember one of the reason that spawned OpenStack into existence: some other cloud solution was open-core, and there was multiple conflict of interest where upstream made it difficult to contribute (because they needed to differentiate commercially). That very much is the main issue of any software adopting the open-core model: conflict of interest by the company behind it. Cheers, Thomas Goirand (zigo) From witold.bedyk at suse.com Fri Nov 22 08:45:02 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Fri, 22 Nov 2019 09:45:02 +0100 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <20191121205356.fdwb6oaeoppmk6up@firewall> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> <20191121152901.xzyv6o6q6jstc36u@yuggoth.org> <20191121205356.fdwb6oaeoppmk6up@firewall> Message-ID: <8184e8c7-b5aa-9801-c4a6-b1e9f8db11b2@suse.com> On 11/21/19 9:53 PM, Nate Johnston wrote: > I know several very large sites (ingesting billions of records per day) > that run community InfluxDB and they get HA by putting influx-proxy [1] > in front of it. I've evaluated it for large scale uses before as well, > and with influx-proxy I found no need for the clustering option. Similar architecture is followed by Monasca. InfluxDB instances can be assigned to different Kafka consumer groups and consume messages independently from the message queue. In case one of the instances is down all the measurements are still buffered and get persisted as soon as the instance is available again. Best greetings Witek From thierry at openstack.org Fri Nov 22 11:11:30 2019 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 22 Nov 2019 12:11:30 +0100 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: <20191121181557.apixsfva7vbufgc3@yuggoth.org> References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: <7c061ba6-58d4-4a54-98a6-1140e93c6e46@openstack.org> Jeremy Stanley wrote: > [...] > Oof, yes I guess it's high time to discuss this (sorry if there was > a prior ML thread about it which I missed). So I guess the options > I can see are: > > A. keep running woefully outdated flake8 and friends (isn't working) To be fair, option (A) *is* working, as long as you don't use f-strings. So we could just say that we should not use certain new language features. However, the error you get when you try using them is very confusing, and I expect more and more people to hit it as we move forward. > B. overhaul hacking to work as a file-level analyzer plug-in > > C. improve flake8 to support string-level analyzer plug-ins > > D. separate hacking back out so it's no longer a flake8 plug-in > > E. 
stop running hacking entirely and rely on other flake8 plug-ins > > Anything else? There is option (F) which is to explicitly add pyflakes>=1.4.0,<1.5 to projects that want to use f-strings (or to a new hacking version that projects will opt into). That will force flake8 to use pyflakes 1.4.0 instead of 1.2.3, while still remaining compatible with hacking. Looks like the only check 1.4.0 adds compared to 1.2.3 is duplicate dictionary keys[1], which hopefully should not trigger massive reports... As an example, see this DNM change which adds a f-string in patchset 1, fails pep8 tests with the cryptic error, but passes them in patchset 2 once pyflakes>=1.4.0,<1.5 is explicitly added in test-requirements: https://review.opendev.org/695653 This obviously kicks the can down the road, but at least avoids getting it onto mriedem's lawn... [1] https://github.com/PyCQA/pyflakes/commit/152ca182a603d7327925873db9a795797812968d -- Thierry Carrez (ttx) From balazs.gibizer at est.tech Fri Nov 22 11:16:58 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 22 Nov 2019 11:16:58 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> Message-ID: <1574421416.31688.4@est.tech> On Thu, Nov 21, 2019 at 17:28, Eric Fried wrote: > Hi all. Sean mentioned a "downstream bug" in IRC today [1], which we > discussed a little bit, but without gibi; and then Matt and I > discussed > it more later [2], but without gibi *or* Sean. Since I don't know if > there's a bug report I can comment on, I wanted to summarize here for > now so I don't forget. > > The Problem > =========== > Neutron needs to know the compute node resource provider on which to > hang the child providers for QoS bandwidth. Today it assumes CONF.host > is the name of that provider. > > That's wrong. > > The name of the provider is `hypervisor_hostname`, which for libvirt > [3] > happens to match the *default* value of CONF.host [4]. > > Per the bug Sean describes, if you override CONF.host, neutron won't > find the compute node provider, and things break. > > The problem will be the same for any non-nova wishing to discover the > compute node RP -- e.g. cyborg for purposes of creating child > providers > for accelerators. As a background I'm pretty sure I made this mistake due to looking at how nova and neutron does the port binding. There nova sends the instance.host (which is coming from the CONF.host) to neutron in the binding:host_id field of the neutron port. The difference between the binding and the inventory handling is that port.binding:host_id is used by neutron to figure out which service host will run the nova-compute service as well as the neutron agent. So there the CONF.host is enough. Libvirt also runs on the same physical host as nova-compute and the neutron agent. BUT while for nova and neutron the hostname can be overwritten by setting the CONF.host to an arbitrary string, for libvirt such configuration cannot be applied. Libvirt will always use the linux's gethostname call [6]. > > The Right Solution > ================== > Neutron (and any $service) should look up the compute node provider by > its UUID. That's returned by the /os-hypervisors APIs after > microversion > 2.53, e.g. [5], but, Catch-22, you can currently only filter those > results on hypervisor_hostname. So you would have to e.g. 
GET > /os-hypervisors/detail and then walk the list looking for service.host > matching CONF.host. That's way heavy for your CERNs. > > So the proposal moving forward is to add (in a new microversion) a > ?service_host=XXX qparam to those APIs to let you filter down to just > the one entry for your CONF.host. The UUID of that entry will also be > the UUID of the compute node resource provider. (At that point you > don't > even need to ask Placement for that provider; you can just use that > UUID > directly in the APIs that create the child providers. Yay, you got > your > extra API call back.) > The result of ?service_host=XXX cannot be a single compute node. It needs to be a list of compute nodes (due to ironic) and neutron needs to *assume* that in any case where neutron does this query there is no more than a single item in the returned list (e.g. neutron cannot mix with ironic). > > Now, that's not backportable, and this problem exists in stable > releases > (at least those that support QoS bandwidth). So we should totally do > it, > but we also need... > > The Backportable Solution > ========================= > Neutron should use `gethostname()` rather than CONF.host to discover > the > compute node resource provider. In theory this sounds a good solution as it will align the nova + libvirt behavior with the neutron behavior regarding placement RP naming. Where the whole thing goes sideways is the fact that neutron does the inventory handling up in the neutron server. Neutron does the compute RP lookup, child RP creation and inventory reporting based on the information the agents are sending up to the neutron server. As in neutron there is no differentiation between compute/agent service host and hypervisor node the agents sends up a single hostname[7][9] based on the CONF.host configuration[8]. So adding a new piece of information to the agent report is most probably an RPC change between neutron agents and the neutron server. @Neutron, @Stable: Does such RPC change is backportable in neutron? > > I don't consider this a viable permanent solution because it is > tightly > coupled to knowing that hypervisor_hostname == `gethostname()`, which > happens to be true for libvirt, but not necessarily for other drivers. > We can get away with it for stable because we happen to know that > we're > only supporting QoS bandwidth via Placement for libvirt. Temporary workaround ==================== If you configure bandwidth in neutron then do not change the CONF.host config in your computes. I think we have to document that as a limitation here [10]. > > Upgrade Concerns > ================ > Matt and I didn't nail down whether neutron and compute are allowed to > be at different versions on a given host, or what those are allowed to > be. But things should be sane if neutron (or any $service) logics like > this in >=ussuri: > > if new_nova_microversion_available: > do_the_os_hypervisors_thing() > elif using_new_non_libvirt_feature: > raise YouCantDoThisWithOldNova() > else: > do_the_gethostname_thing() > The 'if' part looks OK to me and when we find a backportable solution, that needs to be in the 'else' branch. > Action Summary > ============== > If the above sounds reasonable, it would entail the following actions: > - Neutron(/Cyborg?): backportable patch to > s/CONF.host/socket.gethostname()/ > - Nova: GET /os-hypervisors*?service_host=X in a new microversion. I guess that will be me, starting with a spec that outlines the new API. 
> - Neutron/Cyborg: master-only patch to do the logic described in > `Upgrade Concerns`_ (though for now without the `elif` branch). > > Thanks, > efried > > [1] > http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-21.log.html#t2019-11-21T17:59:05 > [2] > http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-21.log.html#t2019-11-21T21:53:29 > [3] > https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab394d484626/nova/virt/libvirt/host.py#L955 > [4] > https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab394d484626/nova/conf/netconf.py#L56 > [5] > https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-details-detail#id309 > @Sean: thanks for finding the bug! @Eric: thanks for the good write up! Cheers, gibi [6] https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Connections-Host_Info.html [7] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba8657711f/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L295-L324 [8] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba8657711f/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L171 [9] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba8657711f/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py#L161-L175 [10] https://docs.openstack.org/neutron/latest/admin/config-qos-min-bw.html#limitations From balazs.gibizer at est.tech Fri Nov 22 11:18:54 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 22 Nov 2019 11:18:54 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> Message-ID: <1574421532.31688.5@est.tech> On Fri, Nov 22, 2019 at 02:11, "Nadathur, Sundar" wrote: >> -----Original Message----- >> From: Eric Fried >> Sent: Thursday, November 21, 2019 3:28 PM > >> Action Summary >> ============== >> If the above sounds reasonable, it would entail the following >> actions: >> - Neutron(/Cyborg?): backportable patch to >> s/CONF.host/socket.gethostname()/ >> - Nova: GET /os-hypervisors*?service_host=X in a new microversion. >> - Neutron/Cyborg: master-only patch to do the logic described in >> `Upgrade >> Concerns`_ (though for now without the `elif` branch). > > Cyborg does use CONF.host today. We can use socket.gethostname() > instead. > > On a related node, the patch for ARQ binding [1] uses instance.host, > which comes from CONF.host. I'll change it to instance.node, which > comes from [2], which in turn comes from get_hostname() [3]. I'm not sure that for binding you need to use the instance.node. See my neutron port binding discussion in my response to Eric's original mail. 
> > [1] > https://review.opendev.org/#/c/631244/46/nova/compute/manager.py at 2634 > [2] > https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9137 > [3] > https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484626/nova/virt/libvirt/driver.py#L9614 > >> Thanks, >> efried > > > Thanks & Regards, > Sundar > From rfolco at redhat.com Fri Nov 22 12:15:42 2019 From: rfolco at redhat.com (Rafael Folco) Date: Fri, 22 Nov 2019 10:15:42 -0200 Subject: [tripleo] TripleO CI Summary: Sprint 39 Message-ID: Greetings, The TripleO CI team has just completed Sprint 39 / Unified Sprint 18 (Oct 31 thru Nov 20). The following is a summary of completed work during this sprint cycle: - Evaluated and implemented CI jobs in Zuul that deal with RPM build artifacts for ceph-ansible and podman 3rd party testing. - Explored design options and created a PoC for individual component testing in the promotion pipeline. This effort will add an additional verification layer to check OpenStack components (compute, networking, storage, etc) with stable builds, and ease root cause determination when it breaks the code. - Fixed the docker login module in the new promotion code. - Improve tests for verifying a full promotion workflow running on the staging environment. The planned work for the next sprint [1] are: - Close-out work on 3rd-party testing jobs for podman and ceph-ansible. - Close-out work on the design enhancements to the promotion server. - Implement component pipeline running daily and promoting separated from the integration jobs. - Address issues and technical debt tasks in TripleO CI realm. The Ruck and Rover for this sprint are Marios Andreou (marios) and Chandan Kumar (chkumar). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes are being tracked in etherpad [2]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-19 [2] https://etherpad.openstack.org/p/ruckroversprint19 -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Fri Nov 22 12:39:43 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 22 Nov 2019 12:39:43 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> Message-ID: <2396063812f86abb1c67e49be86e38b68daf40c1.camel@redhat.com> On Fri, 2019-11-22 at 02:11 +0000, Nadathur, Sundar wrote: > > -----Original Message----- > > From: Eric Fried > > Sent: Thursday, November 21, 2019 3:28 PM > > Action Summary > > ============== > > If the above sounds reasonable, it would entail the following actions: > > - Neutron(/Cyborg?): backportable patch to > > s/CONF.host/socket.gethostname()/ > > - Nova: GET /os-hypervisors*?service_host=X in a new microversion. > > - Neutron/Cyborg: master-only patch to do the logic described in `Upgrade > > Concerns`_ (though for now without the `elif` branch). > > Cyborg does use CONF.host today. We can use socket.gethostname() instead. > > On a related node, the patch for ARQ binding [1] uses instance.host, which comes from CONF.host. I'll change it to > instance.node, which comes from [2], which in turn comes from get_hostname() [3]. 
For communication with Neutron and Cinder, Nova uses CONF.host, so Cyborg
should still use CONF.host for accelerators. But for communicating with
Placement we use the hypervisor_hostname. So in the Cyborg case we could
change the default for CONF.host from socket.getfqdn to socket.gethostname,
but simply change the get_root_provider function so that it uses the
hypervisor API to determine the UUID of the RP.

>
> [1] https://review.opendev.org/#/c/631244/46/nova/compute/manager.py at 2634
> [2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9137
> [3] https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484626/nova/virt/libvirt/driver.py#L9614
>
> > Thanks,
> > efried
>
>
> Thanks & Regards,
> Sundar
>

From smooney at redhat.com  Fri Nov 22 13:06:01 2019
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 22 Nov 2019 13:06:01 +0000
Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set
In-Reply-To: <1574421416.31688.4@est.tech>
References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech>
Message-ID: <85ccebc4cfa6f41e0fac5fa80ffb270e0dfa0538.camel@redhat.com>

On Fri, 2019-11-22 at 11:16 +0000, Balázs Gibizer wrote:
>
> On Thu, Nov 21, 2019 at 17:28, Eric Fried wrote:
> > Hi all. Sean mentioned a "downstream bug" in IRC today [1], which we
> > discussed a little bit, but without gibi; and then Matt and I
> > discussed it more later [2], but without gibi *or* Sean. Since I
> > don't know if there's a bug report I can comment on, I wanted to
> > summarize here for now so I don't forget.
> >
> > The Problem
> > ===========
> > Neutron needs to know the compute node resource provider on which to
> > hang the child providers for QoS bandwidth. Today it assumes CONF.host
> > is the name of that provider.
> >
> > That's wrong.
> >
> > The name of the provider is `hypervisor_hostname`, which for libvirt
> > [3] happens to match the *default* value of CONF.host [4].
> >
> > Per the bug Sean describes, if you override CONF.host, neutron won't
> > find the compute node provider, and things break.
> >
> > The problem will be the same for any non-nova service wishing to
> > discover the compute node RP -- e.g. cyborg for purposes of creating
> > child providers for accelerators.
>
> As a background I'm pretty sure I made this mistake due to looking at
> how nova and neutron do the port binding. There nova sends the
> instance.host (which is coming from the CONF.host) to neutron in the
> binding:host_id field of the neutron port. The difference between the
> binding and the inventory handling is that port.binding:host_id is used
> by neutron to figure out which service host will run the nova-compute
> service as well as the neutron agent. So there the CONF.host is enough.
> Libvirt also runs on the same physical host as nova-compute and the
> neutron agent. BUT while for nova and neutron the hostname can be
> overwritten by setting the CONF.host to an arbitrary string, for
> libvirt such configuration cannot be applied. Libvirt will always use
> the linux's gethostname call [6].
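To make the backportable approach concrete, here is a minimal sketch of
looking up the compute node RP by the local hostname instead of CONF.host
(illustrative only: the placement endpoint, token and error handling are
assumptions, not taken from any of the patches in flight):

    import socket
    import requests

    PLACEMENT_URL = 'http://placement.example.com'     # assumed endpoint
    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN',          # assumed token
               'OpenStack-API-Version': 'placement 1.20'}

    rp_name = socket.gethostname()                     # not CONF.host
    resp = requests.get(PLACEMENT_URL + '/resource_providers',
                        params={'name': rp_name}, headers=HEADERS)
    providers = resp.json()['resource_providers']
    if not providers:
        raise RuntimeError('no compute node RP named %s' % rp_name)
    compute_rp_uuid = providers[0]['uuid']
    # child providers for bandwidth (or accelerators) are then created
    # with parent_provider_uuid=compute_rp_uuid

This only holds for libvirt, where the RP name matches gethostname(), which
is exactly the limitation discussed further down in the thread.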
the reason we found this is that triplo in standalone more does not update the hostname but did set the CONF.host since in standalone mode you have to deploy the os yourself and therefero there is not cloud-init step to update the hostname so its whatever you set in your install or get via dhcp. > > > > > The Right Solution > > ================== > > Neutron (and any $service) should look up the compute node provider by > > its UUID. That's returned by the /os-hypervisors APIs after > > microversion > > 2.53, e.g. [5], but, Catch-22, you can currently only filter those > > results on hypervisor_hostname. So you would have to e.g. GET > > /os-hypervisors/detail and then walk the list looking for service.host > > matching CONF.host. That's way heavy for your CERNs. > > > > So the proposal moving forward is to add (in a new microversion) a > > ?service_host=XXX qparam to those APIs to let you filter down to just > > the one entry for your CONF.host. The UUID of that entry will also be > > the UUID of the compute node resource provider. (At that point you > > don't > > even need to ask Placement for that provider; you can just use that > > UUID > > directly in the APIs that create the child providers. Yay, you got > > your > > extra API call back.) > > > > The result of ?service_host=XXX cannot be a single compute node. It > needs to be a list of compute nodes (due to ironic) and neutron needs > to *assume* that in any case where neutron does this query there is no > more than a single item in the returned list (e.g. neutron cannot mix > with ironic). so that is problematic or will be in the futrue since there are efforets to allow contol of smartnics via neutron for ironic. in the general case however yes. what i think we need to do is extend that agent report to contain a hyperviour_hostname filed in addtion to the host filed so that in the ironic case you can set the node uuid and in the non ironic case you can set the hostname as returned by socket.gethostname() that said i dont know if we have to handel the ironic + smartnic case currently as i dont think minium bandwith work in that case or even really makes sense. you will get the entire nic in that case regardless. > > > > > > Now, that's not backportable, and this problem exists in stable > > releases > > (at least those that support QoS bandwidth). So we should totally do > > it, > > but we also need... > > > > The Backportable Solution > > ========================= > > Neutron should use `gethostname()` rather than CONF.host to discover > > the > > compute node resource provider. > > In theory this sounds a good solution as it will align the nova + > libvirt behavior with the neutron behavior regarding placement RP > naming. > > Where the whole thing goes sideways is the fact that neutron does the > inventory handling up in the neutron server. Neutron does the compute > RP lookup, child RP creation and inventory reporting based on the > information the agents are sending up to the neutron server. As in > neutron there is no differentiation between compute/agent service host > and hypervisor node the agents sends up a single hostname[7][9] based > on the CONF.host configuration[8]. So adding a new piece of information > to the agent report is most probably an RPC change between neutron > agents and the neutron server. > > @Neutron, @Stable: Does such RPC change is backportable in neutron? 
Right, so I mentioned above that we would need to extend the agent report
with a hypervisor_hostname, but that is only really needed for the ironic
case, and currently neutron does not support minimum bandwidth with ironic.
Fortunately the agent report/agent state is an unversioned dictionary of
free-form strings
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L295-L324
or at least the configurations dict inside it is. I think the structure as a
whole is largely unversioned, so we should be able to extend it without
breaking anything if we need to. Again, it is up to the neutron/stable folks
to determine, but I believe it would be fine, as additional fields would be
ignored by the server and the ML2 drivers.

>
>
> > I don't consider this a viable permanent solution because it is tightly
> > coupled to knowing that hypervisor_hostname == `gethostname()`, which
> > happens to be true for libvirt, but not necessarily for other drivers.
> > We can get away with it for stable because we happen to know that we're
> > only supporting QoS bandwidth via Placement for libvirt.
>
> Temporary workaround
> ====================
>
> If you configure bandwidth in neutron then do not change the CONF.host
> config in your computes. I think we have to document that as a
> limitation here [10].

Well, don't change it without actually also changing the hostname in
/etc/hostname, or systemd, or wherever that is handled these days. As long
as it is set to the same value that will be returned by libvirt, or your
virt driver of choice, you will be fine. Granted, at that point the default
value would also work, but TripleO is trying to force it to be the full FQDN
for some reason.

>
>
> > Upgrade Concerns
> > ================
> > Matt and I didn't nail down whether neutron and compute are allowed to
> > be at different versions on a given host, or what those are allowed to
> > be. But things should be sane if neutron (or any $service) logics like
> > this in >=ussuri:
> >
> > if new_nova_microversion_available:
> >     do_the_os_hypervisors_thing()
> > elif using_new_non_libvirt_feature:
> >     raise YouCantDoThisWithOldNova()

You could do the full hypervisor list in this case and cache the result so
you only do it once per host. That would suck on startup, but it's an
option.

> > else:
> >     do_the_gethostname_thing()
>
> The 'if' part looks OK to me and when we find a backportable solution,
> that needs to be in the 'else' branch.
>
> > Action Summary
> > ==============
> > If the above sounds reasonable, it would entail the following actions:
> > - Neutron(/Cyborg?): backportable patch to
> >   s/CONF.host/socket.gethostname()/
> > - Nova: GET /os-hypervisors*?service_host=X in a new microversion.
>
> I guess that will be me, starting with a spec that outlines the new API.
>
> > - Neutron/Cyborg: master-only patch to do the logic described in
> >   `Upgrade Concerns`_ (though for now without the `elif` branch).
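Spelled out a little, the version-guarded lookup could end up looking
roughly like this. This is a sketch only: the '2.XX' microversion is a
placeholder for the proposed ?service_host filter, and the nova/placement
objects are assumed to be simple authenticated REST clients, so the helper
names here are not real APIs:

    import socket

    def find_compute_rp_uuid(nova, placement, conf_host):
        if nova.supports_microversion('2.XX'):       # placeholder version
            hyps = nova.get('/os-hypervisors',
                            params={'service_host': conf_host})['hypervisors']
            # neutron has to assume a single (non-ironic) entry here
            return hyps[0]['id']
        # elif using_new_non_libvirt_feature: raise ...
        # Backportable fallback: libvirt-only assumption that the compute
        # node RP is named after the local hostname, not CONF.host.
        name = socket.gethostname()
        rps = placement.get('/resource_providers',
                            params={'name': name})['resource_providers']
        return rps[0]['uuid']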
> > > > Thanks, > > efried > > > > [1] > > http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-21.log.html#t2019-11-21T17:59:05 > > [2] > > http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-21.log.html#t2019-11-21T21:53:29 > > [3] > > https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab394d484626/nova/virt/libvirt/host.py#L955 > > [4] > > https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab394d484626/nova/conf/netconf.py#L56 > > [5] > > https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-details-detail#id309 > > > > @Sean: thanks for finding the bug! > @Eric: thanks for the good write up! > > Cheers, > gibi > > [6] > https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Connections-Host_Info.html > [7] > https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba8657711f/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L295-L324 > [8] > https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba8657711f/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L171 > [9] > https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba8657711f/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py#L161-L175 > [10] > https://docs.openstack.org/neutron/latest/admin/config-qos-min-bw.html#limitations > > > > From thierry at openstack.org Fri Nov 22 13:07:27 2019 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 22 Nov 2019 14:07:27 +0100 Subject: [sig] Forming a Large scale SIG In-Reply-To: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Message-ID: <25c6d2a9-ae8f-1499-3a5d-48b881761f3e@openstack.org> Thanks everyone for volunteering to participate to this group! We'll have our first meeting next week, on IRC (#openstack-meeting), on Wednesday, November 27, 09:00 UTC: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20191127T09 The agenda will be the following: - Agree on SIG name - Volunteers for SIG chairing - Need for synchronous meetings - If meetings are needed, meeting frequency and tooling - If any time is left, discuss initial SIG objectives Notes at https://etherpad.openstack.org/p/large-scale-sig-meeting Hoping to see you there, -- Thierry Carrez (ttx) From smooney at redhat.com Fri Nov 22 13:11:16 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 22 Nov 2019 13:11:16 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: <7fb9b364734277c71b0634a86d030f9778fa64ea.camel@redhat.com> On Thu, 2019-11-21 at 15:32 -0800, Dan Smith wrote: > > nova only has a couple but it might be intersting to convert those to precommit scripts. > > looking through them some of them do seam useful although other are just python 2 vs python 3 > > guidline that i hope will be less relevant now. > > I would so very much love if we did NOT do that. Precommit hooks are > super annoying for writing up quick PoCs and DNM patches, which we do a > lot. ya true although i was referign to the precommit framework which we previously disucssed and said shoudl be optional to install. 
do you think we should keep/maintain hacking in nova and or port these to something else if not? anyway it was just a thought i dont want it to be mandaroy on every commit for the poc hacking reason either but it might be a way to keep the checks without having to contiue to maintain hacking. > > --Dan > From sfinucan at redhat.com Fri Nov 22 13:28:51 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Fri, 22 Nov 2019 13:28:51 +0000 Subject: =?UTF-8?Q?=5Blists=2Eopenstack=2Eorg=E4=BB=A3=E5=8F=91=5DRe=3A?= [dev] Upgrading flake8 to support f-strings In-Reply-To: <828352d1cba44a3db0a2c3a843693922@inspur.com> References: <783ec4032f4070af16917e5f6c009026@sslemail.net> <828352d1cba44a3db0a2c3a843693922@inspur.com> Message-ID: On Fri, 2019-11-22 at 06:58 +0000, Brin Zhang(张百林) wrote: > > 主题: [lists.openstack.org代发]Re: [dev] Upgrading flake8 to support > f-strings > > On Thu, 2019-11-21 at 13:26 -0500, Mohammed Naser wrote: > > > On Thu, Nov 21, 2019 at 1:20 PM Jeremy Stanley > > wrote: > > > > On 2019-11-21 17:54:32 +0000 (+0000), Stephen Finucane wrote: > > > > [...] > > > > > Unfortunately, flake8 3.x is a total rewrite and I haven't found a > > > > > way to port things across. > > > > > > > > [...] > > > > > I'm flat out of ideas on that so someone other than me is going to > > > > > have to take this migration upon themselves or we're going to have > > > > > to drop hacking so we can use a new flake8. > > > > > > > > [...] > > > > > > > > Oof, yes I guess it's high time to discuss this (sorry if there was > > > > a prior ML thread about it which I missed). So I guess the options I > > > > can see are: > > > > > > > > A. keep running woefully outdated flake8 and friends (isn't working) > > > > > > > > B. overhaul hacking to work as a file-level analyzer plug-in > > > > > > > > C. improve flake8 to support string-level analyzer plug-ins > > > > > > > > D. separate hacking back out so it's no longer a flake8 plug-in > > > > > > > > E. stop running hacking entirely and rely on other flake8 plug-ins > > > > > > While I don't have all the context to the work required, that does > > > seem like that's the best option long term IMHO. > > i would prefer E as well. if we do need something like a hacking test that > > enforces no alias of privsep function for example the a plugin could be > written > > for that 1 thing but in general i think we would be better off adopting > exsitsing > > plugins our just using > > flake8 directly without plugins. i know we have some duplicate work > checkers > > and other custom hacking test but i dont know if i have ever been hit by a > > hacking failure. i have pep8 issues all the time but never checks added by > > hacking. > > I also like E, but can we not update with flake8 every time, can we > follow up with a stable version? In addition to the bug update, there > will be a feature update, so is nova going to be updated? > The maintenance of the plugin is also a big investment. I might have misunderstood your question, but flake8 is one of the libraries that we don't manage via upper-constraints so projects are free to use whatever they want. nova will decide at some point to bump their flake8 version but no one else needs to do it in lockstep (or at all, if what they have works and they don't care to change it). Stephen > > > > Anything else? For sake of simplicity I'd favor option E. 
In our > > > > present reality where most folks already have far too much work on > > > > their respective plates, having one less project to maintain makes > > > > some measure of sense. Does hacking currently save teams more than > > > > enough effort to balance out the amount of effort involved in > > > > keeping it working with newer software? > > > > -- > > > > Jeremy Stanley From fungi at yuggoth.org Fri Nov 22 13:30:39 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 22 Nov 2019 13:30:39 +0000 Subject: =?utf-8?Q?=5Blists=2Eopenstack=2Eorg?= =?utf-8?B?5Luj5Y+RXVJlOg==?= [dev] Upgrading flake8 to support f-strings In-Reply-To: <828352d1cba44a3db0a2c3a843693922@inspur.com> References: <783ec4032f4070af16917e5f6c009026@sslemail.net> <828352d1cba44a3db0a2c3a843693922@inspur.com> Message-ID: <20191122133039.zu6ndky6dp7titsh@yuggoth.org> On 2019-11-22 06:58:28 +0000 (+0000), Brin Zhang(张百林) wrote: [...] > I also like E, but can we not update with flake8 every time, can > we follow up with a stable version? I don't understand, what's not stable? We managed to skip updating it for several cycles, and now we're talking about how to deal with the fact that we want to use newer Python constructs than the version of flake8 we've been stuck on for years. But really, we've said in the past that we should take the opportunity at the start of every development cycle (for Ussuri that's ~now) to globally increase the versions of static analysis tools we're relying on as a community and then freeze them for the remainder of the cycle. To me, that's as "stable" as makes sense. > In addition to the bug update, there will be a feature update, so > is nova going to be updated? [...] I don't know what you mean here either. Bug update where? Feature update where? Update Nova how? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sfinucan at redhat.com Fri Nov 22 13:32:56 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Fri, 22 Nov 2019 13:32:56 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: On Thu, 2019-11-21 at 15:32 -0800, Dan Smith wrote: > > nova only has a couple but it might be intersting to convert those to precommit scripts. > > looking through them some of them do seam useful although other are just python 2 vs python 3 > > guidline that i hope will be less relevant now. > > I would so very much love if we did NOT do that. Precommit hooks are > super annoying for writing up quick PoCs and DNM patches, which we do a > lot. $ git commit -n U iz wlcm. > --Dan > From ssbarnea at redhat.com Fri Nov 22 13:51:10 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Fri, 22 Nov 2019 13:51:10 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: <7fb9b364734277c71b0634a86d030f9778fa64ea.camel@redhat.com> References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> <7fb9b364734277c71b0634a86d030f9778fa64ea.camel@redhat.com> Message-ID: Keep in mind pre-commit tool != git-hooks, is like confusing JavaScript with Java ;) By default pre-commit does not install any hooks. 
Yes, you can install them, but in 9/10 cases I don't, I only have an alias pc="pre-commit run -a" which I run before each git-review. In fact I was considering adding an optional feature to git-review to auto run this when the repository has a .pre-commit-config.yml file. I am glad someone else opened the pre-commit subject before me. While it comes with its own challenges (git cloning), pre-commit resolves the problem of having predictable linter results by pinning them. Also it enables vey easy bumping of all of them. AFAIK, over the last year I removed hacking from several projects and replaced it with pre-commit and I am much happier. Not sure if others know but both bashate and doc8 can be used from pre-commit too. Extra bonus: we can avoid the case where we end-up having tons of jobs performing linting or style checks. > On 22 Nov 2019, at 13:11, Sean Mooney wrote: > > On Thu, 2019-11-21 at 15:32 -0800, Dan Smith wrote: >>> nova only has a couple but it might be intersting to convert those to precommit scripts. >>> looking through them some of them do seam useful although other are just python 2 vs python 3 >>> guidline that i hope will be less relevant now. >> >> I would so very much love if we did NOT do that. Precommit hooks are >> super annoying for writing up quick PoCs and DNM patches, which we do a >> lot. > ya true although i was referign to the precommit framework which we previously disucssed and said > shoudl be optional to install. do you think we should keep/maintain hacking in nova and or port these > to something else if not? anyway it was just a thought i dont want it to be mandaroy on every commit > for the poc hacking reason either but it might be a way to keep the checks without having to contiue to > maintain hacking. >> >> --Dan >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Nov 22 13:58:25 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 22 Nov 2019 13:58:25 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> <7fb9b364734277c71b0634a86d030f9778fa64ea.camel@redhat.com> Message-ID: <20191122135824.67jh5pf65y6g3v65@yuggoth.org> On 2019-11-22 13:51:10 +0000 (+0000), Sorin Sbarnea wrote: [...] > I am glad someone else opened the pre-commit subject before me. > While it comes with its own challenges (git cloning), pre-commit > resolves the problem of having predictable linter results by > pinning them. Also it enables vey easy bumping of all of them. [...] Is it still limited by only being able to install plugins from source, or can it consume released packages now? > Extra bonus: we can avoid the case where we end-up having tons of > jobs performing linting or style checks. Which can also be done by adding multiple check commands to a single tox testenv, or passing the names of multiple testenvs when invoking tox. Granted, tox is a fairly Python-oriented test tool, so pre-commit may be a more language-agnostic choice in that regard. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From smooney at redhat.com Fri Nov 22 13:58:47 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 22 Nov 2019 13:58:47 +0000 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> <7fb9b364734277c71b0634a86d030f9778fa64ea.camel@redhat.com> Message-ID: On Fri, 2019-11-22 at 13:51 +0000, Sorin Sbarnea wrote: > Keep in mind pre-commit tool != git-hooks, is like confusing JavaScript with Java ;) > > By default pre-commit does not install any hooks. Yes, you can install them, but in 9/10 cases I don't, I only have an > alias pc="pre-commit run -a" which I run before each git-review. In fact I was considering adding an optional feature > to git-review to auto run this when the repository has a .pre-commit-config.yml file. > > I am glad someone else opened the pre-commit subject before me. While it comes with its own challenges (git cloning), > pre-commit resolves the problem of having predictable linter results by pinning them. Also it enables vey easy bumping > of all of them. > > AFAIK, over the last year I removed hacking from several projects and replaced it with pre-commit and I am much > happier. > > Not sure if others know but both bashate and doc8 can be used from pre-commit too. > > Extra bonus: we can avoid the case where we end-up having tons of jobs performing linting or style checks. we need to keep the jobs even if we use pre-commit becasue it is optional to run it and we have to comply with teh PTI rules but addtional linters could be run via a singe pre-comite job. there is a patch to add precommit support to nova https://review.opendev.org/#/c/665518/ as an optional thing people can use if they choose too but that has been pending for 3 months at this point. > > > On 22 Nov 2019, at 13:11, Sean Mooney wrote: > > > > On Thu, 2019-11-21 at 15:32 -0800, Dan Smith wrote: > > > > nova only has a couple but it might be intersting to convert those to precommit scripts. > > > > looking through them some of them do seam useful although other are just python 2 vs python 3 > > > > guidline that i hope will be less relevant now. > > > > > > I would so very much love if we did NOT do that. Precommit hooks are > > > super annoying for writing up quick PoCs and DNM patches, which we do a > > > lot. > > > > ya true although i was referign to the precommit framework which we previously disucssed and said > > shoudl be optional to install. do you think we should keep/maintain hacking in nova and or port these > > to something else if not? anyway it was just a thought i dont want it to be mandaroy on every commit > > for the poc hacking reason either but it might be a way to keep the checks without having to contiue to > > maintain hacking. 
> > > > > > --Dan > > > > > > > > > From alexandre.arents at corp.ovh.com Fri Nov 22 14:05:19 2019 From: alexandre.arents at corp.ovh.com (Alexandre Arents) Date: Fri, 22 Nov 2019 15:05:19 +0100 Subject: [nova] unshelve image_ref bugs Message-ID: <20191122140519.zldtzhyp7ptdsm2h@corp.ovh.com> Hey all, We are using more and more shelve feature and we recently hit this referenced shelve/unshelve bug: https://bugs.launchpad.net/nova/+bug/1732428 http://lists.openstack.org/pipermail/openstack-dev/2017-December/125124.html To summup the issue: -Once we shelve/unshelve a qcow2 instance, we cannot anymore live-migrate/cold-migrate/resize without breaking it, involving data loss.. ->This is because the unshelved instance have a backing file corresponding to the deleted snapshot(shelve-snapshot), not the original image_ref in glance. So when we migrate the instance, destination host fetch original image from glance, not the one currently used by instance. I've tried to summup possible solution: A) Use patch proposed by Matt (abandonned): https://review.opendev.org/#/c/524726/ This change assume the existence of the snapshot image created by a shelve, that is to say don't delete image during unshelve and update instane.image_ref with it. PROS: -there is no more "hidden image" CONS: -each shelved/unshelved create images in glance(it uses space, can be a problem or not) -It breaks the assumption: "I want my instance remains unchanged after unshelve and keep original image_ref, so when I rebuild my instance, I use the original image (at spawn time), not the one used during last shelve" Note: Matt mention a possible workaround of the rebuild issue by using image_id in in request.spec instead of image_ref[1]) B) Someone propose to rebase backing file [2]: I did not check feasibility/complexity of this ? CONS: -What we do when original image is deleted? C) Change create_image()/imagebackend driver behavior, to create a flatten qcow2 file in case of unshelving. flattening disk may be a solution because there will be no more "orphan backing file". (Basicly doing like "flat" backend driver except we need to stay in qcow2 instead of RAW) PROS: -we keep orignal unshelve behavior/assumption CONS: -It means that in your infra configured in COW some instances will be in "qcow2 flat", Flat qcow2 instance works great (livemigration/resize..). Would all installation ok with that ? Ok it seems a little odd to ask COW driver to not do COW in some case. Alternatilevy we can force using flat driver if unshelving, but we need to change flat driver to support also qcow2. D) During spawn() if unshelving we convert "qcow2 disk with backing file" to a "flatten qcow2 disk", just after self._create_image(). It looks more like a workaround than a long term solution as it need to convert something created before, that do not meet the need(better to do C). E) Any other idea ? Currently to make short term OPS and User happy, we are about to do (D) as it works great in our environment, but we are looking for the project solution. Any recommendation? 
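For readers unfamiliar with the mechanics behind (C)/(D): flattening is
essentially a qcow2-to-qcow2 conversion that drops the backing-file
reference. A rough sketch of that step, with illustrative paths and the
guest stopped; this shows the effect being described, not the actual nova
patch:

    import os
    import subprocess

    disk = '/var/lib/nova/instances/<uuid>/disk'    # illustrative path
    flat = disk + '.flat'

    # rewrite the qcow2 without a backing file, then swap it in place
    subprocess.check_call(['qemu-img', 'convert', '-f', 'qcow2',
                           '-O', 'qcow2', disk, flat])
    os.replace(flat, disk)

After this, live-migration/cold-migration/resize no longer depend on the
deleted shelve snapshot still being present in glance.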
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-September/134855.html
[2] https://bugs.launchpad.net/nova/+bug/1732428/comments/6

--
Alexandre Arents
aarents

From smooney at redhat.com  Fri Nov 22 14:19:04 2019
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 22 Nov 2019 14:19:04 +0000
Subject: [dev] Upgrading flake8 to support f-strings
In-Reply-To: <20191122135824.67jh5pf65y6g3v65@yuggoth.org>
References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org>
 <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com>
 <20191121181557.apixsfva7vbufgc3@yuggoth.org>
 <7fb9b364734277c71b0634a86d030f9778fa64ea.camel@redhat.com>
 <20191122135824.67jh5pf65y6g3v65@yuggoth.org>
Message-ID: <1416d0ab1e8821523c24cbfefce9f57dab1306bb.camel@redhat.com>

On Fri, 2019-11-22 at 13:58 +0000, Jeremy Stanley wrote:
> On 2019-11-22 13:51:10 +0000 (+0000), Sorin Sbarnea wrote:
> [...]
> > I am glad someone else opened the pre-commit subject before me.
> > While it comes with its own challenges (git cloning), pre-commit
> > resolves the problem of having predictable linter results by
> > pinning them. Also it enables vey easy bumping of all of them.
> [...]
>
> Is it still limited by only being able to install plugins from
> source, or can it consume released packages now?

I think that was a fundamental design choice they made: to only support git
and not packages. The one thing I wish they would add support for is the
ability to define a set of global hooks in your home directory, e.g.
~/.config/.pre-commit.yaml, so that you did not only have the choice of
defining them in tree, or the ability to pass a file to use. If they
supported either, then we could use this ourselves without having to submit
patches upstream. Since we can't, I have gone back to trying spacemacs
instead of using nano, to see if I can leverage the checkers built into that
emacs distribution instead. I am liking having a spell checker and auto
indentation without having to go to a full IDE like PyCharm to get it. I
love PyCharm, but I like coding in a terminal too.

> > Extra bonus: we can avoid the case where we end-up having tons of
> > jobs performing linting or style checks.
>
> Which can also be done by adding multiple check commands to a single
> tox testenv, or passing the names of multiple testenvs when invoking
> tox. Granted, tox is a fairly Python-oriented test tool, so
> pre-commit may be a more language-agnostic choice in that regard.

Yep, but it's a fair point. If this was ever added to the PTI I would expect
that we would define a tox env for running all the pre-commit checks, but I
don't see that as a major priority.

From C-Ramakrishna.Bhupathi at charter.com  Fri Nov 22 14:55:48 2019
From: C-Ramakrishna.Bhupathi at charter.com (Bhupathi, Ramakrishna)
Date: Fri, 22 Nov 2019 14:55:48 +0000
Subject: [Kolla] Configure docker service fails during bootstrap_servers
Message-ID:

Folks,
Looking for help on this. When I am attempting to install OpenStack Kolla
(development mode) I am running into this error when running
bootstrap_servers

./kolla-ansible -i ../../multinode bootstrap-servers

TASK [baremetal : Configure docker service] ****************************************************************************************************
fatal: [localhost]: FAILED!
=> {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth0'"} PLAY RECAP ************************************************************************************************************************************* localhost : ok=29 changed=3 unreachable=0 failed=1 skipped=16 rescued=0 ignored=0 Command failed ansible-playbook -i all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action= bootstrap-servers /home/ubuntu/kolla-env/share/kolla-ansible/ansible/kolla-host.yml Can someone tell me what is going wrong here? --RamaK E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Fri Nov 22 14:56:35 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 22 Nov 2019 08:56:35 -0600 Subject: [nova] Why don't we have a SNAPSHOTTING status? Message-ID: This came up in IRC today [1]. There are two issues: 1. As far as I can see, when creating a snapshot of a volume-backed server the API does not change the instance task_state so technically the user could perform some other action on the server while we're creating the volume snapshot(s), like try to resize it. My guess is that would fail since we likely can't be doing things like creating and deleting volume attachments on a volume that is actively being snapshot. 2. For image-backed servers, we set the progress the task_states through a few states [2] but those aren't conveyed in the overall server "status" field in the API [3]. Does anyone have any historical context on either of those issues? Issue #1 reminds me that we don't have a task_state transition while confirming a resized server either [4] but I guess that's less of an issue because there are not a lot of things you can do to a server in VERIFY_RESIZE status (delete and revert the resize I think), but that does mean I could fire off separate confirmResize and then revertResize actions on the same server at the same time and the confirm will probably fail down in compute rather than in the API with a 409 error. 
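To make issue #1 a bit more concrete: the way concurrent actions are
normally fenced off is by flipping the task_state before starting the work
and letting the save with an expected_task_state reject racers. A rough
sketch of the kind of guard the volume-backed snapshot path appears to be
missing; the method, helper and state names here are illustrative, not the
actual nova code:

    # illustrative sketch only
    def snapshot_volume_backed(self, context, instance, name):
        instance.task_state = 'image_snapshot'      # or a dedicated state
        instance.save(expected_task_state=[None])   # racing actions fail here
        try:
            # hypothetical helper that snapshots each attached volume
            self._snapshot_each_volume(context, instance, name)
        finally:
            instance.task_state = None
            instance.save()

While the task_state is set, other API actions (resize, etc.) would be
rejected with a 409 instead of racing the volume snapshots.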
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-22.log.html#t2019-11-22T14:35:36 [2] https://github.com/openstack/nova/blob/991d675675c1c6bb87a2b9d19327e2b4473f6c0b/nova/compute/task_states.py#L34 [3] https://github.com/openstack/nova/blob/991d675675c1c6bb87a2b9d19327e2b4473f6c0b/nova/api/openstack/common.py#L48 [4] https://github.com/openstack/nova/blob/991d675675c1c6bb87a2b9d19327e2b4473f6c0b/nova/api/openstack/common.py#L84 -- Thanks, Matt From haleyb.dev at gmail.com Fri Nov 22 15:18:30 2019 From: haleyb.dev at gmail.com (Brian Haley) Date: Fri, 22 Nov 2019 10:18:30 -0500 Subject: [neutron] [all] Networking-ovn and neutron convergence Message-ID: <17ec435e-4132-22cc-1c9a-8b37766ac15f@gmail.com> Hi, For some time we have been discussing in the Neutron community the possibility of including the networking-ovn driver [1] as one of the in-tree Neutron drivers. There is already a spec [2] describing in detail why we want to do this and why we think it is good idea. We also discussed this during the PTG in Shanghai within our team [3], and had a discussion at the ops-meetup as well [4]. The OVN backend is free of many well-known issues which are impacting the existing ML2/OVS reference implementation today with the OVS agent. For example, OVN provides: * Control plane performance optimizations by not using rabbitmq; * DVR (and HA) by default, based on OpenFlow so it can be easy offloaded e.g. by SmartNICs; * Distributed DHCP; There are some feature parity gaps when comparing it to ML2/OVS that we plan to address. See [2] for details. We think that merging this code into the neutron repository will help to grow the networking-ovn community and will help us to keep a healthy Neutron team as well by increasing the number or contributors. Our current plan is to start merging code from networking-ovn into the neutron repository as soon as possible in the Ussuri cycle. But we also wanted to get any additional opinions from the wider community about this plan. What do users and operators of Neutron think about this? [1] https://opendev.org/openstack/networking-ovn [2] https://review.opendev.org/#/c/658414/ [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored Lines 252 - 295 [4] https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup Lines 62 - 69 Thanks for any feedback, -Brian (and Slawek) From dms at danplanet.com Fri Nov 22 15:32:33 2019 From: dms at danplanet.com (Dan Smith) Date: Fri, 22 Nov 2019 07:32:33 -0800 Subject: [dev] Upgrading flake8 to support f-strings In-Reply-To: (Stephen Finucane's message of "Fri, 22 Nov 2019 13:32:56 +0000") References: <20191121172528.ihpz4fm3a3prlkug@yuggoth.org> <81be345eb7e64fb731b72a9344c47ac50dbae5be.camel@redhat.com> <20191121181557.apixsfva7vbufgc3@yuggoth.org> Message-ID: >> I would so very much love if we did NOT do that. Precommit hooks are >> super annoying for writing up quick PoCs and DNM patches, which we do a >> lot. > > $ git commit -n > > U iz wlcm. Yeah, thanks, I'm aware. I said "super annoying" not "unworkaroundable". I stand by my characterization of them and don't think that they're a good way to enforce or even report on things like this. 
--Dan From thierry at openstack.org Fri Nov 22 15:48:41 2019 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 22 Nov 2019 16:48:41 +0100 Subject: [resource-management-sig] Status of the "Resource Management" SIG In-Reply-To: References: <5a84404f-0e10-9010-61ed-29aff08b5ec6@openstack.org> Message-ID: Zhipeng Huang wrote: > Intend to launch the activity during Shanghai PTG via Cyborg sessions, > for those you are interested in topics like Kubernetes, OCP OAI, RISC-V > you are more than welcomed to reach out to me. Was that effort successful? Should we keep the resource management SIG, with status:forming and removing the other chairs? -- Thierry From radoslaw.piliszek at gmail.com Fri Nov 22 15:53:26 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 22 Nov 2019 16:53:26 +0100 Subject: [Kolla] Configure docker service fails during bootstrap_servers In-Reply-To: References: Message-ID: You are trying to deploy to localhost which does not have interface named eth0. Please configure globals.yml or inventory and retry. -yoctozepto pt., 22 lis 2019 o 16:09 Bhupathi, Ramakrishna napisał(a): > > Folks, > > Looking for help on this. When I am attempting to install OpenStack Kolla (development mode) I am running into this error when running bootstrap_servers > > > > ./kolla-ansible -i ../../multinode bootstrap-servers > > > > TASK [baremetal : Configure docker service] **************************************************************************************************** > > fatal: [localhost]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth0'"} > > > > PLAY RECAP ************************************************************************************************************************************* > > localhost : ok=29 changed=3 unreachable=0 failed=1 skipped=16 rescued=0 ignored=0 > > > > Command failed ansible-playbook -i all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action= bootstrap-servers /home/ubuntu/kolla-env/share/kolla-ansible/ansible/kolla-host.yml > > > > Can someone tell me what is going wrong here? > > > > --RamaK > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. From Arkady.Kanevsky at dell.com Fri Nov 22 16:06:35 2019 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Fri, 22 Nov 2019 16:06:35 +0000 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: <20191122040518.GA10388@thor.bakeyournoodle.com> References: <20191122040518.GA10388@thor.bakeyournoodle.com> Message-ID: <4bf2c65c956f46e58cbef5528b46b3ef@AUSX13MPS308.AMER.DELL.COM> Sounds interesting.. But not clear what is the goal for that SIG? Since OpenStack claims to have support for different HW. Are there any additional reqs? Clearly CI ones. 
Thanks, Arkady -----Original Message----- From: Tony Breeds Sent: Thursday, November 21, 2019 10:05 PM To: Rico Lin Cc: OpenStack Discuss Subject: Re: [meta-sig][multi-arch] propose forming a Multi-arch SIG On Wed, Nov 20, 2019 at 06:03:03PM +0800, Rico Lin wrote: > Dear all > In summit, there's a forum for ARM support [1] in Summit which many > people show they're interested in ARM support for OpenStack. > And since also we have Linaro shows interest in donating servers to > OpenStack infra. It's time for community to think about what we should > deal with those ARM servers once we have them in community infrastructure. > > One thing we should do as a community is to gather people for this topic. > So I propose we create a Multi-arch SIG and aim to support ARM > architecture as very first step. > I had the idea to call it ARM SIG before, but since there might be > high overlap knowledge between support ARM 64 and other architectures. > I propose we go for Multi-arch instead. > > This SIG will be a nice place to collect all the documents, gate jobs, > and to trace tasks. > > If you're also interested in that group, please reply to this email, > introduce yourself and tell us what you would like the group scope and > objectives to be, and what you can contribute to the group. Pick me Pick me :) I've been around OpenStack for about 5 years now, the last couple have been focused on brining multi-arch support (albeit ppc64le) into tripleo building on the enablement work that others have done. I'm keen to work with the SIG, to build out the ARM support and at the same time ensure we don't make it hard for other architectures to do the same Yours Tony. From openstack at nemebean.com Fri Nov 22 16:07:41 2019 From: openstack at nemebean.com (Ben Nemec) Date: Fri, 22 Nov 2019 10:07:41 -0600 Subject: [oslo] Reminder: Virtual PTG on Monday the 25th Message-ID: <27281eb7-4275-b6d2-b36c-c9115fbea90b@nemebean.com> Time: 1500-1700 UTC (starts at the regular meeting time) Etherpad: https://etherpad.openstack.org/p/oslo-shanghai-topics Location: https://meet.jit.si/oslo-ptg (note that I've had trouble using Firefox with Jitsi, Chrome seems to work fine) I look forward to seeing everyone! -Ben From balazs.gibizer at est.tech Fri Nov 22 16:14:23 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 22 Nov 2019 16:14:23 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <1574421416.31688.4@est.tech> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> Message-ID: <1574439261.31688.6@est.tech> On Fri, Nov 22, 2019 at 11:16, Balázs Gibizer wrote: > > > On Thu, Nov 21, 2019 at 17:28, Eric Fried wrote: >> Action Summary >> ============== >> If the above sounds reasonable, it would entail the following >> actions: >> - Neutron(/Cyborg?): backportable patch to >> s/CONF.host/socket.gethostname()/ >> - Nova: GET /os-hypervisors*?service_host=X in a new microversion. > > I guess that will be me, starting with a spec that outlines the new > API. Opened a blueprint[1], pushed up a small spec[2], and a WIP implementation[3] for the new API. 
Cheers, gibi [1] https://blueprints.launchpad.net/nova/+spec/filter-hypervisors-by-service-host [2] https://review.opendev.org/#/c/695716 [3] https://review.opendev.org/#/c/695708 From mnaser at vexxhost.com Fri Nov 22 16:17:48 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 22 Nov 2019 11:17:48 -0500 Subject: [neutron] [all] Networking-ovn and neutron convergence In-Reply-To: <17ec435e-4132-22cc-1c9a-8b37766ac15f@gmail.com> References: <17ec435e-4132-22cc-1c9a-8b37766ac15f@gmail.com> Message-ID: On Fri, Nov 22, 2019 at 10:22 AM Brian Haley wrote: > > Hi, > > For some time we have been discussing in the Neutron community the > possibility of including the networking-ovn driver [1] as one of the > in-tree Neutron drivers. There is already a spec [2] describing in > detail why we want to do this and why we think it is good idea. We also > discussed this during the PTG in Shanghai within our team [3], and had a > discussion at the ops-meetup as well [4]. > > The OVN backend is free of many well-known issues which are impacting > the existing ML2/OVS reference implementation today with the OVS agent. > For example, OVN provides: > > * Control plane performance optimizations by not using rabbitmq; > * DVR (and HA) by default, based on OpenFlow so it can be easy offloaded > e.g. by SmartNICs; > * Distributed DHCP; > > There are some feature parity gaps when comparing it to ML2/OVS that we > plan to address. See [2] for details. > > We think that merging this code into the neutron repository will help to > grow the networking-ovn community and will help us to keep a healthy > Neutron team as well by increasing the number or contributors. > > Our current plan is to start merging code from networking-ovn into the > neutron repository as soon as possible in the Ussuri cycle. But we also > wanted to get any additional opinions from the wider community about > this plan. What do users and operators of Neutron think about this? I'm very much supportive of something like this. The most important thing is figuring out a proper migration path for those that are sitting on ML2/OVS, the last time I had a look at this, it wasn't very straightforward AFAIK. > [1] https://opendev.org/openstack/networking-ovn > [2] https://review.opendev.org/#/c/658414/ > [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored > Lines 252 - 295 > [4] https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup > Lines 62 - 69 > > Thanks for any feedback, > > -Brian (and Slawek) > From jeremyfreudberg at gmail.com Fri Nov 22 16:37:13 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Fri, 22 Nov 2019 11:37:13 -0500 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: <4bf2c65c956f46e58cbef5528b46b3ef@AUSX13MPS308.AMER.DELL.COM> References: <20191122040518.GA10388@thor.bakeyournoodle.com> <4bf2c65c956f46e58cbef5528b46b3ef@AUSX13MPS308.AMER.DELL.COM> Message-ID: CI building stuff (packages, containers, images) fixing deployment tooling -- some tools might barf if you have a mix of archs a bit of evangelism probably more... On Fri, Nov 22, 2019 at 11:16 AM wrote: > > Sounds interesting.. > But not clear what is the goal for that SIG? > Since OpenStack claims to have support for different HW. > > Are there any additional reqs? Clearly CI ones. 
> > Thanks, > Arkady > > -----Original Message----- > From: Tony Breeds > Sent: Thursday, November 21, 2019 10:05 PM > To: Rico Lin > Cc: OpenStack Discuss > Subject: Re: [meta-sig][multi-arch] propose forming a Multi-arch SIG > > On Wed, Nov 20, 2019 at 06:03:03PM +0800, Rico Lin wrote: > > Dear all > > In summit, there's a forum for ARM support [1] in Summit which many > > people show they're interested in ARM support for OpenStack. > > And since also we have Linaro shows interest in donating servers to > > OpenStack infra. It's time for community to think about what we should > > deal with those ARM servers once we have them in community infrastructure. > > > > One thing we should do as a community is to gather people for this topic. > > So I propose we create a Multi-arch SIG and aim to support ARM > > architecture as very first step. > > I had the idea to call it ARM SIG before, but since there might be > > high overlap knowledge between support ARM 64 and other architectures. > > I propose we go for Multi-arch instead. > > > > This SIG will be a nice place to collect all the documents, gate jobs, > > and to trace tasks. > > > > If you're also interested in that group, please reply to this email, > > introduce yourself and tell us what you would like the group scope and > > objectives to be, and what you can contribute to the group. > > Pick me Pick me :) > > I've been around OpenStack for about 5 years now, the last couple have been focused on brining multi-arch support (albeit ppc64le) into tripleo building on the enablement work that others have done. > > I'm keen to work with the SIG, to build out the ARM support and at the same time ensure we don't make it hard for other architectures to do the same > > Yours Tony. From thomas.morin at orange.com Fri Nov 22 16:45:16 2019 From: thomas.morin at orange.com (thomas.morin at orange.com) Date: Fri, 22 Nov 2019 17:45:16 +0100 Subject: [all][neutron][networking-bagpipe][networking-bgpvpn] Maintainers needed In-Reply-To: <20191119102918.b5cmfecqjf746bqi@skaplons-mac> References: <20191119102918.b5cmfecqjf746bqi@skaplons-mac> Message-ID: <6317_1574441117_5DD8109D_6317_149_25_4d84cae8-c759-fc4b-d39c-9c999759830d@orange.com> Hi folks, About networking-bgpvpn project and the reference implementation in networking-bagpipe: while it's very true that contributors involvment has been lower in the past year with no feature added, the projects are I believe sane enough. With some help of neutron cores and other contributors (actually, new contributors), the project gate has been quite consistenly fixed to follow changes in "external factors". My plan as a contributor is to keep having some time for networking-bgpvpn and networking-bagpipe to ensure they're sane enough to release in Usuri with the current feature set, or more if proposals arise and there is energy to implement. And I welcome the idea of having Lajos Katona join on the project! (Thanks!) Best, -Thomas Slawek Kaplonski : > Hi, > > Over the past couple of cycles we have noticed that new contributions and > maintenance efforts for networking-bagpipe and networking-bgpvpn were > almost non existent. > This impacts patches for bug fixes, new features and reviews. The Neutron > core team is trying to at least keep the CI of this project healthy, but we > don’t have enough knowledge about the details of the > code base to review more complex patches. 
> > During the PTG in Shanghai we discussed that with operators and TC members > during the forum session [1] and later within the Neutron team during the > PTG session [2]. > > During these discussions, with the help of operators and TC members, we > reached the conclusion that we need to have someone responsible for > maintaining those projects. This doesn’t mean that the > maintainer needs to spend full time working on those projects. Rather, we > need someone to be the contact person for the project, who takes care of > the project’s CI and review patches. Of course that’s only a minimal > requirement. If the new maintainer works on new features for the project, > it’s even better :) > > If we don’t have any new maintainer(s) before milestone Ussuri-2, which is > Feb 10 - Feb 14 according to [3], we will need to mark networking-bgpvpn and > networking-bagpipe as deprecated and in “V” cycle we will propose to move the > projects from the Neutron stadium, hosted in the “openstack/“ namespace, to the > unofficial projects hosted in the “x/“ namespace. > > So if You are using this project now, or if You have customers who are > using it, please consider the possibility of maintaining it. Otherwise, > please be aware that it is highly possible that the project will be > deprecated and moved out from the official OpenStack projects. > > [1] > https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - > Lines 379-421 > [3] https://releases.openstack.org/ussuri/schedule.html > _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. From mark at stackhpc.com Fri Nov 22 16:47:32 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 22 Nov 2019 16:47:32 +0000 Subject: [kolla] PTG summary Message-ID: Hi, A couple of weeks ago we had our partially virtual PTG. The full notes are in Etherpad [1], but I will try to summarise some of the discussions here. We had a good turnout, and good engagement in the discussions. # Priorities I'll start at the end, as this is arguably the most tangible outcome of the discussions. At the end of the sessions, we gathered all potential work items for Ussuri, and voted on them in another Etherpad [2]. Afterwards these were cleaned up, ordered, and assigned blueprints. The priorities were transferred to the whiteboard [3] for tracking. 
Of course other things will come along, and some things won't get done, but these priorities should be used as a guide for anyone reviewing or developing patches during the Ussuri cycle # General topics Sustainability is a key theme in kolla recently. There are two main aspects to this - keeping the community healthy, and ensuring the scope of the project is maintainable. We discussed having some additional community meetings on a regular basis, as a way to keep in touch with operators and to build up more of a community outside of IRC. I proposed having a virtual onboarding session which a number of people expressed an interest in attending. Pulling this session out of the summit should allow for a more level playing field. ## Ceph Ansible We are continuing to investigate Ceph Ansible as an alternative to our native Ceph deployment. The work to migrate from an existing kolla deployment is still ongoing. There are some potential blockers in the form of no Ubuntu container image support in ceph-ansible, and no ARM container images published by the ceph-container project. ## Kolla-cli This project has been restored in the Train cycle, and we continue to keep an eye on its progress. ## Extensibility We discussed making kolla and kolla-ansible more extensible as a way to avoid the need to support every service under the sun. This should be simple in kolla, but kolla-ansible will require more effort to define how custom playbooks or roles are hooked in. ## Inverting kolla images There was a proposal to use more upstream docker images, and potentially add our tooling on top where necessary. This is an interesting idea, but could add complexity with more image distros to track. ## Cloud native logging The cloud-native folk seem to have agreed on capturing container logs from stdout, which doesn't align with our file-based model. We also miss some logs during startup that don't get logged to files which could be interesting. We agreed to try ingesting these into fluentd as a starting point. ## Tracking bug fixes in release notes We agreed to start experimenting with tracking bug fixes in release notes from the Ussuri release. Previously we have just tracked features, deprecations and upgrade notes. This will raise the bar for contribution slightly, so we will keep an eye on it. # Kolla ## CentOS 8 Supporting CentOS 8 base container images is key for us this cycle, as it allows us to move to python 3 based CentOS images. These images may start to break at any moment as services drop python 2 support. The Ussuri release will not support CentOS 7 base container images. We are currently blocked by a number of missing yum repositories for CentOS 8. Supporting running kolla-build on a CentOS 8 host is a simpler task, although getting Docker installed requires a few contortions at this point. ## Drop python 2 This depends on CentOS 8 host and container support. We are heavily at risk of being broken as other projects drop python 2 support. We plan to keep CentOS CI happy for as long as possible but expect it may break at some point during the cycle. ## Zuul proposal bot We talked about adding a zuul proposal bot to update source package versions on stable branches (running tools/version-check.py). I looked into it and put forward a PoC, but we are going to try switching to a YAML format definition before proceeding with this. ## Remove EPEL Everyone likes to hate EPEL, so we will try to remove it. 
We also discussed an off by default approach for our custom package repositories to provide some damage limitation if they go AWOL. ## Support matrix We made a good start with the support matrix [4] this cycle. We'd like to continue this effort, and continue to evaluate which images we support. The next step seems to be to better define our categories of images, and use these to define voting vs. non-voting build failures. We'd like to find community owners for some of our 'community maintained' images. ## RabbitMQ upgrade RabbitMQ 3.8 brings a prometheus exporter, which a number of people have expressed an interest in. This will require an erlang upgrade. ## Prometheus 2 There is no migration path between prometheus 1 and 2. We discussed a few options for a smooth transition, and I think we landed on this: * keep old prometheus container around, configure it as a remote read source * configure haproxy to flip to the new prometheus when ready # Kolla Ansible ## CentOS 8 This is where it gets interesting. How do we migrate a running system from CentOS 7 to 8? Ideally we would not couple this to an OpenStack upgrade, so at least one release needs to support both CentOS7 and 8 hosts. I will follow up on this topic separately as it's a big one, and I'd like to try a cross-project approach. ## Drop python 2 The main interesting decision here is: don't drop py2 for remote hosts in Ussuri, until we are sure that Ussuri will only need to run against CentOS 8 hosts (see above). ## OVN support Neutron seems to be making moves to deprecate OVS and LinuxBridge ML2 drivers, replacing them with OVN in tree. We have OVN images, but no deployment support in kolla ansible. We'd like to add it this cycle. Interesting questions around migration from OVS to OVN came up. Tripleo has some tooling which might help here. ## More host-level commands (day 2 ops) We have the bootstrap-servers command for bootstrapping hosts, but lack some commands for ongoing operations. Common examples include: * reconfiguring or upgrading docker in a safe manner (without live-restore, a docker restart takes down your containers). * adding new hosts. This requires updating /etc/hosts everywhere, but running bootstrap-servers again is heavy handed and risks a docker restart. Containers don't automatically pick up changes to /etc/hosts, so we need to address that. * pruning docker images ## Restarting services There was a request for a command to restart services. It could probably be cobbled together from existing code quite easily. ## More destruction It should be possible to run the destroy command against a subset of services. We could also do more to thoroughly clean up. ## More security friendliness (especially transport security) * Could we integrate with letsencrypt? Possibly. * Should we default to use TLS with self-signed certs? Probably, but expiry could cause some surprises without explicit buy-in from the operator. * Can we use per-host RabbitMQ usernames and passwords? Potentially... ## SELinux Comes up often, but never gets voted for in our priorities. It would be nice to get this one sorted though. ## Fluentd reconfiguration Currently it's not possible to deploy the common services (cron, kolla-toolbox, fluentd) without also deploying another service. There are a few fiddly details, but it should be possible to resolve. ## Ansible lint We agreed to try running ansible-lint on our codebase. The group has had mixed results with it before, but was open to trying again. 
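To make that experiment concrete, the gate could start as small as the sketch below; the target paths and the skipped rule are illustrative assumptions, not an agreed kolla-ansible layout or policy.

    #!/usr/bin/env python3
    """Minimal sketch of an ansible-lint gate for CI; paths and skip list are placeholders."""
    import subprocess
    import sys

    # Hypothetical locations of the playbooks and roles to lint.
    TARGETS = ["ansible/roles", "ansible/site.yml"]
    # Rules that might be skipped while adoption is incremental (hypothetical choice).
    SKIP_LIST = ["306"]

    def main() -> int:
        cmd = ["ansible-lint", "--skip-list", ",".join(SKIP_LIST)] + TARGETS
        print("Running:", " ".join(cmd))
        # ansible-lint exits non-zero when it reports findings, which fails the job.
        return subprocess.run(cmd).returncode

    if __name__ == "__main__":
        sys.exit(main())

Starting with a generous skip list and tightening it over time would be one way to avoid a repeat of the mixed results mentioned above.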
## Ansible maximum version pinning We agreed to define a maximum version of Ansible that we support. This will help to prevent breakage out of our control. ## Nova cells v2 Work continues this cycle on cells with support for shared cell controllers, and deployment of multiple RabbitMQ and MariaDB clusters. ## Config file audit We should use the oslo config validator [5] to ensure our config is valid. ## Podman This one keeps coming up, but we never agree to implement it. Possible issues include a lack of a full-featured Python library, and lack of a supported package for Debian/Ubuntu. We agreed to start thinking about how we might perform a migration from Docker one day, given the direction of Red Hat. [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg [2] https://etherpad.openstack.org/p/kolla-ussuri-priorities [3] https://etherpad.openstack.org/p/KollaWhiteBoard [4] https://docs.openstack.org/kolla/latest/support_matrix.html [5] https://docs.openstack.org/oslo.config/latest/cli/validator.html Well that turned into more of an exhaustive list than I'd expected. Well done for reading (or scrolling) to the end. Cheers, Mark From mihalis68 at gmail.com Fri Nov 22 16:51:12 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Fri, 22 Nov 2019 11:51:12 -0500 Subject: [ops] topics for London ops meetup jan 202 Message-ID: https://twitter.com/osopsmeetup/status/1197920312555429889?s=20 -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.morin at orange.com Fri Nov 22 16:54:05 2019 From: thomas.morin at orange.com (thomas.morin at orange.com) Date: Fri, 22 Nov 2019 17:54:05 +0100 Subject: [neutron] [all] Networking-ovn and neutron convergence In-Reply-To: <17ec435e-4132-22cc-1c9a-8b37766ac15f@gmail.com> References: <17ec435e-4132-22cc-1c9a-8b37766ac15f@gmail.com> Message-ID: <20077_1574441646_5DD812AE_20077_319_13_ae59cd49-1c4b-529c-5cef-33ff24136b72@orange.com> Hi Brian, all, Perhaps the area to make more explicit is where the difference may lie between having networking-ovn "as one of the in-tree Neutron drivers" and "the ML2+OVS+DVR solution will be merged with the networking-ovn solution" [1]. More specifically, my question would be about whether or not this proposal includes removing ML2 code, and if yes which parts, and in particular whether the l2 agent extension integration point would be preserved.  Not preserving them would be a problem for projects such as networking-bagpipe, networking-bgpvpn and networking-sfc to exist for Ussuri release. These three projects have a different level of liveliness, but I would find problematic a choice that would prevent keeping networking-bgpvpn and it's reference implementation in networking-bagpipe, to be released for Ussuri.  Having OVN include BGPVPN functionality is something that had been discussed in the past, and the idea is still valid in my views, but it's unlikely that such a thing could land in Ussuri timeframe. So I would be glad if this specific point of preserving integration points that ML2 offers, can be clarified. Best, -Thomas [1] https://review.opendev.org/#/c/658414/18/specs/ussuri/ml2ovs-ovn-convergence.rst at 52 Brian Haley : > Hi, > > For some time we have been discussing in the Neutron community the > possibility of including the networking-ovn driver [1] as one of the > in-tree Neutron drivers.  There is already a spec [2] describing in > detail why we want to do this and why we think it is good idea.  
We > also discussed this during the PTG in Shanghai within our team [3], > and had a discussion at the ops-meetup as well [4]. > > The OVN backend is free of many well-known issues which are impacting > the existing ML2/OVS reference implementation today with the OVS > agent. For example, OVN provides: > > * Control plane performance optimizations by not using rabbitmq; > * DVR (and HA) by default, based on OpenFlow so it can be easy offloaded >   e.g. by SmartNICs; > * Distributed DHCP; > > There are some feature parity gaps when comparing it to ML2/OVS that > we plan to address.  See [2] for details. > > We think that merging this code into the neutron repository will help > to grow the networking-ovn community and will help us to keep a > healthy Neutron team as well by increasing the number or contributors. > > Our current plan is to start merging code from networking-ovn into the > neutron repository as soon as possible in the Ussuri cycle. But we > also wanted to get any additional opinions from the wider community > about this plan.  What do users and operators of Neutron think about > this? > > [1] https://opendev.org/openstack/networking-ovn > [2] https://review.opendev.org/#/c/658414/ > [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored >     Lines 252 - 295 > [4] https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup >     Lines 62 - 69 > > Thanks for any feedback, > > -Brian (and Slawek) > _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. From smooney at redhat.com Fri Nov 22 16:58:35 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 22 Nov 2019 16:58:35 +0000 Subject: [neutron] [all] Networking-ovn and neutron convergence In-Reply-To: References: <17ec435e-4132-22cc-1c9a-8b37766ac15f@gmail.com> Message-ID: <04f2b4c395286879283e202e842aa7a575e6803d.camel@redhat.com> On Fri, 2019-11-22 at 11:17 -0500, Mohammed Naser wrote: > On Fri, Nov 22, 2019 at 10:22 AM Brian Haley wrote: > > > > Hi, > > > > For some time we have been discussing in the Neutron community the > > possibility of including the networking-ovn driver [1] as one of the > > in-tree Neutron drivers. There is already a spec [2] describing in > > detail why we want to do this and why we think it is good idea. We also > > discussed this during the PTG in Shanghai within our team [3], and had a > > discussion at the ops-meetup as well [4]. > > > > The OVN backend is free of many well-known issues which are impacting > > the existing ML2/OVS reference implementation today with the OVS agent. 
> > For example, OVN provides: > > > > * Control plane performance optimizations by not using rabbitmq; > > * DVR (and HA) by default, based on OpenFlow so it can be easy offloaded > > e.g. by SmartNICs; > > * Distributed DHCP; > > > > There are some feature parity gaps when comparing it to ML2/OVS that we > > plan to address. See [2] for details. > > > > We think that merging this code into the neutron repository will help to > > grow the networking-ovn community and will help us to keep a healthy > > Neutron team as well by increasing the number or contributors. > > > > Our current plan is to start merging code from networking-ovn into the > > neutron repository as soon as possible in the Ussuri cycle. But we also > > wanted to get any additional opinions from the wider community about > > this plan. What do users and operators of Neutron think about this? > > I'm very much supportive of something like this. The most important thing > is figuring out a proper migration path for those that are sitting on ML2/OVS, > the last time I had a look at this, it wasn't very straightforward AFAIK. you should in theory be able to live migration now which did not work before although when i tried live migrating between ml2/ovs and ml2/linux bridge i found issues so we would need to test ml2/ovs and ml2/ovn and fix any issue we find. the main issue i see is the ovn use geneve as it default segmentaiton type where as ml2/ovs mainly uses vlan,vxlan and to a lesser degre gre. so without adding support for vxlan and gre network types to ovn you would have to change the segmentation type of the existing networks in the db which is an action not supported via the api. i know there is vxaln support ofr external vtep gateway in ovn but its not supproted for tenant networks as far as i know. the other complicaton is that different ml2 drivers do not form mesh networks between each other so even if you used vxlan or genve on both ml2/ovs and ml2/ovn it wont give network connectivity. i know there are migration sripts that kind of allow you to do this as an offline migraton between backends but i dont think there is a way to do this in a rolling fashion so effectivly you need to swap your entire cloud in one go. that is less then ideal even if you can live with the feature gaps between ml2/ovs and ml2/ovn today. with all that said ti woudl be nice to see progress in closing those gaps i have also noted a number of other feature gaps in comments in the spec but i fell like its still proably incomplete. > > > [1] https://opendev.org/openstack/networking-ovn > > [2] https://review.opendev.org/#/c/658414/ > > [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored > > Lines 252 - 295 > > [4] https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup > > Lines 62 - 69 > > > > Thanks for any feedback, > > > > -Brian (and Slawek) > > > > From mriedemos at gmail.com Fri Nov 22 17:15:46 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 22 Nov 2019 11:15:46 -0600 Subject: [nova] unshelve image_ref bugs In-Reply-To: <20191122140519.zldtzhyp7ptdsm2h@corp.ovh.com> References: <20191122140519.zldtzhyp7ptdsm2h@corp.ovh.com> Message-ID: <43ee8fa3-0f2b-a66c-3d84-f1ced282eb87@gmail.com> On 11/22/2019 8:05 AM, Alexandre Arents wrote: > C) Change create_image()/imagebackend driver behavior, > to create a flatten qcow2 file in case of unshelving. > flattening disk may be a solution because there will be no more "orphan backing file". 
> (Basicly doing like "flat" backend driver except we need to stay in qcow2 instead of RAW) > PROS: > -we keep orignal unshelve behavior/assumption > CONS: > -It means that in your infra configured in COW some instances will be in "qcow2 flat", > Flat qcow2 instance works great (livemigration/resize..). Would all installation ok with that ? > Ok it seems a little odd to ask COW driver to not do COW in some case. Alternatilevy we can > force using flat driver if unshelving, but we need to change flat driver to support also qcow2. > D) During spawn() if unshelving we convert "qcow2 disk with backing file" to a "flatten qcow2 disk", > just after self._create_image(). > It looks more like a workaround than a long term solution as it need to convert something created before, > that do not meet the need(better to do C). Does this already fix your problem? https://review.opendev.org/#/q/If3c9d1de3ce0fe394405bd1e1f0fa08ce2baeda8 -- Thanks, Matt From pabelanger at redhat.com Fri Nov 22 17:21:08 2019 From: pabelanger at redhat.com (Paul Belanger) Date: Fri, 22 Nov 2019 12:21:08 -0500 Subject: [ops] Running VMware atop of OpenStack (eg: esxi) Message-ID: <20191122172108.GA263619@localhost.localdomain> Greetings, I wanted to ask if anybody in the community is running VMware a top of OpenStack in any capacity? In Ansible we are doing a POC as part of our testing platform and running into some common ops issue. For example, what do people usually do for things like using config-drive? Is there any specific tooling you us to customize esxi images to run atop OpenStack. Bascially, looking for humans to bounce questions off and see who else is doing it. Thanks! Paul From pabelanger at redhat.com Fri Nov 22 17:35:42 2019 From: pabelanger at redhat.com (Paul Belanger) Date: Fri, 22 Nov 2019 12:35:42 -0500 Subject: [ops] Running VMware atop of OpenStack (eg: esxi) In-Reply-To: <20191122172108.GA263619@localhost.localdomain> References: <20191122172108.GA263619@localhost.localdomain> Message-ID: <20191122173542.GA302307@localhost.localdomain> On Fri, Nov 22, 2019 at 12:21:08PM -0500, Paul Belanger wrote: > Greetings, > > I wanted to ask if anybody in the community is running VMware a top of > OpenStack in any capacity? In Ansible we are doing a POC as part of our > testing platform and running into some common ops issue. For example, > what do people usually do for things like using config-drive? Is there > any specific tooling you us to customize esxi images to run atop > OpenStack. > > Bascially, looking for humans to bounce questions off and see who else > is doing it. > To avoid confusion, in this context it would be booting esxi a top of kvm (via openstack) not running esxi as its own hypervisor alongside kvm. Thanks! Paul From raubvogel at gmail.com Fri Nov 22 17:52:17 2019 From: raubvogel at gmail.com (Mauricio Tavares) Date: Fri, 22 Nov 2019 12:52:17 -0500 Subject: [ops] Running VMware atop of OpenStack (eg: esxi) In-Reply-To: <20191122173542.GA302307@localhost.localdomain> References: <20191122172108.GA263619@localhost.localdomain> <20191122173542.GA302307@localhost.localdomain> Message-ID: On Fri, Nov 22, 2019 at 12:39 PM Paul Belanger wrote: > > On Fri, Nov 22, 2019 at 12:21:08PM -0500, Paul Belanger wrote: > > Greetings, > > > > I wanted to ask if anybody in the community is running VMware a top of > > OpenStack in any capacity? In Ansible we are doing a POC as part of our > > testing platform and running into some common ops issue. 
For example, > > what do people usually do for things like using config-drive? Is there > > any specific tooling you us to customize esxi images to run atop > > OpenStack. > > > > Bascially, looking for humans to bounce questions off and see who else > > is doing it. > > > To avoid confusion, in this context it would be booting esxi a top of > kvm (via openstack) not running esxi as its own hypervisor alongside > kvm. > I've never done it through openstack but I have ran esxi on the top of kvm many times. You have to ensure kvm is passing the hypervisor thing (vt-x/whatever) to the guest. I think https://libvirt.org/formatdomain.html#elementsFeatures has th eoptiong you need to turn on. > Thanks! > Paul > > From mgagne at calavera.ca Fri Nov 22 17:54:50 2019 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Fri, 22 Nov 2019 12:54:50 -0500 Subject: [ops] Running VMware atop of OpenStack (eg: esxi) In-Reply-To: <20191122172108.GA263619@localhost.localdomain> References: <20191122172108.GA263619@localhost.localdomain> Message-ID: Hi, On Fri, Nov 22, 2019 at 12:21 PM Paul Belanger wrote: > > Greetings, > > I wanted to ask if anybody in the community is running VMware a top of > OpenStack in any capacity? In Ansible we are doing a POC as part of our > testing platform and running into some common ops issue. For example, > what do people usually do for things like using config-drive? Is there > any specific tooling you us to customize esxi images to run atop > OpenStack. > > Bascially, looking for humans to bounce questions off and see who else > is doing it. > We don't plan on running ESXi on top of KVM but alongside KVM on a baremetal. So I won't be able to address that specific use case. However I can share with you what we are planning to do with config-drive support. (we are actively working on it as I'm typing) At first, we tried to package Glean for VMware and install that package when we built the image. The issue with that is that the package isn't signed. This can affect support and prevent the overall system from being updated due to the presence of an unsigned package. You would need some kind of partnership with VMware to get it signed. We therefore switched to a firstboot script which can be configured in the kickstart when building the image: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.esxi.upgrade.doc/GUID-61A14EBB-5CF3-43EE-87EF-DB8EC6D83698.html (see %firstboot section) You can have multiple %firstboot section both using busybox or python as an interpreter. We have multiple sections performing various steps: 1) Find the config-drive partition. Since it's baremetal, config-drive is a primary partition at the end of the disk. 2) Load the iso9660 module 3) Mount the config-drive 4) A whole Python section which parses config-drive and perform hostname/network/password/publickeys configuration. 5) General cleanup: unmount config-drive, unload iso9660 module Now since it's a firstboot script, VMware no longer complains about unsigned packages. I hope this help. 
-- Mathieu From pabelanger at redhat.com Fri Nov 22 18:03:37 2019 From: pabelanger at redhat.com (Paul Belanger) Date: Fri, 22 Nov 2019 13:03:37 -0500 Subject: [ops] Running VMware atop of OpenStack (eg: esxi) In-Reply-To: References: <20191122172108.GA263619@localhost.localdomain> Message-ID: <20191122180337.GA304390@localhost.localdomain> On Fri, Nov 22, 2019 at 12:54:50PM -0500, Mathieu Gagné wrote: > Hi, > > On Fri, Nov 22, 2019 at 12:21 PM Paul Belanger wrote: > > > > Greetings, > > > > I wanted to ask if anybody in the community is running VMware a top of > > OpenStack in any capacity? In Ansible we are doing a POC as part of our > > testing platform and running into some common ops issue. For example, > > what do people usually do for things like using config-drive? Is there > > any specific tooling you us to customize esxi images to run atop > > OpenStack. > > > > Bascially, looking for humans to bounce questions off and see who else > > is doing it. > > > > We don't plan on running ESXi on top of KVM but alongside KVM on a > baremetal. So I won't be able to address that specific use case. > > However I can share with you what we are planning to do with > config-drive support. (we are actively working on it as I'm typing) > > At first, we tried to package Glean for VMware and install that > package when we built the image. The issue with that is that the > package isn't signed. This can affect support and prevent the overall > system from being updated due to the presence of an unsigned package. > You would need some kind of partnership with VMware to get it signed. > > We therefore switched to a firstboot script which can be configured in > the kickstart when building the image: > https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.esxi.upgrade.doc/GUID-61A14EBB-5CF3-43EE-87EF-DB8EC6D83698.html > (see %firstboot section) > > You can have multiple %firstboot section both using busybox or python > as an interpreter. > > We have multiple sections performing various steps: > 1) Find the config-drive partition. Since it's baremetal, config-drive > is a primary partition at the end of the disk. > 2) Load the iso9660 module > 3) Mount the config-drive > 4) A whole Python section which parses config-drive and perform > hostname/network/password/publickeys configuration. > 5) General cleanup: unmount config-drive, unload iso9660 module > > Now since it's a firstboot script, VMware no longer complains about > unsigned packages. > > I hope this help. > Nice, this is exactly the type of thing I was hoping for! Awesome! In fact, my initial though too was also update glean adding vmware support, good to know somebody else tried this first. As for first boot, this too in the approach we are taking. A teammate has created a python script to do the same. My first question was, why couldn't that script be glean? That is the part I am a little confused about. Also, are you interested in working on this 'python' script together upstream? I suspect we might be working to solve the same issue related to hostname, network, SSH. Paul > -- > Mathieu > From aj at suse.com Fri Nov 22 21:29:59 2019 From: aj at suse.com (Andreas Jaeger) Date: Fri, 22 Nov 2019 22:29:59 +0100 Subject: [kolla] PTG summary In-Reply-To: References: Message-ID: On 22/11/2019 17.47, Mark Goddard wrote: > [...] > ## Ceph Ansible > > We are continuing to investigate Ceph Ansible as an alternative to our > native Ceph deployment. The work to migrate from an existing kolla > deployment is still ongoing. 
There are some potential blockers in the > form of no Ubuntu container image support in ceph-ansible, and no ARM > container images published by the ceph-container project. I suggest to talk with the Ceph community before spending more work here. ceph-ansible is getting replaced in March by "SSH orchestrator", see https://docs.google.com/presentation/d/1JpcETNXpuB1JEuhX_c8xtgNnv0gJhSaQUffM7aRjUek/edit#slide=id.g78e5cb0e10_0_0 Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From zhipengh512 at gmail.com Sat Nov 23 01:26:27 2019 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Sat, 23 Nov 2019 09:26:27 +0800 Subject: [resource-management-sig] Status of the "Resource Management" SIG In-Reply-To: References: <5a84404f-0e10-9010-61ed-29aff08b5ec6@openstack.org> Message-ID: Yes we had a great session at Shanghai and Rico is working with me to update the tag to reflect the SIG status On Fri, Nov 22, 2019 at 11:55 PM Thierry Carrez wrote: > Zhipeng Huang wrote: > > Intend to launch the activity during Shanghai PTG via Cyborg sessions, > > for those you are interested in topics like Kubernetes, OCP OAI, RISC-V > > you are more than welcomed to reach out to me. > > Was that effort successful? Should we keep the resource management SIG, > with status:forming and removing the other chairs? > > -- > Thierry > > -- Zhipeng (Howard) Huang Principle Engineer OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Sat Nov 23 02:25:41 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Fri, 22 Nov 2019 18:25:41 -0800 Subject: [keystone] Keystone Team Update - Week of 18 November 2019 Message-ID: # Keystone Team Update - Week of 18 November 2019 ## Meta I haven't done one of these updates in a while because of travel and $life. This week there were a few things worth sending a newsletter about, so here you go. However, as many of you are aware, my job focus is changing and I need to be more strategic and selective about the activities I put my time into. These weekly updates consume a not-insignificant amount of my time, and so from now on I'll not plan on continuing it. Many people have given me the feedback that they find this newsletter useful, and so I encourage anyone who has some pulse on the keystone team's activities to take up this weekly summary (need not be a core). The template is linked at the bottom of this email, and I've been using Lance's launchpad scripts[1] to generate bug statistics and some oneliners[2] to scrape Gerrit data. [1] https://github.com/lbragstad/launchpad-toolkit [2] https://gist.github.com/cmurphy/ee802fc0dc4bf57dffbea02265cc9e92 ## News ### PTG Recap We concluded our second virtual PTG session this week. Notes were kept in the etherpad[3] and I will send out a summary some time next week. [3] https://etherpad.openstack.org/p/keystone-shanghai-ptg ### Documentation for Upgrade Issues As operators start upgrading to Train[4] it's becoming clear that some of the changes we made last cycle aren't well documented and we need better tooling and documentation to help deal with them, especially immutable roles and deprecated policies. 
[4] http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2019-11-18.log.html#t2019-11-18T20:06:24 ### User Support and Bug Duty To help share the responsibility of ensuring users and operators have a positive experience when coming to the keystone team for help, we started a weekly rotation of support duty, organized by etherpad[5]. Being on-duty means being responsible for triaging new bugs, and responding to support requests on the mailing list and on IRC. It does not mean having to solve people's problems right away, only ensuring that they know someone is listening and cares about their issues. If you would like to help, feel free to add your name to the list - you need not be a keystone developer, just willing to help triage user issues, confirm bug reports, ask for more information, and help give pointers to resources that may help. We'll proceed roughly down the bulletted list but will take time at each IRC meeting to confirm the next person on-duty. [5] https://etherpad.openstack.org/p/keystone-l1-duty ### Unified Limits Update Work on the Unified Limits initiative[6] had stalled out last cycle but work on the oslo.limit implementation, which was the main thing blocking further progress in Nova or other projects, has been reignited this week[7]. Thanks John for picking this back up! [6] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/unified-limits.html [7] https://review.opendev.org/#/q/project:openstack/oslo.limit+is:open ### No Meeting Next Week Next week is the Thanksgiving holiday in the US so we'll skip the team meeting (Tuesday November 26). ## Open Specs Ussuri specs: https://bit.ly/2XDdpkU Ongoing specs: https://bit.ly/2OyDLTh ## Recently Merged Changes Search query: https://bit.ly/2pquOwT We merged 3 changes this week. ## Changes that need Attention Search query: https://bit.ly/2tymTje There are 32 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. ## Bugs This week we opened 2 new bugs and closed 1. Bugs opened (2) Bug #1853170 (oslo.policy:High) opened by Ben Nemec https://bugs.launchpad.net/oslo.policy/+bug/1853170 Bug #1853038 (oslo.policy:Wishlist) opened by Ben Nemec https://bugs.launchpad.net/oslo.policy/+bug/1853038 Bugs closed (1) Bug #1852547 (keystone:Undecided) https://bugs.launchpad.net/keystone/+bug/1852547 ## Milestone Outlook https://releases.openstack.org/ussuri/schedule.html Specs for Ussuri must be proposed and ready to review by Milestone 1 which is in about three weeks. ## Shout-outs Thanks for Adam Harwell and Zachary Buhman for joining the keystone PTG and participating in the design discussion, John Garbutt for picking up the unified limits work, and everyone who signed up to help with bug duty! ## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter From ryan at trolocsis.com Sat Nov 23 02:30:39 2019 From: ryan at trolocsis.com (Ryan Phillips) Date: Fri, 22 Nov 2019 20:30:39 -0600 Subject: [kolla] all-in-one install with swift failing Message-ID: Hi All, I've been trying to install Kolla (train) as an all-in-one install. The base install works great: I can get to the dashboard and create a nova instance. I am attempting to add swift support and followed the directions to configure swift. 
The problem is all the swift containers are coming up and exiting with: + sudo -E kolla_set_configs sudo: no tty present and no askpass program specified + sudo -E kolla_set_configs sudo: no tty present and no askpass program specified + sudo -E kolla_set_configs sudo: no tty present and no askpass program specified + sudo -E kolla_set_configs sudo: no tty present and no askpass program specified Does anyone know what is wrong? Regards, Ryan CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES cea6edf16348 kolla/ubuntu-binary-horizon:train "dumb-init --single-…" 7 minutes ago Up 7 minutes horizon 6c88d426a07d kolla/ubuntu-binary-heat-engine:train "dumb-init --single-…" 7 minutes ago Up 7 minutes heat_engine c28e5ce612dc kolla/ubuntu-binary-heat-api-cfn:train "dumb-init --single-…" 7 minutes ago Up 7 minutes heat_api_cfn bfc6cdd6b969 kolla/ubuntu-binary-heat-api:train "dumb-init --single-…" 7 minutes ago Up 7 minutes heat_api aa5c37b09733 kolla/ubuntu-binary-neutron-metadata-agent:train "dumb-init --single-…" 8 minutes ago Up 8 minutes neutron_metadata_agent 07154a5af33b kolla/ubuntu-binary-neutron-l3-agent:train "dumb-init --single-…" 8 minutes ago Up 8 minutes neutron_l3_agent 17e2265f8d34 kolla/ubuntu-binary-neutron-dhcp-agent:train "dumb-init --single-…" 8 minutes ago Up 8 minutes neutron_dhcp_agent 1fdb7bea5434 kolla/ubuntu-binary-neutron-openvswitch-agent:train "dumb-init --single-…" 8 minutes ago Up 8 minutes neutron_openvswitch_agent e1fe4fffe34d kolla/ubuntu-binary-neutron-server:train "dumb-init --single-…" 8 minutes ago Up 8 minutes neutron_server cc20097e0787 kolla/ubuntu-binary-openvswitch-vswitchd:train "dumb-init --single-…" 9 minutes ago Up 9 minutes openvswitch_vswitchd 3cd5b3ac32a9 kolla/ubuntu-binary-openvswitch-db-server:train "dumb-init --single-…" 9 minutes ago Up 9 minutes openvswitch_db 22684295751d kolla/ubuntu-binary-nova-compute:train "dumb-init --single-…" 10 minutes ago Up 10 minutes nova_compute 535e6c81e8f5 kolla/ubuntu-binary-nova-libvirt:train "dumb-init --single-…" 10 minutes ago Up 10 minutes nova_libvirt 59e06d362509 kolla/ubuntu-binary-nova-ssh:train "dumb-init --single-…" 10 minutes ago Up 10 minutes nova_ssh 575ea9e95e33 kolla/ubuntu-binary-nova-novncproxy:train "dumb-init --single-…" 10 minutes ago Up 10 minutes nova_novncproxy 150d2d5723e6 kolla/ubuntu-binary-nova-conductor:train "dumb-init --single-…" 10 minutes ago Up 10 minutes nova_conductor e6cd7b383972 kolla/ubuntu-binary-nova-api:train "dumb-init --single-…" 10 minutes ago Up 10 minutes nova_api af9df9d41b0f kolla/ubuntu-binary-nova-scheduler:train "dumb-init --single-…" 10 minutes ago Up 10 minutes nova_scheduler 1892ab9d3d19 kolla/ubuntu-binary-placement-api:train "dumb-init --single-…" 12 minutes ago Up 12 minutes placement_api eed26f6ed1b2 kolla/ubuntu-binary-glance-api:train "dumb-init --single-…" 13 minutes ago Up 13 minutes glance_api 983b379c9d52 kolla/ubuntu-binary-swift-proxy-server:train "dumb-init --single-…" 13 minutes ago Restarting (1) 55 seconds ago swift_proxy_server 0a7e6a59b493 kolla/ubuntu-binary-swift-object-expirer:train "dumb-init --single-…" 13 minutes ago Restarting (1) Less than a second ago swift_object_expirer d426ebe62e71 kolla/ubuntu-binary-swift-object:train "dumb-init --single-…" 13 minutes ago Restarting (1) 3 seconds ago swift_object_updater 8aa837f99b40 kolla/ubuntu-binary-swift-object:train "dumb-init --single-…" 13 minutes ago Restarting (1) 4 seconds ago swift_object_replicator 06e5000f82ce kolla/ubuntu-binary-swift-object:train "dumb-init --single-…" 13 
minutes ago Restarting (1) 4 seconds ago swift_object_auditor ac17c224738b kolla/ubuntu-binary-swift-object:train "dumb-init --single-…" 14 minutes ago Restarting (1) 4 seconds ago swift_object_server fca502a360fb kolla/ubuntu-binary-swift-container:train "dumb-init --single-…" 14 minutes ago Restarting (1) 8 seconds ago swift_container_updater 8def0c6efde9 kolla/ubuntu-binary-swift-container:train "dumb-init --single-…" 14 minutes ago Restarting (1) 8 seconds ago swift_container_replicator e2aa84ebed7d kolla/ubuntu-binary-swift-container:train "dumb-init --single-…" 14 minutes ago Restarting (1) 8 seconds ago swift_container_auditor e7aabf20c11a kolla/ubuntu-binary-swift-container:train "dumb-init --single-…" 14 minutes ago Restarting (1) 11 seconds ago swift_container_server cc754157a930 kolla/ubuntu-binary-swift-account:train "dumb-init --single-…" 14 minutes ago Restarting (1) 14 seconds ago swift_account_reaper b3ef7952d2d3 kolla/ubuntu-binary-swift-account:train "dumb-init --single-…" 14 minutes ago Restarting (1) 14 seconds ago swift_account_replicator 7b984325b708 kolla/ubuntu-binary-swift-account:train "dumb-init --single-…" 14 minutes ago Restarting (1) 14 seconds ago swift_account_auditor 06454711e2f9 kolla/ubuntu-binary-swift-account:train "dumb-init --single-…" 14 minutes ago Restarting (1) 14 seconds ago swift_account_server 226002e50961 kolla/ubuntu-binary-swift-rsyncd:train "dumb-init --single-…" 14 minutes ago Restarting (1) 19 seconds ago swift_rsyncd 06ede7f32329 kolla/ubuntu-binary-keystone-fernet:train "dumb-init --single-…" 18 minutes ago Up 18 minutes keystone_fernet b6f258bfee93 kolla/ubuntu-binary-keystone-ssh:train "dumb-init --single-…" 18 minutes ago Up 18 minutes keystone_ssh c237390ab2d3 kolla/ubuntu-binary-keystone:train "dumb-init --single-…" 18 minutes ago Up 18 minutes keystone 9d55b6233854 kolla/ubuntu-binary-rabbitmq:train "dumb-init --single-…" 18 minutes ago Up 18 minutes rabbitmq 037db9b550d9 kolla/ubuntu-binary-memcached:train "dumb-init --single-…" 19 minutes ago Up 19 minutes memcached 747026ed2053 kolla/ubuntu-binary-mariadb:train "dumb-init -- kolla_…" 19 minutes ago Up 19 minutes mariadb 07148ecdfbd9 kolla/ubuntu-binary-chrony:train "dumb-init --single-…" 19 minutes ago Up 19 minutes chrony cc6883333991 kolla/ubuntu-binary-cron:train "dumb-init --single-…" 19 minutes ago Up 19 minutes cron 2eb7cb285418 kolla/ubuntu-binary-kolla-toolbox:train "dumb-init --single-…" 20 minutes ago Up 19 minutes kolla_toolbox b21c793b31d3 kolla/ubuntu-binary-fluentd:train "dumb-init --single-…" 20 minutes ago Up 20 minutes fluentd -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Sat Nov 23 09:54:53 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sat, 23 Nov 2019 10:54:53 +0100 Subject: [kolla] all-in-one install with swift failing In-Reply-To: References: Message-ID: Hello Ryan, Please report a bug to https://bugs.launchpad.net/kolla-ansible As a matter of fact, swift does not seem to be that popular (compared to ceph), and, since we are not running deployment CI tests on it, it could be in any state. -yoctozepto sob., 23 lis 2019 o 03:41 Ryan Phillips napisał(a): > > Hi All, > > I've been trying to install Kolla (train) as an all-in-one install. The base install works great: I can get to the dashboard and create a nova instance. I am attempting to add swift support and followed the directions to configure swift. 
> > The problem is all the swift containers are coming up and exiting with: > > + sudo -E kolla_set_configs > sudo: no tty present and no askpass program specified > > Does anyone know what is wrong? > > Regards, > Ryan From ryan at trolocsis.com Sat Nov 23 17:28:30 2019 From: ryan at trolocsis.com (Ryan Phillips) Date: Sat, 23 Nov 2019 11:28:30 -0600 Subject: [kolla] all-in-one install with swift failing In-Reply-To: References: Message-ID: Thanks! Reported the issue. On Sat, Nov 23, 2019 at 3:55 AM Radosław Piliszek < radoslaw.piliszek at gmail.com> wrote: > Hello Ryan, > > Please report a bug to https://bugs.launchpad.net/kolla-ansible > > As a matter of fact, swift does not seem to be that popular (compared > to ceph), and, since we are not running deployment CI tests on it, it > could be in any state. > > -yoctozepto > > sob., 23 lis 2019 o 03:41 Ryan Phillips napisał(a): > > > > Hi All, > > > > I've been trying to install Kolla (train) as an all-in-one install. The > base install works great: I can get to the dashboard and create a nova > instance. I am attempting to add swift support and followed the directions > to configure swift. > > > > The problem is all the swift containers are coming up and exiting with: > > > > + sudo -E kolla_set_configs > > sudo: no tty present and no askpass program specified > > > > Does anyone know what is wrong? > > > > Regards, > > Ryan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kotobi at dkrz.de Sat Nov 23 18:17:54 2019 From: kotobi at dkrz.de (Amjad Kotobi) Date: Sat, 23 Nov 2019 19:17:54 +0100 Subject: Freezer Project Update In-Reply-To: <005401d59f9f$025a81c0$070f8540$@fingent.com> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> <005401d59f9f$025a81c0$070f8540$@fingent.com> Message-ID: <8CAA057D-CC0B-4803-BA20-7A252EABD600@dkrz.de> Hi, I’ve tried with latest version but didn’t go for production with webui. I deployed and patched it in standalone way which Freezer-api runs in VM. You are able to run freezer-agent without having scheduler|api running, you only need to export admin-rc. > On 20. Nov 2019, at 13:35, Deepa wrote: > > Hello Amjad/James > > We tried installing Freezer .Freezer-scheduler and Freezer-agent in a VM which need to be backed up and Freezer-api and freezer-webui on controller node.The version of Openstack is Train. > > Unfortunately nothing worked out ☹ Most likely [keystone] section of freezer-api not configured properly which those configuration not coming from installation package. I will paste you later the configuration structure. > > Getting below error on VM (client to be backed up) when I run freezer-agent --action info > > Critical Error: Authorization Failure. Authorization Failed: Not Found (HTTP 404) (Request-ID: req-0c71d8b4-ef1a-4c8d-8d12-26df763f5085) > > And getting error in Dashboard when we enabled Freezer-api and freezer-webui in dashboard > > During handling of the above exception ([*] Error 401: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}), another exception occurred: > > Doubt the issue is with keystone versioning v2/v3.It will be great if you can share or tell freeze.env file for Freezer-agent (For client VM) and freezer-api.conf file parameters. > Also what should be admin.rc file for freezer-webui. Correct it’s not api version thing. 
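A quick way to separate a credentials/auth_url problem from a freezer configuration problem is to authenticate against keystone directly from the same VM. The sketch below is only illustrative; the endpoint, user, project and domain values are placeholders to be replaced with your own.

    #!/usr/bin/env python3
    """Minimal keystone authentication check; every value below is a placeholder."""
    from keystoneauth1 import session
    from keystoneauth1.identity import v3

    auth = v3.Password(
        auth_url="http://controller:5000/v3",  # placeholder keystone endpoint
        username="freezer",                    # placeholder user
        password="secret",                     # placeholder password
        project_name="service",                # placeholder project
        user_domain_name="Default",
        project_domain_name="Default",
    )
    sess = session.Session(auth=auth)
    # Requesting a token verifies that the auth_url and credentials are usable at all.
    print("Got token:", bool(sess.get_token()))

If this succeeds, the problem is more likely in how the freezer services are configured to talk to keystone (the section mentioned above) than in keystone itself; if it fails with the same errors, the auth_url or credentials are the first thing to fix.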
> > freezer-scheduler --config-file /etc/freezer/scheduler.conf start > Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Not Found (HTTP 404) (Request-ID: req-2adb597d-7ad5-45a8-9888-6b552c8e55cc) > > Any guidance is highly appreciated . > > Thanks a lot > > Regards, > Deepa K R Amjad > > From: James Page > Sent: Tuesday, November 19, 2019 10:19 PM > To: Amjad Kotobi > Cc: Deepa ; OpenStack Development Mailing List (not for usage questions) > Subject: Re: Freezer Project Update > > Hello > > On Fri, Nov 15, 2019 at 7:43 PM Amjad Kotobi > wrote: >> Hi, >> >> This project is pretty much in production state, from last summit it got active again from developer ends, we are using it for backup solution too. > > Great to hear that Freezer is getting some increased developer focus! > >> Documentation side isn’t that bright, very soon gonna get updated, anyhow you are able to install as standalone project in instance, I did it manually, didn’t use any provision tools. >> Let me know for specific part of deployment that is not clear. >> >> Amjad >> >> >>> On 14. Nov 2019, at 06:53, Deepa > wrote: >>> >>> Hello Team >>> >>> Good Day >>> >>> I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) >>> We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see >>> Freezer Project. But couldn’t find any charms for it in juju charms. Also there isn’t a clear documentation on how to install freezer . >>> https://docs.openstack.org/releasenotes/freezer/train.html . No proper release notes in the latest version as well. >>> Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. > > Freezer is not currently on the plan for OpenStack Charms for Ussuri. Better install documentation and support from Linux distros would be a good first step in the right direction. > > Cheers > > James -------------- next part -------------- An HTML attachment was scrubbed... URL: From bitskrieg at bitskrieg.net Sun Nov 24 04:45:46 2019 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Sun, 24 Nov 2019 04:45:46 +0000 Subject: [neutron][ovn] networking-ovn-metadata-agent and neutron agent liveness In-Reply-To: <58B86CC6-6E25-4255-A150-5B16ED2FDD44@rackspace.com> References: <58B86CC6-6E25-4255-A150-5B16ED2FDD44@rackspace.com> Message-ID: James, After playing with this a little more, I think I have a way to handle this that is somewhat better than running as root directly: 1. Allow the ovsdb-server on the compute nodes to listen on 127.0.0.1:6640 (ovs-appctl -t ovsdb-server ovsdb-server/add-remote ptcp:6640:127.0.0.1) 2. Set the helper_command, ovsdb_connection, and the root_helper options in /etc/neutron/plugins/networking-ovn/networking-ovn-metadata-agent.ini as appropriate (example [1]) The process should start successfully as neutron. I'm still having issues with agent liveness reporting, but it appears to be entirely superficial. The agent works as expected. r Chris Apsey [1] https://github.com/GeorgiaCyber/kinetic/blob/networking-ovn/formulas/compute/files/networking_ovn_metadata_agent.ini ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Wednesday, November 20, 2019 7:15 AM, James Denton wrote: > Hi Chris – > > I recall having the same issue when first implementing OVN into OpenStack-Ansible, and currently have the OVN metadata agent running as root[1]. 
I’m curious to see how others solved the issue as well. Thanks for bringing this up. > > [1] https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/vars/main.yml#L495-L496 > > James Denton > > Network Engineer > > Rackspace Private Cloud > > james.denton at rackspace.com > > From: Chris Apsey > Reply-To: Chris Apsey > Date: Wednesday, November 20, 2019 at 12:00 AM > To: "openstack-discuss at lists.openstack.org" > Subject: [neutron][ovn] networking-ovn-metadata-agent and neutron agent liveness > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > All, > > Currently experimenting with networking-ovn (rdo/train packages on centos7) and I've managed to cobble together a functional deployment with two exceptions: metadata agents and agent liveness. > > Ref: the metadata issues, it appears that the local compute node ovsdb server listens on a unix socket at /var/run/openvswitch/db.sock as openvswitch:hugetlbfs 0750. Since networking-ovn-metadata-agent runs as neutron, it's not able to interact with the local ovs database and gets stuck in a restart loop and complains about the inaccessible database socket. If I edit the systemd unit file and let the agent run as root, it functions as expected. This obviously isn't a real solution, but indicates to me a possible packaging bug? Not sure what the correct mix of permissions is, or if the local database should be listening on tcp:localhost:6640 as well and that's how the metadata agent should connect. The docs are sparse in this area, but I would imagine that something like the metadata-agent should 'just work' out of the box without having to change systemd unit files or mess with unix socket permissions. Thoughts? > > Secondly, ```openstack network agent list``` shows that all agents (ovn-controller) are all dead, all the time. However, if I display a single agent ```openstack network agent show $foo```, it shows as live. I looked around and saw some discussions about getting networking-ovn to deal with this better, but as of now the agents are reported as dead consistently unless they are explicitly polled, at least on centos 7. I haven't noticed any real impact, but the testing I'm doing is small scale. > > Other than those two issues, networking-ovn is great, and based on the discussions around possibly deprecating linuxbridge as an in-tree driver, it would make a great 'default' networking configuration option upstream, given the docs get cleaned up. > > Thanks in advance, > > r > > Chris Apsey -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Sun Nov 24 09:49:40 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sun, 24 Nov 2019 10:49:40 +0100 Subject: [all][neutron][networking-bagpipe][networking-bgpvpn] Maintainers needed In-Reply-To: <6317_1574441117_5DD8109D_6317_149_25_4d84cae8-c759-fc4b-d39c-9c999759830d@orange.com> References: <20191119102918.b5cmfecqjf746bqi@skaplons-mac> <6317_1574441117_5DD8109D_6317_149_25_4d84cae8-c759-fc4b-d39c-9c999759830d@orange.com> Message-ID: <20191124094940.bbp5ch7lwu7qtslw@skaplons-mac> Hi, Thx Thomas for confirmation that You will still be able to spent some time on this project. And indeed Lajos offered that he can help with this project also so thanks a lot for volunteering Lajos! 
On Fri, Nov 22, 2019 at 05:45:16PM +0100, thomas.morin at orange.com wrote: > Hi folks, > > About networking-bgpvpn project and the reference implementation in > networking-bagpipe: while it's very true that contributors involvment has > been lower in the past year with no feature added, the projects are I > believe sane enough. With some help of neutron cores and other contributors > (actually, new contributors), the project gate has been quite consistenly > fixed to follow changes in "external factors". > > My plan as a contributor is to keep having some time for networking-bgpvpn > and networking-bagpipe to ensure they're sane enough to release in Usuri > with the current feature set, or more if proposals arise and there is energy > to implement. > > And I welcome the idea of having Lajos Katona join on the project! (Thanks!) > > Best, > > -Thomas > > > > Slawek Kaplonski : > > Hi, > > > > Over the past couple of cycles we have noticed that new contributions and > > maintenance efforts for networking-bagpipe and networking-bgpvpn were > > almost non existent. > > This impacts patches for bug fixes, new features and reviews. The Neutron > > core team is trying to at least keep the CI of this project healthy, but we > > don’t have enough knowledge about the details of the > > code base to review more complex patches. > > > > During the PTG in Shanghai we discussed that with operators and TC members > > during the forum session [1] and later within the Neutron team during the > > PTG session [2]. > > > > During these discussions, with the help of operators and TC members, we > > reached the conclusion that we need to have someone responsible for > > maintaining those projects. This doesn’t mean that the > > maintainer needs to spend full time working on those projects. Rather, we > > need someone to be the contact person for the project, who takes care of > > the project’s CI and review patches. Of course that’s only a minimal > > requirement. If the new maintainer works on new features for the project, > > it’s even better :) > > > > If we don’t have any new maintainer(s) before milestone Ussuri-2, which is > > Feb 10 - Feb 14 according to [3], we will need to mark networking-bgpvpn and > > networking-bagpipe as deprecated and in “V” cycle we will propose to move the > > projects from the Neutron stadium, hosted in the “openstack/“ namespace, to the > > unofficial projects hosted in the “x/“ namespace. > > > > So if You are using this project now, or if You have customers who are > > using it, please consider the possibility of maintaining it. Otherwise, > > please be aware that it is highly possible that the project will be > > deprecated and moved out from the official OpenStack projects. > > > > [1] > > https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward > > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - > > Lines 379-421 > > [3] https://releases.openstack.org/ussuri/schedule.html > > > > _________________________________________________________________________________________________________________________ > > Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler > a l'expediteur et le detruire ainsi que les pieces jointes. 
Les messages electroniques etant susceptibles d'alteration, > Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. > > This message and its attachments may contain confidential or privileged information that may be protected by law; > they should not be distributed, used or copied without authorisation. > If you have received this email in error, please notify the sender and delete this message and its attachments. > As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. > Thank you. > > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Sun Nov 24 09:55:05 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sun, 24 Nov 2019 10:55:05 +0100 Subject: [neutron] [all] Networking-ovn and neutron convergence In-Reply-To: <20077_1574441646_5DD812AE_20077_319_13_ae59cd49-1c4b-529c-5cef-33ff24136b72@orange.com> References: <17ec435e-4132-22cc-1c9a-8b37766ac15f@gmail.com> <20077_1574441646_5DD812AE_20077_319_13_ae59cd49-1c4b-529c-5cef-33ff24136b72@orange.com> Message-ID: <20191124095505.o2irpjwbs76sasg5@skaplons-mac> Hi, On Fri, Nov 22, 2019 at 05:54:05PM +0100, thomas.morin at orange.com wrote: > Hi Brian, all, > > Perhaps the area to make more explicit is where the difference may lie > between having networking-ovn "as one of the in-tree Neutron drivers" and > "the ML2+OVS+DVR solution will be merged with the networking-ovn solution" > [1]. > > More specifically, my question would be about whether or not this proposal > includes removing ML2 code, and if yes which parts, and in particular > whether the l2 agent extension integration point would be preserved.  Not > preserving them would be a problem for projects such as networking-bagpipe, > networking-bgpvpn and networking-sfc to exist for Ussuri release. We are for sure not going to remove any code (if it's not simple duplicate code) from existing ML2/OVS implementation. This is very popular solution in Neutron, used by many clouds so we for sure will need to maintain it too still :) > > These three projects have a different level of liveliness, but I would find > problematic a choice that would prevent keeping networking-bgpvpn and it's > reference implementation in networking-bagpipe, to be released for Ussuri.  > Having OVN include BGPVPN functionality is something that had been discussed > in the past, and the idea is still valid in my views, but it's unlikely that > such a thing could land in Ussuri timeframe. > > So I would be glad if this specific point of preserving integration points > that ML2 offers, can be clarified. > > Best, > > -Thomas > > [1] https://review.opendev.org/#/c/658414/18/specs/ussuri/ml2ovs-ovn-convergence.rst at 52 > > > Brian Haley : > > Hi, > > > > For some time we have been discussing in the Neutron community the > > possibility of including the networking-ovn driver [1] as one of the > > in-tree Neutron drivers.  There is already a spec [2] describing in > > detail why we want to do this and why we think it is good idea.  We also > > discussed this during the PTG in Shanghai within our team [3], and had a > > discussion at the ops-meetup as well [4]. > > > > The OVN backend is free of many well-known issues which are impacting > > the existing ML2/OVS reference implementation today with the OVS agent. 
> > For example, OVN provides: > > > > * Control plane performance optimizations by not using rabbitmq; > > * DVR (and HA) by default, based on OpenFlow so it can be easy offloaded > >   e.g. by SmartNICs; > > * Distributed DHCP; > > > > There are some feature parity gaps when comparing it to ML2/OVS that we > > plan to address.  See [2] for details. > > > > We think that merging this code into the neutron repository will help to > > grow the networking-ovn community and will help us to keep a healthy > > Neutron team as well by increasing the number or contributors. > > > > Our current plan is to start merging code from networking-ovn into the > > neutron repository as soon as possible in the Ussuri cycle. But we also > > wanted to get any additional opinions from the wider community about > > this plan.  What do users and operators of Neutron think about this? > > > > [1] https://opendev.org/openstack/networking-ovn > > [2] https://review.opendev.org/#/c/658414/ > > [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored > >     Lines 252 - 295 > > [4] https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup > >     Lines 62 - 69 > > > > Thanks for any feedback, > > > > -Brian (and Slawek) > > > > _________________________________________________________________________________________________________________________ > > Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler > a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, > Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. > > This message and its attachments may contain confidential or privileged information that may be protected by law; > they should not be distributed, used or copied without authorisation. > If you have received this email in error, please notify the sender and delete this message and its attachments. > As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. > Thank you. > > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Sun Nov 24 10:20:23 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sun, 24 Nov 2019 11:20:23 +0100 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> Message-ID: <20191124102023.d73dhicqdbrzh6pl@skaplons-mac> Hi, In http://kaplonski.pl/files/neutron_team_photo_shanghai_2019.zip You can find some good quality photos from Neutron team dinner which we had in Shanghai. Thx LIU Yulong for those pictures :) On Tue, Nov 12, 2019 at 02:53:11PM +0100, Slawek Kaplonski wrote: > Hi Neutron team, > > First if all thank to all of You for great and very productive week during the > PTG in Shanghai. > Below is summary of our discussions from whole 3 days. > If I forgot about something, please respond to the email and update missing > informations. But if You want to have follow up discussion about one of the > topics from this summary, please start a new thread to keep this one only as > high level summary of the PTG. 
> > On boarding > =========== > > Slides from onboarding session can be found at [1] > If You have any follow up questions to us about onboarding, or You need help > with starting any work in Neutron team, please contact me or Miguel Lavalle by > email or on IRC. My IRC nick is slaweq and Miguel's nick is mlavalle. We are > available on #openstack-neutron channel @freenode. > > Train retrospective > =================== > > Good things in Train cycle: > * working with this team is still good experience > * core team is stable, and we didn't lost any core reviewers during the cycle, > * networking is still one of key reasons why people use OpenStack > > Not good things: > * dimished vitality in stadium projects - we had also forum session and follow > discussion about this later during the PTG, > * gate instability - we have seen many issues which were out of our control, > like infra problems, grenade jobs failures, other projects failures, but also > many bugs on our side, > * we have really a lot of jobs in our check/gate queue. If each of them is > failing 5% of times, it's hard to merge any patch as almost every time, one of > jobs will fail. Later during the PTG we also discussed that topic and we were > looking for some jobs which we maybe can potentially drop from our queues. See > below for summary about that, > > Action items/improvements: > * many team meetings each week. We decided to limit number of meetings by: > ** consolidate performance subteam meeting into weekly team meeting - this topic > will be added to the team meeting's agenda for team meetings on Monday, > ** consolidate ovn convergence meeting into weekly team meeting - this topic > will be added to the team meeting's agenda for team meetings on Tuesday, > ** we need to check if QoS subteam meeting is still needed, > > * Review process: list of actual review priorities would be useful for the team, > we will add "Review-Priority" label to the Neutron reviews board and try to > use it during the Ussuri cycle. > > Openvswitch agent enhancements > ============================== > > We had bunch of topics related to potential improvements for > neutron-openvswitch-agent proposed mostly by Liu Yulong. Slides with his > proposals are available at [2]. > > * retire DHCP agent - resyncs of DHCP agent are problematic, especially when > agent hosts many networks. Proposal was to add new L2 agent's extension which > could be used instead of "regular" DHCP agent and to provide only basic DHCP > functionalities. > Such solutions would work in the way quite similar to how networking-ovn works > today but we would need to implement and maintain own dhcp server application. > > Problems of this solution are: > ** problems with compatibility e.g. with Ironic, > ** how it would work with mixed deployments, e.g. with ovs and sriov agents, > ** support for dhcp options, > > Advantages of this solution: > ** fully distributed DHCP service, > ** no DHCP agents, so less RPC messages on the bus and easier maintanance of > the agents, > > Team's feedback for that is that this is potentially nice solution which may > helps in some specific, large scale deploymnets. We can continue discussion > about this during Ussuri cycle for sure. > > * add accepted egress fdb flows > We agreed that this is a bug and we should continue work on this to propose > some way to fix it. > Solution proposed by LIU during this discussion wasn't good as it could > potentially break some corner cases. 
> > * new API and agent for L2 traffic health check > The team asked to add to the spec some more detailed and concrete use cases > with explanation how this new API may help operator of the cloud to > investigate where the problem actually is. > > * Local flows cache and batch updating > The team agreed that as long as this will be optional solution which operator > can opt-in we can give it a try. But spec and discuss details there will be > necessary. > > * stop processing ports twice in ovs-agent > We all agreed that this is a bug and should be fixed. But we have to be > careful as fixing this bug may cause some other problems e.g. with > live-migration - see nova-neutron cross project session. > > * ovs-agent: batch flow updates with --bundle > We all agreed that this can be done as an improvement of existing code. > Similar option is already used in openvswitch firewall driver. > > Neutron - Cyborg cross project session > ====================================== > > Etherpad for the session is at [3]. > Cyborg team wants to include Neutron in workflow of spawning VMs with Smart NICs > or accelerator cards. From Neutron's side, required change is to allow including > "accel" data in port binding profile. As long as this will be well documented > what can be placed there, there should be no problem with doing that. > Technically we can place almost anything there. > > Neutron - Kuryr cross project session > ===================================== > > Etherpad for the session is at [4]. > Kuryr team proposed 4 improvements for Neutron which would help a lot Kuryr. > Ideas are: > * Network cascade deletion, > * Force subport deletion, > * Tag resources at creation time, > * Security group creation with rules & bulk security group rule creation > > All of those ideas makes sense for Neutron team. Tag resources at creation time > is even accepted rfe already - see [5] but there was no volunteer to implement > it. We will add it to list of our BPs tracked weekly on team meeting. Miguel > Lavalle is going to take a look at it during this cycle. > For other proposals we need to have RFEs reported first. > > Starting the process of removing ML2/Linuxbridge > ================================================ > > Currently in Neutron tree we have 4 drivers: > * Linuxbridge, > * Openvswitch, > * macvtap, > * sriov. > SR-IOV driver is out of discussion here as this driver is > addressing slightly different use case than other out drivers. > > We started discussion about above topic because we don't want to end up with too > many drivers in-tree and we also had some discussions (and we have spec for that > already) about include networking-ovn as in-tree driver. > So with networking-ovn in-tree we would have already 4 drivers which can be used > on any hardware: linuxbridge, ovs, macvtap and ovn. > Conclusions from the discussion are: > * each driver requires proper testing in the gate, so we need to add many new > jobs to our check/gate queue, > * currently linuxbridge driver don't have a lot of development and feature > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > (e.g. dvr, trunk ports), > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > this one could be also considered as candidate to deprecation, > * we need to have process of deprecating some drivers and time horizon for such > actions should be at least 2 cycles. 
> * we will not remove any driver completly but rather we will move it to be in > stadium process first so it still can be maintained by people who are > interested in it. > > Actions to do after this discussion: > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > Linuxbridge users currently) to ask about their feedback about this, > * if there are any other companies using LB driver, Nate Johnston is willing to > help conctating them, please reach to him in such case. > * we may ratify marking linuxbridge as deprecated in the team meeting during > Ussuri cycle if nothing surprising pops in. > > Encrypted(IPSec) tenant networks > ================================ > > Interesting topic proposed but we need to have RFE and spec with more detailed > informations about it to continue discussions. > > Medatada service over IPv6 > ========================== > > This is continuation of old RFE [6]. > The only real problem is to choose proper IPv6 address which will be well known > address used e.g. by cloud-init. > Original spec proposed fe80::a9fe:a9fe as IPv6 address to access metadata > service. > We decided to be bold and define the standard. > Bence Romsics and Miguel Lavalle volunteered to reach out to cloud-init > maintainers to discuss that. > > walkthrough of OVN > ================== > > Since some time we have in review spec about ml2/ovs and ovn convergence. See > [7] for details. > List of parity gaps between those backends is available at [8]. > During the discussion we talked about things like: > * migration from ml2/ovs to ml2/ovn - some scripts are already done in [9], > * migration from ml2/lb to ml2/ovn - there was no any work done in this topic so > far but it should be doable also if someone would need it and want to invest > own time for that, > * include networking-ovn as in-tree neutron driver and reasons why it could be > good idea. > > Main reasons of that are: > ** that would help growing networking-ovn community, > ** would help to maintain a healthy project team, > ** the default drivers have always been in-tree, > > However such inclusion may also hurt modularity/logical separation/dependency > management/packaging/etc so we need to consider it really carefully and > consider all points of view and opinions. > > Next action item on this topic is to write more detailed summary of this topic > and send it to ML and ask wider audience for feedback. > > IPv6 devstack tempest test configuration vs OVN > =============================================== > > Generally team supports idea which was described during this session and we > should change sligtly IPv6 config on e.g. devstack deployments. > > Neutron - Edge SIG session > ========================== > > We discussed about RFE [10]. This will require also changes on placement side. > See [11] for details. > Also some cyborg and ovn related changes may be relevant to topics related to > Edge. > Currently specs which we have are only related to ML2/OVS solution. > > Neutron - Nova cross project session > ==================================== > > Etherpad for this session is on [12]. Summary written already by gibi can be > found at [13]. > On [14] You can find image which shows in visual way problem with live-migration > of instances with SR-IOV ports. 
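To make the "Metadata service over IPv6" item above more concrete, here is a minimal sketch of how a guest could reach the metadata service over the proposed link-local address. Both fe80::a9fe:a9fe (the address discussed in the session, not yet a standard) and the 'ens3' interface name are assumptions.

    import socket

    METADATA_ADDR = "fe80::a9fe:a9fe"   # address proposed in the session
    IFACE = "ens3"                      # hypothetical guest interface name


    def fetch_metadata(path="/openstack/latest/meta_data.json"):
        # Link-local addresses are only valid per interface, so the scope
        # (interface index) has to be part of the socket address.
        scope = socket.if_nametoindex(IFACE)
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        sock.settimeout(5)
        sock.connect((METADATA_ADDR, 80, 0, scope))
        request = ("GET {path} HTTP/1.1\r\n"
                   "Host: [{addr}]\r\n"
                   "Connection: close\r\n\r\n").format(path=path,
                                                       addr=METADATA_ADDR)
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        sock.close()
        return b"".join(chunks).decode("utf-8", "replace")


    if __name__ == "__main__":
        print(fetch_metadata())

cloud-init would need to learn the same well-known address, which is exactly why reaching out to its maintainers was taken as an action item.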
> > Policy handling in Neutron > ========================== > > The goal of the session was to plan on Neutron's side similar effort to what > services like nova are doing now to use new roles like reader and scopes, like > project, domain, system provided by Keystone. > Miguel Lavalle volunteered to work on this for Neutron and to be part of popup > team for cross project collaboration on this topic. > > Neutron performance improvements > ================================ > > Miguel Lavalle shown us his new profiling decorator [15] and how we all can use > it to profile some of API calls in Neutron. > > Reevaluate Stadium projects > =========================== > > This was follow up discussion after forum session. Notes from forum session can > be found at [16]. > Nate also prepared some good data about stadium projects activity in last > cycles. See at [17] and [18] for details. > We all agreed that projects which are in (relatively) good condition now are: > * networking-ovn, > * networking-odl, > * ovsdbapp > > Projects in bad condition are other projects, like: > * neutron-interconnection, > * networking-sfc, > * networking-bagpipe/bgpvpn, > * networking-midonet, > * neutron-fwaas and neutron-fwaas-dashboard, > * neutron-dynamic-routing, > * neutron-vpnaas and neutron-vpnaas-dashboard, > > We decided to immediately remove neutron-interconnection project as it was never > really implemented. > For other of those projects, we will send emails to ML to ask for potential > maintainers of those projects. If there will be no any volunteers to maintain > some of those projects, we will deprecated them and move to "x/" namespace in 2 > cycles. > > Floating IP's On Routed Networks > ================================ > > There is still interest of doing this. Lajos Katona started adding some scenario > tests for routed networks already as we need improved test coverage for this > feature. > Miguel Lavalle said that he will possibly try to work on implementing this in > Ussuri cycle. > > L3 agent enhancement > ==================== > > We talked about couple potential improvements of existing L3 agent, all proposed > by LIU Yulong. > > * retire metering-agent > It seems that there is some interest in metering agent recently so we > shouldn't probably consider of retiring it for now. > We also talked about adding new "tc based" driver to the metering agent and > this discussion can be continue on rfe bug [19]. > > * Centralized DNAT (non-DVR) traffic (floating IP) Scale-out > This is proposal of new DVR solution. Some details of this new solution are > available at [20]. > We agreed that this proposal is trying to solve some very specific use case, > and it seems to be very complicated solution with many potential corner cases > to address. As a community we don't want to introduce and maintain such > complicated new L3 design. > > * Lazy-load agent side router resources when no related service port > Team wants to see RFE with detailed description of the exact problem which > this is trying to solve and than continue discussion on such RFE. > > Zuul jobs > ========= > > In this session we talked about jobs which we can potentially promote to be > voting (and we didn't found any of such) and about jobs which we maybe can > potentially remove from our queues. 
> Here is what we agreed: > * we have 2 iptables_hybrid jobs - one on Fedora and one on Ubuntu - we will > drop one of those jobs and left only one of them, > * drop neutron-grenade job as it is running still on py27 - we have grenade-py3 > which is the same job but run on py36 already, > * as it is begin of the cycle, we will switch in devstack neutron uwsgi to be > default choice and we will remove "-uwsgi" jobs from queue, > * we should compare our single node and multinode variants of same jobs and > maybe promote multinode jobs to be voting and then remove single node job - I > volunteered to do that, > * remove our existing experimental jobs as those jobs are mostly broken and > nobody is run those jobs in experimental queue actually, > * Yamamoto will check failing networking-midonet job and propose patch to make > it passing again, > * we will change neutron-tempest-plugin jobs for branch in EM phase to always > use certain tempest-plugin and tempest tag, than we will remove those jobs > from check and gate queue in master branch, > > Stateless security groups > ========================= > > Old RFE [21] was approved for neutron-fwaas project but we all agreed that this > should be now implemented for security groups in core Neutron. > People from Nuage are interested in work on this in upstream. > We should probably also explore how easy/hard it will be to implement it in > networking-ovn backend. > > Old, stagnant specs > =================== > > During this session we decided to abandon many of old specs which were proposed > long time ago and there is currently no any activity and interest in continue > working on them. > If anyone would be interested in continue work on some of them, feel free to > contact neutron core team on irc or through email and we can always reopen such > patch. > > Community Goal things > ===================== > > We discussed about currently proposed community goals and who can take care of > which goal on Neutron's side. > Currently there are proposals of community goals as below: > * python3 readiness - Nate will take care of this, > * move jobs definitions to zuul v3 - I will take care of it. In core neutron and > neutron-tempest-plugin we are (mostly) done. On stadium projects' side this > will require some work to do, > * Project specific PTL and contributor guides - Miguel Lavalle will take care of > this goal as former PTL, > > We will track progress of community goals weekly in our team meetings. > > Neutron-lib > =========== > > As some time ago our main neutron-lib maintainer (Boden) leaved from the > project, we need some new volunteers to continue work on it. Todo list is > available on [22]. > This should be mostly important for people who are maintaining stadium projects > or some 3rd party drivers/plugins so if You are doing things like that, please > check list from [22] and reach out to us on ML or #openstack-neutron IRC > channel. 
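As an aside on the performance session mentioned earlier ([15]), the general shape of a profiling decorator like the one demonstrated there can be sketched with nothing more than the standard library. This is only an illustration of the idea, not the code under review, and the sample function being profiled is made up.

    import cProfile
    import functools
    import pstats


    def profile(sort_by="cumulative", limit=20):
        """Print a cProfile report for the wrapped callable."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                profiler = cProfile.Profile()
                profiler.enable()
                try:
                    return func(*args, **kwargs)
                finally:
                    profiler.disable()
                    stats = pstats.Stats(profiler).sort_stats(sort_by)
                    stats.print_stats(limit)
            return wrapper
        return decorator


    @profile(limit=10)
    def build_fake_ports(count):
        # Stand-in for an expensive API-layer call being profiled.
        return [{"id": i, "name": "port-%d" % i} for i in range(count)]


    if __name__ == "__main__":
        build_fake_ports(100000)

Wrapping a suspect API handler this way gives a quick per-call breakdown without pulling in any extra profiling dependencies.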
> > [1] https://www.slideshare.net/SawomirKaposki/neutron-on-boarding-room > [2] https://github.com/gotostack/shanghai_ptg/blob/master/shanghai_neutron_ptg_topics_liuyulong.pdf > [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Cyborg-xproj > [4] https://etherpad.openstack.org/p/kuryr-neutron-nice-to-have > [5] https://bugs.launchpad.net/neutron/+bug/1815933 > [6] https://bugs.launchpad.net/neutron/+bug/1460177 > [7] https://review.opendev.org/#/c/658414/ > [8] https://etherpad.openstack.org/p/ML2-OVS-OVN-Convergence > [9] https://github.com/openstack/networking-ovn/tree/master/migration > [10] https://bugs.launchpad.net/neutron/+bug/1832526 > [11] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/009991.html > [12] https://etherpad.openstack.org/p/ptg-ussuri-xproj-nova-neutron > [13] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010654.html > [14] https://imgur.com/a/12PrQ9W > [15] https://review.opendev.org/678438 > [16] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward > [17] https://ethercalc.openstack.org/neutron-stadium-train-metrics > [18] https://ibb.co/SBzDGdD > [19] https://bugs.launchpad.net/neutron/+bug/1817881 > [20] https://imgur.com/a/6MeNUNb > [21] https://bugs.launchpad.net/neutron/+bug/1753466 > [22] https://etherpad.openstack.org/p/neutron-lib-volunteers-and-punch-list > > -- > Slawek Kaplonski > Senior software engineer > Red Hat -- Slawek Kaplonski Senior software engineer Red Hat From alfredo.deluca at gmail.com Sun Nov 24 14:12:40 2019 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Sun, 24 Nov 2019 15:12:40 +0100 Subject: [kolla] PTG summary In-Reply-To: References: Message-ID: Hi Andreas. Does it mean ceph-ansible will no longer be maintained? From your link it seems that you can activate a module ansible ...what does that mean exactly? Cheers On Fri, Nov 22, 2019 at 10:37 PM Andreas Jaeger wrote: > On 22/11/2019 17.47, Mark Goddard wrote: > > [...] > > ## Ceph Ansible > > > > We are continuing to investigate Ceph Ansible as an alternative to our > > native Ceph deployment. The work to migrate from an existing kolla > > deployment is still ongoing. There are some potential blockers in the > > form of no Ubuntu container image support in ceph-ansible, and no ARM > > container images published by the ceph-container project. > > I suggest to talk with the Ceph community before spending more work > here. ceph-ansible is getting replaced in March by "SSH orchestrator", see > > > https://docs.google.com/presentation/d/1JpcETNXpuB1JEuhX_c8xtgNnv0gJhSaQUffM7aRjUek/edit#slide=id.g78e5cb0e10_0_0 > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 > > -- *Alfredo* -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Sun Nov 24 16:17:54 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Sun, 24 Nov 2019 11:17:54 -0500 Subject: [cinder] Ussuri Shanghai PTG summary Message-ID: <86d00be8-071b-b168-99a4-9be3512ec923@gmail.com> Sorry for the delay. 
A summary of what happened at the Cinder part of the Shanghai PTG is available: https://wiki.openstack.org/wiki/CinderUssuriPTGSummary Some topics are flagged for follow-up at this week's Cinder Virtual PTG, but any topics that you think could use more discussion are fair game to add to the Virtual PTG planning etherpad: https://etherpad.openstack.org/p/cinder-ussuri-virtual-ptg-planning The list of topics for the Virtual PTG is unordered right now. I'll send an email out tonight (New York time) when I've got them ordered so you'll know what to expect (roughly). cheers, brian From aj at suse.com Sun Nov 24 18:59:53 2019 From: aj at suse.com (Andreas Jaeger) Date: Sun, 24 Nov 2019 19:59:53 +0100 Subject: [kolla] PTG summary In-Reply-To: References: Message-ID: On 24/11/2019 15.12, Alfredo De Luca wrote: > > Hi Andreas.  > Does it mean ceph-ansible will no longer be maintained? From your link > it seems that you can activate a module ansible ...what does that mean > exactly? I don't know all those details, you might want to reach out to the Ceph community, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From alfredo.deluca at gmail.com Sun Nov 24 19:09:43 2019 From: alfredo.deluca at gmail.com (Alfredo De Luca) Date: Sun, 24 Nov 2019 20:09:43 +0100 Subject: [kolla] PTG summary In-Reply-To: References: Message-ID: Thanks On Sun, Nov 24, 2019 at 7:59 PM Andreas Jaeger wrote: > On 24/11/2019 15.12, Alfredo De Luca wrote: > > > > Hi Andreas. > > Does it mean ceph-ansible will no longer be maintained? From your link > > it seems that you can activate a module ansible ...what does that mean > > exactly? > > I don't know all those details, you might want to reach out to the Ceph > community, > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 > -- *Alfredo* -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Sun Nov 24 19:42:10 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Sun, 24 Nov 2019 13:42:10 -0600 Subject: [neutron] Bug deputy report week of November 18th Message-ID: Hi, I was the bugs deputy for the week of November 18th. Here's the list of reported bugs: High: - https://bugs.launchpad.net/neutron/+bug/1853223 [VPNaaS]: Python3 RuntimeError: dictionary changed size during iteration Fix in progress: https://review.opendev.org/#/c/695130 - https://bugs.launchpad.net/neutron/+bug/1853603 SG rules not enforced This happens with Kuryr tests. Per https://bugzilla.redhat.com/show_bug.cgi?id=1688323, Nate Johnson is reported working on it in the Red Hat bugzilla Medium: - https://bugs.launchpad.net/neutron/+bug/1853171 Deprecate and remove any "ofctl" code in Neutron and related projects ralonsoh reported it and working on it - https://bugs.launchpad.net/neutron/+bug/1853637 Assign floating IP to port owned by another tenant is not override-able with RBAC policy Reported by Octavia team. 
Needs owner Low: - https://bugs.launchpad.net/neutron/+bug/1853582 ovs_all_ports option used by ovs cleanup has a misleading help string Requires further investigation: - https://bugs.launchpad.net/neutron/+bug/1853613 VMs don't get ip from dhcp after compute restart Submitter was asked for more information Incomplete: - https://bugs.launchpad.net/neutron/+bug/1853632 designate dns driver does not use domain settings for auth More information requested from submitter. Potential fix proposed: https://review.opendev.org/#/c/695726 Invalid: - https://bugs.launchpad.net/neutron/+bug/1853089 openstack env in Stein, CLI or Dashboard response slowly Report is too vague. More data requested -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Sun Nov 24 21:31:16 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sun, 24 Nov 2019 16:31:16 -0500 Subject: [nova] splitting "ERROR" into 2 states Message-ID: Hi everyone, This is just a very open ended discussion that might bring up some interesting ideas, either that I'm going the wrong way about this, or perhaps this is something we need to think more about. As we go about increasing the instrumentation of the clouds we run, one of the interesting ideas was to measure the "ERROR" instance rate to see if more VMs than usual are hitting ERROR state. The problem with this right now is that there are a few factors at the moment where an instance can hit an ERROR state which are *not* cause of concern for the operator. For example, if you're booting an instance from a volume and you're at your quota, the instance will fail to boot, end up in "ERROR" state. If we're instrumenting that, we are likely going to be alerted on a high # of ERROR state instances but there's not much that we can do about it realistically. However, if we're getting a lot of ERROR instances because of "NoValidHost" or because some other valid failures such as in libvirt or RBD, then we'd probably want to be alerted on those. Does anyone have any ideas on how we can either better instrument this, or perhaps seeing how inside Nova, we have a "system error" and a "user error" Thanks :) Mohammed From rosmaita.fossdev at gmail.com Sun Nov 24 23:17:24 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Sun, 24 Nov 2019 18:17:24 -0500 Subject: [cinder] Virtual PTG reminder--Monday 25 Nov at 1500 UTC Message-ID: <3820f7a1-b515-b606-3ae2-a7bf6934f072@gmail.com> This is a quick reminder that we're having the first session of the Cinder Virtual PTG for Ussuri from 1500-1700 UTC on Monday 25 November. We'll meet in this BlueJeans room: https://bluejeans.com/3228528973 The session will be recorded. The rough schedule is here: https://etherpad.openstack.org/p/cinder-ussuri-virtual-ptg-planning It's not too late to add a topic if there's something you'd like to have discussed. The second session is Wednesday 27 November from 1500-1700 UTC. See you there! brian From gmann at ghanshyammann.com Mon Nov 25 00:17:59 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sun, 24 Nov 2019 18:17:59 -0600 Subject: [qa][form][ptg] QA Summary for Forum & PTG Message-ID: <16e9fec1691.e7e30f54197512.3188395463185762355@ghanshyammann.com> Hello Everyone, I am summarizing the QA PTG and Forum sessions held in Shanghai. * OpenStack QA – Project Update:  We gave the updates on what we finished on Train and draft plan for the Ussuri cycle. due to fewer contributors in QA, Train cycle activities are decreased as compare to Stein.  
We tried to maintain the daily QA activity and finished a few important things. Slides are uploaded on the schedule site. * Users / Operators adoption of QA tools / plugins : Etherpad: https://etherpad.openstack.org/p/PVG-forum-qa-ops-user-feedback. This is another useful session for QA to get feedback as well as information about downstream tooling. Few tools we talked about: - Fault injection tests - https://opendev.org/x/tobiko/ - whitebox testing(Whitebox Tempest Plugin) - high availability testing: https://review.opendev.org/#/c/443504/ Run patrole jobs in the neutron upstream(Keystone was interested before) One big concern shared from a few people about a long time to get merged tempest patches. One idea to solve this is to bring critical reviews in Office hours. QA PTG: It was a small gathering this time for one day for PTG on Wednesday. Even with the small number of developers, we had good discussions on many topics.  Etherpad: https://etherpad.openstack.org/p/shanghai-ptg-qa * Train Retrospective: Retrospective bought up the few key issues where we need improvement. We collected the below action items including bug triage. Untriage QA bugs are increasing day by day. Action: - need to discuss blacklist plugins and how to notify and remove them if dead - gmann - start the process of community-goal work in QA - masayuki - sprint for bug triage with number of volunteers -  - (chandankumar)Include one bug in each sprint in TripleO CI tempest member - Traige the new bug and then pick the bug based on priority - For tripleo Ci team we will track here: https://tree.taiga.io/project/tripleo-ci-board/ - chandankumar * How to deal with an aging testing stack: With testtools being not so active, we need to think on the alternate or best suitable options to solve this issue. We discussed the few options which need to be discussed further on ML. - Can we fork the dependecies of testtools in Temepst or stestr ?  - As we are removing the py2.7 support in tempest, we can completly ignore/remove the unittest2 things but that is not case for testtools ? - Similar example done in different project: https://github.com/plone/Products.CMFPlone/issues/1882 - Remove the support of unittest2 from testtools ? py2.7 is going away from everywhere and testools can create tag or something for py2.7 usage ? - Since Python2 is going EOL on 01st Jan, 2020, so let's create a tag and remove the unitest2 with unitest for python3 release only Action: - Document the official supported test runner by Tempest. -  Soniya Vyas/Chandan Kumar - ML to discuss the above options - gmann  * Remove/migrate the .testr.conf to .stestr: 60 openstack/* repositories have .stestr.conf AND .testr.conf. We don't need to have both files at least. Let's take a look some of them and make a plan to remove if we can. - http://paste.openstack.org/show/785832/ - If both exist then remove the .testr.conf and Then verify that .stestr conf has the correct test path. If only .testr.conf then migrate to stestr.conf - We need to figure out the purpose of pbr .testr.conf code before removing. Is this just old codes or necessary? - https://opendev.org/openstack/pbr/src/branch/master/pbr/hooks/commands.py#L53 - https://opendev.org/openstack/pbr/src/branch/master/pbr/testr_command.py * Moving subunit2html script from os-testr: Since os-testr runner piece in os-testr project is deprecated but subunit2html project still exists there, it is widely used across the OpenStack ecosystem, Can we move to somewhere else?  
I do not find any benefits to move those scripts to other places. We asked chandan to open an issue on stestr to discuss moving to stestr repo. mtreinish replied on this: os-testr meant to be the place in openstack that we could host the ostestr runner wrapper/script subunit2html, generate_subunit, etc. Just because ostestr is deprecated and being removed doesn't mean it's not the proper home for those other tools. * Separate integrated services tests can be used in TriplO CI: TriplO CI maintains a separate file to run dependent tests per service. Tempest has dependent services tox and integrated jobs and the same can be used in TriplO CI. For example: - blacklist file for networking testing [1]. - tox for networking[2]. * RBAC testing strategy: This was a cross-project strategy for positive/negative testing for system scope and new defaults in keystone. Keystone has implemented the new defaults and system scope in its policy and added a unit test to cover the new policies.  Nova is implementing the same in Ussuri cycle. As discussed in Denver PTG also, Tempest will implement the new credential for all 9 personas available in keystone.  Slowly migrate the tests start using the new policies. That will be done via a flag switching Tempest to use system scope or new defaults and that flag will be false to keep using the old policies for stable branch testing. We can use patrole tests or implement new tests in the Tempest plugin and verify the response. Both have the issue of performing the complete operation which is not required always for policy verification.  Running full functional tests is expensive and duplicates existing tests. One solution for that (we talked about it in Denver PTG also) is via some flag like os-profiler by just do the policy check and return the API response with specific return code. AGREE: - Tempest to provide the all 9 personas available from keystone. Slowly migrate Tempest existing tests to run with new policies. - We agreed to have two ways to test the policy: - Tempest like tests in tempest plugins with the complete operation and verify the things on response, not just policy return code. It depends on the project if they want to implement such tests. - Unit/Functional tests on the project's side. - Document the both way so that project can adopt the best suitable one. * How to remove tempest plugin sanity BLACKLIST: We have Tempest plugin blacklist. It should be removed in the future if possible. Some of them shouldn't be as a tempest-plugin because they're just neutron studium things which already moved to neutron-tempest-plugin but still exiting in repo also. Some of them are less active.  Remove below plugins from BLACKLIST: - x/group-based-policy - https://opendev.org/x/group-based-policy  - openstack/networking-generic-switch needs to be checked (setup.py/cfg?) Action:  - Add the start date in blacklist doc so that we can know how long a plugin is blacklisted.  - After 60 days: we send the email notification to openstack-discuss, PTL, maitainer and TC to either fix it or remove it from the governance.  * Python 2.7 drop plan: We discussed the next steps to drop the py2 from Tempest and other QA tools. AGREE: - Will be doing before milestone 2 - Create a new tag for python-2.7 saying it is the last tag and document that the Tempest tag needs Train u-c.  - Test the Tempest tag with Train u-c, if fail then we will disucss.  
- TripleO and OSA is going to use CentOS 8 for train and master * Adding New glance tests to Tempest: We discussed on testing the new glance v2 api and feature. Below are the glance features and agreed points on how to test them. - Hide old images: Test can be added in Tempest. Hide the image and try to boot the server from the image in scenario tests.  - Delete barbican secrets from glance images: This test belongs to barbican-tempest-plugin which can be run as part of the barbican gate using an existing job. Running barbican job on glance gate is not required, we can add a new job (multi stores) on glance gate which can run this + other new features tests.  - Multiple stores: DevStack patch is already up, add a new zuul job to set up multiple stores and run on the glance gate with api and scenario tests. gmann to setup the zuulv3 job for that. * Tempest volunteers for reviewing patches: We've noticed that the amount of merged patches in October is less than in September and much less than it was during the summer. This has been brought in feedback sessions also. There is no perfect solution for this. Nowadays QA has less active core developers. We encourage people to bring up the critical or stuck patches in office hours. * Improving Tempest cleanup: Tempest cleanup is not so stable and not a perfect design. We have spec up to redesign that but could not get a consensus on that. I am ok to move with resource prefix with UUID.  We should extend the cleanup tool for plugins also. * Ussuri Priority & Planning: This was the last session for the PTG which could not happen on Wed due to strict time-up policy of the conference place which I really liked. Time-based working is much needed for IT people :). We met on Thursday morning in coffee area and discussed about priority for Ussuri cycle. QA Ussuri Priority Etherpad[3] has the priority items with the assignee. [1] https://github.com/openstack/tempest/blob/9cdd5250615bb6ab26a1a9a80743a03cc81b3b4a/tools/tempest-integrated-gate-networking-blacklist.txt [2] https://github.com/openstack/tempest/blob/9cdd5250615bb6ab26a1a9a80743a03cc81b3b4a/tox.ini#L121 [3] https://etherpad.openstack.org/p/qa-ussuri-priority -gmann From liang.a.fang at intel.com Mon Nov 25 01:30:02 2019 From: liang.a.fang at intel.com (Fang, Liang A) Date: Mon, 25 Nov 2019 01:30:02 +0000 Subject: [cinder] weekly meeting time and location change In-Reply-To: <227ed6e3-6dc3-6a87-8d11-c44358baf570@gmail.com> References: <227ed6e3-6dc3-6a87-8d11-c44358baf570@gmail.com> Message-ID: Thanks Brian. It is a big surprise that poll result supports meeting time change. Hope more Asia contributors to attend the weekly meeting. Regards Liang -----Original Message----- From: Brian Rosmaita Sent: Thursday, November 21, 2019 10:14 PM To: openstack-discuss at lists.openstack.org Subject: [cinder] weekly meeting time and location change Beginning with the 4 December 2019 meeting, the Cinder weekly meeting will be held as follows: Day: Wednesday Time: 1400 UTC Location: #openstack-meeting-4 Agenda: https://wiki.openstack.org/wiki/CinderMeetings Updated ICS file: http://eavesdrop.openstack.org/calendars/cinder-team-meeting.ics Note that the *time* and *IRC chat room* have changed. The day, agenda, meeting log locations, etc., remain the same. Thanks to Liang Fang for initiating this change, which will bring our meeting into a more reasonable time frame for contributors in Asia time zones. 
cheers, brian From sorrison at gmail.com Mon Nov 25 05:32:50 2019 From: sorrison at gmail.com (Sam Morrison) Date: Mon, 25 Nov 2019 16:32:50 +1100 Subject: [neutron][OVN] Multiple mechanism drivers Message-ID: <8B3A471E-B855-4D1C-AE52-080D4B0D92A9@gmail.com> We are looking at using OVN and are having some issues with it in our ML2 environment. We currently have 2 mechanism drivers in use: linuxbridge and midonet and these work well (midonet is the default tenant network driver for when users create a network) Adding OVN as a third mechanism driver causes the linuxbridge and midonet networks to stop working in terms of CRUD operations etc. It looks as if the OVN driver thinks it’s the only player and is trying to do things on ports that are in linuxbridge or midonet networks. Am I missing something here? (We’re using Stein version) Thanks, Sam From skaplons at redhat.com Mon Nov 25 07:51:44 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 25 Nov 2019 08:51:44 +0100 Subject: [neutron][OVN] Multiple mechanism drivers In-Reply-To: <8B3A471E-B855-4D1C-AE52-080D4B0D92A9@gmail.com> References: <8B3A471E-B855-4D1C-AE52-080D4B0D92A9@gmail.com> Message-ID: <20191125075144.vhppi2bnnnfyy57s@skaplons-mac> Hi, I think that this may be true that networking-ovn will not work properly with other drivers. I don't think it was tested at any time. Also the problem may be that when You are using networking-ovn than whole neutron topology is different. There are different agents for example. Please open a bug for that for networking-ovn. I think that networking-ovn team will take a look into that. On Mon, Nov 25, 2019 at 04:32:50PM +1100, Sam Morrison wrote: > We are looking at using OVN and are having some issues with it in our ML2 environment. > > We currently have 2 mechanism drivers in use: linuxbridge and midonet and these work well (midonet is the default tenant network driver for when users create a network) > > Adding OVN as a third mechanism driver causes the linuxbridge and midonet networks to stop working in terms of CRUD operations etc. > It looks as if the OVN driver thinks it’s the only player and is trying to do things on ports that are in linuxbridge or midonet networks. > > Am I missing something here? (We’re using Stein version) > > > Thanks, > Sam > > > -- Slawek Kaplonski Senior software engineer Red Hat From alexandre.arents at corp.ovh.com Mon Nov 25 09:59:24 2019 From: alexandre.arents at corp.ovh.com (Alexandre Arents) Date: Mon, 25 Nov 2019 10:59:24 +0100 Subject: [nova] unshelve image_ref bugs In-Reply-To: <43ee8fa3-0f2b-a66c-3d84-f1ced282eb87@gmail.com> References: <20191122140519.zldtzhyp7ptdsm2h@corp.ovh.com> <43ee8fa3-0f2b-a66c-3d84-f1ced282eb87@gmail.com> Message-ID: <20191125095924.kjszavhmv4bwqcfb@corp.ovh.com> Interesting one, that's doing 50% of the job, we just need to implement the missing imagebackend.py:Qcow2.flatten() method, checking it. Matt Riedemann wrote on ven. [2019-nov.-22 11:15:46 -0600]: > On 11/22/2019 8:05 AM, Alexandre Arents wrote: > > C) Change create_image()/imagebackend driver behavior, > > to create a flatten qcow2 file in case of unshelving. > > flattening disk may be a solution because there will be no more "orphan backing file". 
> > (Basicly doing like "flat" backend driver except we need to stay in qcow2 instead of RAW) > > PROS: > > -we keep orignal unshelve behavior/assumption > > CONS: > > -It means that in your infra configured in COW some instances will be in "qcow2 flat", > > Flat qcow2 instance works great (livemigration/resize..). Would all installation ok with that ? > > Ok it seems a little odd to ask COW driver to not do COW in some case. Alternatilevy we can > > force using flat driver if unshelving, but we need to change flat driver to support also qcow2. > > D) During spawn() if unshelving we convert "qcow2 disk with backing file" to a "flatten qcow2 disk", > > just after self._create_image(). > > It looks more like a workaround than a long term solution as it need to convert something created before, > > that do not meet the need(better to do C). > > Does this already fix your problem? > > https://review.opendev.org/#/q/If3c9d1de3ce0fe394405bd1e1f0fa08ce2baeda8 > > -- > > Thanks, > > Matt > -- Alexandre Arents From mark at stackhpc.com Mon Nov 25 10:03:26 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 25 Nov 2019 10:03:26 +0000 Subject: [kolla] PTG summary In-Reply-To: References: Message-ID: On Fri, 22 Nov 2019 at 21:30, Andreas Jaeger wrote: > > On 22/11/2019 17.47, Mark Goddard wrote: > > [...] > > ## Ceph Ansible > > > > We are continuing to investigate Ceph Ansible as an alternative to our > > native Ceph deployment. The work to migrate from an existing kolla > > deployment is still ongoing. There are some potential blockers in the > > form of no Ubuntu container image support in ceph-ansible, and no ARM > > container images published by the ceph-container project. > > I suggest to talk with the Ceph community before spending more work > here. ceph-ansible is getting replaced in March by "SSH orchestrator", see > > https://docs.google.com/presentation/d/1JpcETNXpuB1JEuhX_c8xtgNnv0gJhSaQUffM7aRjUek/edit#slide=id.g78e5cb0e10_0_0 Thanks for bringing this up Andreas, we'll reevaluate our approach here. > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From mark at stackhpc.com Mon Nov 25 10:15:56 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 25 Nov 2019 10:15:56 +0000 Subject: [Kolla] Configure docker service fails during bootstrap_servers In-Reply-To: References: Message-ID: On Fri, 22 Nov 2019 at 15:54, Radosław Piliszek wrote: > > You are trying to deploy to localhost which does not have interface named eth0. > > Please configure globals.yml or inventory and retry. In particular, these variables affect network configuration: https://docs.openstack.org/kolla-ansible/ocata/user/production-architecture-guide.html#interface-configuration. > > -yoctozepto > > pt., 22 lis 2019 o 16:09 Bhupathi, Ramakrishna > napisał(a): > > > > Folks, > > > > Looking for help on this. When I am attempting to install OpenStack Kolla (development mode) I am running into this error when running bootstrap_servers > > > > > > > > ./kolla-ansible -i ../../multinode bootstrap-servers > > > > > > > > TASK [baremetal : Configure docker service] **************************************************************************************************** > > > > fatal: [localhost]: FAILED! 
=> {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth0'"} > > > > > > > > PLAY RECAP ************************************************************************************************************************************* > > > > localhost : ok=29 changed=3 unreachable=0 failed=1 skipped=16 rescued=0 ignored=0 > > > > > > > > Command failed ansible-playbook -i all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action= bootstrap-servers /home/ubuntu/kolla-env/share/kolla-ansible/ansible/kolla-host.yml > > > > > > > > Can someone tell me what is going wrong here? > > > > > > > > --RamaK > > > > The contents of this e-mail message and > > any attachments are intended solely for the > > addressee(s) and may contain confidential > > and/or legally privileged information. If you > > are not the intended recipient of this message > > or if this message has been addressed to you > > in error, please immediately alert the sender > > by reply e-mail and then delete this message > > and any attachments. If you are not the > > intended recipient, you are notified that > > any use, dissemination, distribution, copying, > > or storage of this message or any attachment > > is strictly prohibited. > From nate.johnston at redhat.com Mon Nov 25 13:13:54 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Mon, 25 Nov 2019 08:13:54 -0500 Subject: [neutron] Bug deputy report week of November 18th In-Reply-To: References: Message-ID: <20191125131354.mva6zkv7ematicxh@firewall> On Sun, Nov 24, 2019 at 01:42:10PM -0600, Miguel Lavalle wrote: > I was the bugs deputy for the week of November 18th. Here's the list of > reported bugs: Thanks Miguel! I'm taking the helm now. Nate From marcin.juszkiewicz at linaro.org Mon Nov 25 13:43:43 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Mon, 25 Nov 2019 14:43:43 +0100 Subject: [kolla][tripleo] Infra style images Message-ID: <4e05834b-9654-0889-68f8-249ac869f902@linaro.org> One of things we have on a list of things to do during Ussuri cycle is implementation of 'infra' images. BP: https://blueprints.launchpad.net/kolla/+spec/infra-images # What are infra images? Images that are always built from binary packages (or are Java monsters). We have about 70 such ones from quick check. All those Ceph, Prometheus, MariaDB, cron, chrony, storm, sensu etc ones. ## libvirt image There is 'nova-libvirt' image. Contains libvirt daemon (with qemu and all required packages) so it would get renamed to 'libvirt'. # Building The idea is that 'infra' will be a new build type (like we have 'binary', 'source' etc). With all source base images marked as unbuildable so there will be no 'debian-infra-nova-compute' one. On the other hand building of 'binary'/'source' type images would lock out all 'infra' ones to not get images with names like 'debian-source-ceph-mon'. # Pros - Clean split between OpenStack components (binary/source) and infrastructure needed to get them running (infra). - Less images to publish on CI. Infra ones can be built weekly as they do not change much. - No more questions how did we built ceph-mon from source ;D # Cons - We need to change kolla-ansible, tripleo and maybe some other projects' code to use new type of images. - Migration from previous releases would be more complicated due to image renames. 
From mriedemos at gmail.com Mon Nov 25 13:51:02 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 25 Nov 2019 07:51:02 -0600 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <1574439261.31688.6@est.tech> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> Message-ID: <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> On 11/22/2019 10:14 AM, Balázs Gibizer wrote: > Opened a blueprint[1], pushed up a small spec[2], and a WIP > implementation[3] for the new API. > > Cheers, > gibi > > [1] > https://blueprints.launchpad.net/nova/+spec/filter-hypervisors-by-service-host > [2]https://review.opendev.org/#/c/695716 The contentious point on the spec seems to be that it currently proposes to keep the same (incorrect) behavior where if filtering by hypevisor_hostname_pattern yields no results the API returns a 404 error today when it should just return an empty list. The question in the spec is then if when we add service_host filtering, if there are no results do we: 1. 404 (that's what the spec proposes to be consistent with existing [odd] behavior) 2. Return an empty list and in the same microversion change that 404 -> empty response behavior even if not filtering by service_host (so just filtering by hypervisor_hostname_pattern). From the review it sounds like most people (myself, Alex and Chris) think we should go with the latter and not perpetuate the former sin. -- Thanks, Matt From C-Ramakrishna.Bhupathi at charter.com Mon Nov 25 13:54:13 2019 From: C-Ramakrishna.Bhupathi at charter.com (Bhupathi, Ramakrishna) Date: Mon, 25 Nov 2019 13:54:13 +0000 Subject: [Kolla] RabbitMQ failure during deploy (Openstack with Kolla) Message-ID: <984b2c2f9ce04522ae97b16f7d12d162@NCEMEXGP032.CORP.CHARTERCOM.com> Hey , I am evaluating Openstack with Kolla (have the latest ) and following the steps. I see that rabbitmq fails to start up and (keeps restarting). Essentially the script fails in deploy. I have a single node all in one config. Can someone tell me what the cause for this failure? RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first node] **************************************************************************************************************** fatal: [localhost]: FAILED! 
=> {"changed": true, "cmd": "docker exec rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", "delta": "0:00:07.732346", "end": "2019-11-25 13:33:12.102342", "msg": "non-zero return code", "rc": 137, "start": "2019-11-25 13:33:04.369996", "stderr": "", "stderr_lines": [], "stdout": "Waiting for 'rabbit at kolla-ubuntu'\npid is 6", "stdout_lines": ["Waiting for 'rabbit at kolla-ubuntu'", "pid is 6"]} RUNNING HANDLER [rabbitmq : Restart rabbitmq container (rest of nodes)] ***************************************************************************************************************** NO MORE HOSTS LEFT ********************************************************************************************************************************************************************** PLAY RECAP ****************************************************************************************************************************************************************************** localhost : ok=95 changed=2 unreachable=0 failed=1 skipped=78 rescued=0 ignored=0 Command failed ansible-playbook -i ./all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=deploy /home/ubuntu/venv/share/kolla-ansible/ansible/site.yml Accessing the container logs for rabbitmq .. this is what I see BOOT FAILED =========== Error description: {could_not_start,rabbitmq_management, {rabbitmq_management, {bad_return, {{rabbit_mgmt_app,start,[normal,[]]}, {'EXIT', {{could_not_start_listener, [{port,15672}], {shutdown, {failed_to_start_child,ranch_acceptors_sup, {listen_error,rabbit_web_dispatch_sup_15672,eaddrinuse}}}}, {gen_server,call, [rabbit_web_dispatch_registry, {add,rabbit_mgmt, [{port,15672}], #Fun, --RamaK E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Nov 25 14:05:41 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 25 Nov 2019 08:05:41 -0600 Subject: [nova] splitting "ERROR" into 2 states In-Reply-To: References: Message-ID: On 11/24/2019 3:31 PM, Mohammed Naser wrote: > For example, if you're booting an instance from a volume and you're at > your quota, the instance will fail to boot, end up in "ERROR" state. > If we're instrumenting that, we are likely going to be alerted on a > high # of ERROR state instances but there's not much that we can do > about it realistically. > We could probably eliminate most of that specific scenario by checking volume quota in the API like we do for port quota so the user would get a 403 error rather than one or more instances that failed to build in ERROR status. I know this isn't the gist of your email, but my point is we have historically just punted and set instances to ERROR status in a lot of cases but that might not necessarily be correct, e.g. [1]. So drilling in on common cases where the operation fails and the instance is just put to ERROR status is worthwhile IMO. 
If you reduce the number of times an instance goes to ERROR status for predictable reasons, then your alerts go down and when you do get alerted you should have a smaller set of things that you can reliably filter into an "ignore" bucket, like quota-related failures. > > Does anyone have any ideas on how we can either better instrument > this, or perhaps seeing how inside Nova, we have a "system error" and > a "user error" I would think there are also versioned notifications involved in these operations with error payloads that you could be inspecting to see if it's really something for which you need to be paged. That might get pretty whack-a-mole though since lots of operations can fail in lots of ways and trying to whitelist that would be hard (see the conversation in [2]). Every operation will have instance action events associated with it as well and if one of the events fails, e.g. compute_prep_resize fails due to a resize resource claim failure on the dest compute, the exception traceback will be recorded in the event and available in the os-instance-actions REST API for admins by default policy. So like error notifications, mining instance action events might be something to look into. [1] https://bugs.launchpad.net/nova/+bug/1811235 [2] https://bugs.launchpad.net/nova/+bug/1742102 -- Thanks, Matt From info at dantalion.nl Mon Nov 25 15:16:49 2019 From: info at dantalion.nl (info at dantalion.nl) Date: Mon, 25 Nov 2019 16:16:49 +0100 Subject: [olso][pbr][i18n] Using setup.cfg [files] data_files to install localization files Message-ID: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> Hello everyone :), I was wondering what the preferred method to install localization files is. I can think of some probably solutions such as: 1: Including the locale files as part of a package for the target systems package manager (pacman, yum, apt, etc). 2: adding the locale files to the [files] directive in setup.cfg: I hope someone can answer my question. Kind regards, Corne Lukken From smooney at redhat.com Mon Nov 25 15:17:44 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 25 Nov 2019 15:17:44 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> Message-ID: <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> On Mon, 2019-11-25 at 07:51 -0600, Matt Riedemann wrote: > On 11/22/2019 10:14 AM, Balázs Gibizer wrote: > > Opened a blueprint[1], pushed up a small spec[2], and a WIP > > implementation[3] for the new API. > > > > Cheers, > > gibi > > > > [1] > > https://blueprints.launchpad.net/nova/+spec/filter-hypervisors-by-service-host > > [2]https://review.opendev.org/#/c/695716 > > The contentious point on the spec seems to be that it currently proposes > to keep the same (incorrect) behavior where if filtering by > hypevisor_hostname_pattern yields no results the API returns a 404 error > today when it should just return an empty list. The question in the spec > is then if when we add service_host filtering, if there are no results > do we: > > 1. 404 (that's what the spec proposes to be consistent with existing > [odd] behavior) > > 2. 
Return an empty list and in the same microversion change that 404 -> > empty response behavior even if not filtering by service_host (so just > filtering by hypervisor_hostname_pattern). > > From the review it sounds like most people (myself, Alex and Chris) > think we should go with the latter and not perpetuate the former sin. for what its worth that would be my preference as well. > From mark at stackhpc.com Mon Nov 25 15:59:09 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 25 Nov 2019 15:59:09 +0000 Subject: [kolla] Forum project update slides Message-ID: Hi, During the PTG there were some requests for the slides from the Kolla project update forum session given by Surya and Jeffrey. https://docs.google.com/presentation/d/19Lj9WcSjjcs8f5ljL1r4HMIQFl4Nokepsij7dYuXZLg/edit?usp=sharing Please get in touch if you can't reach the above and would like to see the slides. Thanks, Mark From mark at stackhpc.com Mon Nov 25 16:08:25 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 25 Nov 2019 16:08:25 +0000 Subject: [Kolla] RabbitMQ failure during deploy (Openstack with Kolla) In-Reply-To: <984b2c2f9ce04522ae97b16f7d12d162@NCEMEXGP032.CORP.CHARTERCOM.com> References: <984b2c2f9ce04522ae97b16f7d12d162@NCEMEXGP032.CORP.CHARTERCOM.com> Message-ID: On Mon, 25 Nov 2019 at 13:54, Bhupathi, Ramakrishna wrote: > > Hey , I am evaluating Openstack with Kolla (have the latest ) and following the steps. I see that rabbitmq fails to start up and (keeps restarting). Essentially the script fails in deploy. I have a single node all in one config. > > > > Can someone tell me what the cause for this failure? > > > > > > RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first node] **************************************************************************************************************** > > fatal: [localhost]: FAILED! => {"changed": true, "cmd": "docker exec rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", "delta": "0:00:07.732346", "end": "2019-11-25 13:33:12.102342", "msg": "non-zero return code", "rc": 137, "start": "2019-11-25 13:33:04.369996", "stderr": "", "stderr_lines": [], "stdout": "Waiting for 'rabbit at kolla-ubuntu'\npid is 6", "stdout_lines": ["Waiting for 'rabbit at kolla-ubuntu'", "pid is 6"]} > > > > RUNNING HANDLER [rabbitmq : Restart rabbitmq container (rest of nodes)] ***************************************************************************************************************** > > > > NO MORE HOSTS LEFT ********************************************************************************************************************************************************************** > > > > PLAY RECAP ****************************************************************************************************************************************************************************** > > localhost : ok=95 changed=2 unreachable=0 failed=1 skipped=78 rescued=0 ignored=0 > > > > Command failed ansible-playbook -i ./all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=deploy /home/ubuntu/venv/share/kolla-ansible/ansible/site.yml > > > > > > Accessing the container logs for rabbitmq .. 
this is what I see > > > > BOOT FAILED > > =========== > > > > Error description: > > {could_not_start,rabbitmq_management, > > {rabbitmq_management, > > {bad_return, > > {{rabbit_mgmt_app,start,[normal,[]]}, > > {'EXIT', > > {{could_not_start_listener, > > [{port,15672}], > > {shutdown, > > {failed_to_start_child,ranch_acceptors_sup, > > {listen_error,rabbit_web_dispatch_sup_15672,eaddrinuse}}}}, Hi RamaK, here is your issue. RabbitMQ management fails to start up due to the port already being in use (eaddrinuse). Perhaps RabbitMQ is running on the host already? > > {gen_server,call, > > [rabbit_web_dispatch_registry, > > {add,rabbit_mgmt, > > [{port,15672}], > > #Fun, > > > > --RamaK > > The contents of this e-mail message and > any attachments are intended solely for the > addressee(s) and may contain confidential > and/or legally privileged information. If you > are not the intended recipient of this message > or if this message has been addressed to you > in error, please immediately alert the sender > by reply e-mail and then delete this message > and any attachments. If you are not the > intended recipient, you are notified that > any use, dissemination, distribution, copying, > or storage of this message or any attachment > is strictly prohibited. From openstack at fried.cc Mon Nov 25 16:45:25 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 25 Nov 2019 10:45:25 -0600 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> Message-ID: >> 2. Return an empty list and in the same microversion change that 404 -> >> empty response behavior even if not filtering by service_host (so just >> filtering by hypervisor_hostname_pattern). >> >> From the review it sounds like most people (myself, Alex and Chris) >> think we should go with the latter and not perpetuate the former sin. > for what its worth that would be my preference as well. +1 From Dong.Ding at dell.com Mon Nov 25 03:31:45 2019 From: Dong.Ding at dell.com (Dong.Ding at dell.com) Date: Mon, 25 Nov 2019 03:31:45 +0000 Subject: manila share group replication support Message-ID: Hi, experts, I'm trying to implement the share replication feature for EMC Unity Manila driver. Unity isn't capable to promote a single share. The share must be promoted together with its share server. The problem is that we have several shares exported from one single server. So, promoting a share will cause all shares being promoted. My question is that is there any solution to promote several shares as they are in a group? I was trying to find something useful about 'manila group replication' ,only find Ocata Doc mentioned it, but no detail information: https://specs.openstack.org/openstack/manila-specs/specs/ocata/manila-share-groups.html And there is no code or commit history matches 'group replication' in Manila. Do you have any suggestions for our situation? Thanks, Ding Dong -------------- next part -------------- An HTML attachment was scrubbed... 
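For anyone else hitting the same limitation: the Ocata share-groups spec
linked above never grew a group replication implementation, so today the
CLI only exposes per-share replica operations. A minimal sketch of that
existing workflow (the --share-id flag name is from memory, please check
`manila help share-replica-list` on your release):

    # Per-share replication building blocks; there is no group-level equivalent.
    manila share-replica-list --share-id <share-id>   # find the replica on the secondary back end
    manila share-replica-promote <replica-id>         # promote that single replica

Promoting several shares that live on one share server means repeating the
promote per share; nothing in the API ties those promotions together, so a
back end like Unity that fails over the whole share server at once is
presumably left to reconcile the remaining shares' replica states itself.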
URL: From mark at stackhpc.com Mon Nov 25 17:48:15 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 25 Nov 2019 17:48:15 +0000 Subject: [kolla][openstack-ansible][tripleo] CentOS 8 host migration Message-ID: Hi, During the recent kolla PTG discussions [1], we covered CentOS 8 in some detail. For kolla there are several aspects to CentOS 8 support: * kolla-build execution host * container images * kolla-ansible execution host * kolla-ansible remote hosts In this context I am discussing the last of these - kolla-ansible remote hosts, i.e. the OS on the controllers and computes etc. Given that this could be quite a thorny problem, I'd like to propose that we collaborate between deployment projects. It is my understanding that upgrades from CentOS 7 to 8 will not be supported, and a reinstall is required. We set out some goals for the migration: * Migrate hosts from CentOS 7 to CentOS 8 * Upgrade from Train CentOS 7 containers to Ussuri CentOS 8 containers * Note: this means Ussuri containers (CentOS 8) don't land on CentOS 7 * Decouple the CentOS 7 to 8 and OpenStack upgrades, to avoid operator pain * Avoid excessive downtime There is a Red Hat policy [2] that suggests that if you mix host and container OS versions, it is safer for the host to be ahead of the container. This makes sense if you think about userland accessing kernel features. This leads us to the following migration path: * Start with Train release, CentOS 7 containers * Redeploy (rolling) hosts with CentOS 8 * Provision * Configure host OS * Redeploy Train containers with CentOS 7 * Upgrade to Ussuri release, CentOS 8 containers The logic we used to come up with this procedure is as follows: * Host must be upgraded to CentOS 7 before containers (going on policy) * Train containers don't support CentOS 8 base due to RDO * Ussuri containers don't support CentOS 7 base due to RDO * Want to separate CentOS 7 -> 8 migration from OpenStack upgrade This leads us to the conclusion that at least one release of kolla-ansible must bridge these worlds, since we will be deploying Train containers on CentOS 8. We could do either of: 1. Train k-a supports CentOS 8 hosts 2. Ussuri k-a supports deploying Train containers While 1. may require us to backport more than we would like, supporting deployment of multiple OpenStack releases could be challenging, so we expect to go with 1 here. There are some risks involved, including: * some Train services may not work on CentOS 8 * ceph? * OVS & neutron agents? * libvirt? * Ansible minimum version 2.8 for CentOS 8 (ref?) Hopefully we can save ourselves some effort and solve this together. [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg [2] https://access.redhat.com/support/policy/rhel-container-compatibility Thanks, Mark From openstack at nemebean.com Mon Nov 25 17:53:33 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 25 Nov 2019 11:53:33 -0600 Subject: [keystone] Keystone Team Update - Week of 18 November 2019 In-Reply-To: References: Message-ID: <752cd229-8dc5-a4a9-d7d5-85c391fac284@nemebean.com> On 11/22/19 8:25 PM, Colleen Murphy wrote: > However, as many of you are aware, my job focus is changing and I need to be more strategic and selective about the activities I put my time into. These weekly updates consume a not-insignificant amount of my time, and so from now on I'll not plan on continuing it. 
Many people have given me the feedback that they find this newsletter useful, and so I encourage anyone who has some pulse on the keystone team's activities to take up this weekly summary (need not be a core). :-(, but thanks for doing these so consistently. Hopefully someone can pick up where you leave off. From openstack at nemebean.com Mon Nov 25 18:02:45 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 25 Nov 2019 12:02:45 -0600 Subject: [oslo] Virtual PTG Planning In-Reply-To: References: Message-ID: <3ad0f559-5f28-ec8e-870e-f8733573a1c1@nemebean.com> This happened and we had some good discussion, so I'm glad we were able to get so much of the team together. There were a few technical difficulties, but I feel like those happen anytime you start a new video conference. There are some notes on the etherpad that cover the decisions made on the call, and there were a few action items that came out of it. If you have any questions about what was discussed feel free to ping us on #openstack-oslo or reply here. Thanks. -Ben On 11/13/19 12:08 PM, Ben Nemec wrote: > Hi Osloers, > > Given that a lot of the team was not in Shanghai and we had a few topics > proposed that didn't make sense to discuss as a result, I would like to > try doing a virtual PTG the way a number of the other teams are. I've > added a section to the PTG etherpad[0] with some proposed details, but > in general I'm thinking we meet on Jitsi (it's open source) around the > time of the Oslo meeting. It's possible we might be able to get through > everything in the regularly scheduled hour, but if possible I'd like to > keep the following hour (1600-1700 UTC) open as well. If everyone's > available we could do it next week (the 18th) or possibly the following > week (the 25th), although that runs into Thanksgiving week in the US so > people might be out. I've created a Doodle poll[1] with selections for > the next three weeks so please respond there if you can make it any of > those days. If none of them work well we can discuss alternative options. > > Thanks. > > -Ben > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > 1: https://doodle.com/poll/8bqiv865ucyt8499 > From aschultz at redhat.com Mon Nov 25 18:03:28 2019 From: aschultz at redhat.com (Alex Schultz) Date: Mon, 25 Nov 2019 11:03:28 -0700 Subject: [kolla][openstack-ansible][tripleo] CentOS 8 host migration In-Reply-To: References: Message-ID: On Mon, Nov 25, 2019 at 10:54 AM Mark Goddard wrote: > Hi, > > During the recent kolla PTG discussions [1], we covered CentOS 8 in > some detail. For kolla there are several aspects to CentOS 8 support: > > * kolla-build execution host > * container images > * kolla-ansible execution host > * kolla-ansible remote hosts > > In this context I am discussing the last of these - kolla-ansible > remote hosts, i.e. the OS on the controllers and computes etc. Given > that this could be quite a thorny problem, I'd like to propose that we > collaborate between deployment projects. > > It is my understanding that upgrades from CentOS 7 to 8 will not be > supported, and a reinstall is required. 
We set out some goals for the > migration: > > * Migrate hosts from CentOS 7 to CentOS 8 > * Upgrade from Train CentOS 7 containers to Ussuri CentOS 8 containers > * Note: this means Ussuri containers (CentOS 8) don't land on CentOS 7 > * Decouple the CentOS 7 to 8 and OpenStack upgrades, to avoid operator pain > * Avoid excessive downtime > > There is a Red Hat policy [2] that suggests that if you mix host and > container OS versions, it is safer for the host to be ahead of the > container. This makes sense if you think about userland accessing > kernel features. > > This leads us to the following migration path: > > * Start with Train release, CentOS 7 containers > * Redeploy (rolling) hosts with CentOS 8 > * Provision > * Configure host OS > * Redeploy Train containers with CentOS 7 * Upgrade to Ussuri release, CentOS 8 containers > > The logic we used to come up with this procedure is as follows: > > * Host must be upgraded to CentOS 7 before containers (going on policy) > * Train containers don't support CentOS 8 base due to RDO > I believe the plan is to have a Train version on CentOS8 after all the things get bootstrapped. Unfortunately the current target is trying to get master on centos8 with the time frame currently TBD. I'm personally hoping really soon. > * Ussuri containers don't support CentOS 7 base due to RDO > * Want to separate CentOS 7 -> 8 migration from OpenStack upgrade > > This leads us to the conclusion that at least one release of > kolla-ansible must bridge these worlds, since we will be deploying > Train containers on CentOS 8. We could do either of: > > 1. Train k-a supports CentOS 8 hosts > 2. Ussuri k-a supports deploying Train containers > > While 1. may require us to backport more than we would like, > supporting deployment of multiple OpenStack releases could be > challenging, so we expect to go with 1 here. > > There are some risks involved, including: > > * some Train services may not work on CentOS 8 > * ceph? > * OVS & neutron agents? > * libvirt? > * Ansible minimum version 2.8 for CentOS 8 (ref?) > These should be fine as we're testing these with actual rhel8 over in TripleO (at least builds/running) for Train+. You'll run into other issues with pacemaker if you currently use that and try to mix versions. Bigger risk: lack of docker packaging in the upstream which also means no docker-distribution for local repositories. > > Hopefully we can save ourselves some effort and solve this together. > > [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg > [2] https://access.redhat.com/support/policy/rhel-container-compatibility > > Thanks, > Mark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Mon Nov 25 18:19:20 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 25 Nov 2019 12:19:20 -0600 Subject: [requirements][taskflow] networkx bump is blocked on what looks to be a taskflow issue. In-Reply-To: References: <20191028200850.qgwvblgbr23pxgo3@mthode.org> <6a2fba34-14e9-d71e-ed97-8e724d0b6b7a@gmail.com> Message-ID: <19002196-45b9-d775-f4bb-60a990966128@nemebean.com> Now that the first Ussuri Oslo releases have happened, this fix should be available as Taskflow 3.8.0. On 10/28/19 4:38 PM, Michael Johnson wrote: > Yep, my Taskflow patch will need to merge for networkx 2.4 to be ok. 
> > Shameless review request: https://review.opendev.org/#/c/689611/ > > Michael > > On Mon, Oct 28, 2019 at 2:19 PM Matt Riedemann wrote: >> >> On 10/28/2019 3:08 PM, Matthew Thode wrote: >>> I've made a test review so people can see and test against, but it looks >>> like the bump from 2.3 to 2.4 is causing some issues. >>> >>> hits nova,neutron,octavia,glance,cinder >> >> We already fixed [1] in nova but that was due to not using >> upper-constraints on transitive deps (nova pulls it in because of the >> powervm driver code that uses taskflow). >> >> I don't think we (nova) are in any rush to adopt the latest >> networkx/taskflow code anytime soon so the blocker is low priority for us. >> >> Having said all that, it looks like there is a patch to make taskflow >> work with networkx 2.4 [2]. >> >> [1] https://bugs.launchpad.net/nova/+bug/1848499 >> [2] https://review.opendev.org/#/c/689611/ >> >> -- >> >> Thanks, >> >> Matt >> > From zbitter at redhat.com Mon Nov 25 18:24:19 2019 From: zbitter at redhat.com (Zane Bitter) Date: Mon, 25 Nov 2019 13:24:19 -0500 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> Message-ID: <6e7cff0a-a00b-b94d-fc14-5528ee30993e@redhat.com> On 20/11/19 9:21 am, Matt Riedemann wrote: > On 11/20/2019 1:18 AM, Zane Bitter wrote: >> Because the core stable team is necessarily not as familiar with the >> review/backport history of contributors in every project as the >> individual project stable team is with contributors in each project. > > This is assuming that each project has a stable core team already, which > a lot don't, that's why we get a lot of "hi I'm the PTL du jour on > project X now please make me stable core even though I've never reviewed > any stable branch changes before". Correct, what I'm suggesting is a middle-ground position so that in the cases where there is no project-specific stable team, that team has to be bootstrapped by the global stable-maint team in the same way that they do already. This avoids the situation you mention, where e.g. the TC appoints a PTL who does not even qualify to run for election (no commits to the project) and suddenly they're able to approve stable backports with no training or oversight. We're obliged to appoint a PTL for every project, whether they're qualified or not, but we should not be obliged to add unqualified people to the project stable core team. For the cases where there *is* already a project stable team, it allows folks who have already been vetted and who are closest to the data to have input on the decision, and it relieves the burden of 3 already overworked people who are currently required to do all of the vetting of new stable reviewers. cheers, Zane. 
From rosmaita.fossdev at gmail.com Mon Nov 25 19:21:40 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Mon, 25 Nov 2019 14:21:40 -0500 Subject: [cinder][ops][extended-maintenance-sig][public-cloud-sig][enterprise-wg] Cinder to EOL some branches Message-ID: <3c5fd6e6-8ae4-d300-71a7-97b22431cb3b@gmail.com> This is a courtesy notice that having received no responses to my email of 28 October [0] proposing to EOL some currently open Cinder branches, and following the policy articulated in [1], at today's Virtual PTG meeting the Cinder project team has decided to put the following stable branches into the End of Life state: driverfixes/mitaka driverfixes/newton stable/ocata stable/pike I will submit the paperwork to get this process moving one week from today (2 December 2019). cheers, brian [0] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010385.html [1] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life From feilong at catalyst.net.nz Mon Nov 25 20:48:04 2019 From: feilong at catalyst.net.nz (Feilong Wang) Date: Tue, 26 Nov 2019 09:48:04 +1300 Subject: [Magnum] Virtual PTG planning In-Reply-To: <2f35bc6c-b4bb-dbe7-c16d-ede34bc23914@catalyst.net.nz> References: <2f35bc6c-b4bb-dbe7-c16d-ede34bc23914@catalyst.net.nz> Message-ID: <70f7d7bc-8862-241e-4a21-47e89e56c7e3@catalyst.net.nz> Hi team, After discussed with other team members, the virtual PTG is schedule on: 1st Session:  28th Nov 9:00AM-11:00AM UTC 2nd Session: 4th Dec 9:00AM-11:00AM UTC Please add your topics on https://etherpad.openstack.org/p/magnum-ussuri-virtual-ptg-planning Thanks. On 19/11/19 10:46 AM, Feilong Wang wrote: > Hi team, > > As we discussed on last weekly team meeting, we'd like to have a virtual > PTG before the Xmas holiday to plan our work for the U release. The > general idea is extending our current weekly meeting time from 1 hour to > 2 hours and having 2 sessions with total 4 hours. My current proposal is > as below, please reply if you have question or comments. Thanks. > > Pre discussion/Ideas collection:   20th Nov  9:00AM-10:00AM UTC > > 1st Session:  27th Nov 9:00AM-11:00AM UTC > > 2nd Session: 4th Dec 9:00AM-11:00AM UTC > > -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- From colleen at gazlene.net Mon Nov 25 21:01:46 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 25 Nov 2019 13:01:46 -0800 Subject: [keystone] PTG recap Message-ID: The keystone team didn't formally meet at the IRL PTG in Shanghai, but we had virtual meetings before and after the event to discuss the upcoming cycle. Below is a summary of what we discussed, more detailed notes were kept in the main etherpad[1] and sub-pads linked from there. [1] https://etherpad.openstack.org/p/keystone-shanghai-ptg Cycle Retrospective ------------------- https://etherpad.openstack.org/p/keystone-train-retrospective We started the pre-PTG with our usual end-of-cycle retrospective. One outcome of this was reevaluating the effectiveness of structured weekly office hours: we weren't having them frequently because often no one proposed a topic for the meeting, so we decided to replace the weekly-but-only-sometimes face-to-face meetings with once per month bug squash meetings. 
We're also going to experiment with having a weekly rotation of team members who will act as the point person for helping triage incoming bugs and responding to users with support questions, in an effort to create some structure to spread the load of ensuring users have a positive encounter with the team. Although we tried to have milestone-ly checkpoints on feature work, that did not seem to be enough to keep up momentum throughout the cycle on important features, so we'll try to have more frequent checkpoints, more precise deadlines, and better decomposition of tasks. Policy ------ https://etherpad.openstack.org/p/PVG-keystone-forum-policy The purpose of the policy discussion was mainly to prepare for the forum session happening the following week, so most of the result is captured by the forum recap[2]. This was the discussion during which the idea of forming a pop-up team to work through an initial set of projects, rather than creating a community goal and attempting to get all projects done in one cycle, was proposed. We had also discussed creating some kind of "policy-ready" tool to help operators figure out which projects had completed their migration, but this seemed only marginally more useful than simply keeping a manually updated list. (We discussed other ideas for operator tooling at the forum.) [2] http://www.gazlene.net/shanghai-forum-ptg.html Limits ------ https://etherpad.openstack.org/p/keystone-ussuri-ptg-unified-limits We didn't have any design discussion on Unified Limits, since nothing much has changed since it was discussed at the last PTG. The only issue is that no one has been working on it in the last cycle, so the next steps were simply to followup with stakeholders and PoC authors, reassess the priority of the work, and reinstigate the work (this is now in progress). SCIM Extension -------------- We discussed a proposal to add the SCIM protocol[3 as an API extension. The desire is to be able to sync user state between the keystone SQL database (including shadow users) and an external identity provider. The main rationale is to be able to sync the state of the users in an external IdP with the state of the cloud resources they own, so that, for example, decommissioned users would not have orphaned cloud resources. However, resources are actually owned by keystone projects, not users, and there is a many-to-many relationship between users and projects, so there is no direct link between the state of a user in the keystone database and the state of a resource they touched. Moreover, although keystone has a v3-ext section in its API reference[4], we don't effectively support API extensions because we don't have a discovery mechanism for them, so for the sake of interoperability we discourage adding config-enabled API extensions. We suggested that this feature could instead potentially be implemented as a proxy service in front of keystone, and if such an implementation turned out to be infeasible, we could revisit adding this to keystone at a later time. [3] https://tools.ietf.org/html/rfc7644 [4] https://docs.openstack.org/api-ref/identity/v3-ext/ Cycle Planning -------------- https://tree.taiga.io/project/keystone-ussuri-roadmap/kanban This cycle will be lighter in keystone-specific feature work, as much of our focus will be on cross-project goals. The only new spec to be proposed for Ussuri will be the Alembic migration[5] which has been in the backlog for a while. We'll also continue work on unfinished specs from Train[6][7]. 
We're also accounting for time needed to work on the community goals[8], as well as improving our documentation and improving the CLI. [5] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/alembic.html [6] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/expiring-group-memberships.html [7] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/support-federated-attr.html [8] https://governance.openstack.org/tc/goals/selected/ussuri/index.html From mgagne at calavera.ca Mon Nov 25 22:11:42 2019 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Mon, 25 Nov 2019 17:11:42 -0500 Subject: [ops] Running VMware atop of OpenStack (eg: esxi) In-Reply-To: <20191122180337.GA304390@localhost.localdomain> References: <20191122172108.GA263619@localhost.localdomain> <20191122180337.GA304390@localhost.localdomain> Message-ID: On Fri, Nov 22, 2019 at 1:03 PM Paul Belanger wrote: > > On Fri, Nov 22, 2019 at 12:54:50PM -0500, Mathieu Gagné wrote: > > Hi, > > > > On Fri, Nov 22, 2019 at 12:21 PM Paul Belanger wrote: > > > > > > Greetings, > > > > > > I wanted to ask if anybody in the community is running VMware a top of > > > OpenStack in any capacity? In Ansible we are doing a POC as part of our > > > testing platform and running into some common ops issue. For example, > > > what do people usually do for things like using config-drive? Is there > > > any specific tooling you us to customize esxi images to run atop > > > OpenStack. > > > > > > Bascially, looking for humans to bounce questions off and see who else > > > is doing it. > > > > > > > We don't plan on running ESXi on top of KVM but alongside KVM on a > > baremetal. So I won't be able to address that specific use case. > > > > However I can share with you what we are planning to do with > > config-drive support. (we are actively working on it as I'm typing) > > > > At first, we tried to package Glean for VMware and install that > > package when we built the image. The issue with that is that the > > package isn't signed. This can affect support and prevent the overall > > system from being updated due to the presence of an unsigned package. > > You would need some kind of partnership with VMware to get it signed. > > > > We therefore switched to a firstboot script which can be configured in > > the kickstart when building the image: > > https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.esxi.upgrade.doc/GUID-61A14EBB-5CF3-43EE-87EF-DB8EC6D83698.html > > (see %firstboot section) > > > > You can have multiple %firstboot section both using busybox or python > > as an interpreter. > > > > We have multiple sections performing various steps: > > 1) Find the config-drive partition. Since it's baremetal, config-drive > > is a primary partition at the end of the disk. > > 2) Load the iso9660 module > > 3) Mount the config-drive > > 4) A whole Python section which parses config-drive and perform > > hostname/network/password/publickeys configuration. > > 5) General cleanup: unmount config-drive, unload iso9660 module > > > > Now since it's a firstboot script, VMware no longer complains about > > unsigned packages. > > > > I hope this help. > > > Nice, this is exactly the type of thing I was hoping for! Awesome! In > fact, my initial though too was also update glean adding vmware support, > good to know somebody else tried this first. > > As for first boot, this too in the approach we are taking. A teammate > has created a python script to do the same. 
My first question was, why > couldn't that script be glean? That is the part I am a little confused > about. To install Glean, you would usually create a .vib package and install that. But since the package isn't signed, VMware complains. This isn't good if you want to have support from VMware. That's why we are now using a firstboot script which is added and executed from the kickstart, avoiding this whole package signing thing. > Also, are you interested in working on this 'python' script together > upstream? I suspect we might be working to solve the same issue related > to hostname, network, SSH. I'm happy to help and share what we have done so far. I however don't have a lot of bandwidth. I used to work on that internal project months ago, that's why I still have some knowledge about the subject. A coworker took over it and I'm helping him whenever I can to answer his questions. He is supposed to work on it this week so I should be able to get more material to share in the following days. -- Mathieu From colleen at gazlene.net Mon Nov 25 22:48:23 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 25 Nov 2019 14:48:23 -0800 Subject: =?UTF-8?Q?[keystone][nova][barbican][neutron][cinder][tc][policy]_Propos?= =?UTF-8?Q?al_for_policy_popup_team?= Message-ID: Hi, At the Shanghai forum, we discussed forming a popup team around reforming default policies for several projects. I've formally proposed this team here: https://review.opendev.org/695993 I've also created a wiki page to document the effort and coordinate the work: https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team This is the best way I can think of to organize a cross-project initiative like this, since I think the openstack specs repo is kind of defunct? I'm open to other ideas for coordinating this. I've gone ahead and named project-specific liaisons based on discussions at the forum, and also created a member list based on the list of volunteers in the forum etherpad. If you weren't at the forum but would like to be involved, please go ahead and add your name to the list on the wiki page. To form the popup team, we need a second co-lead: please let me know if you're interested. As hinted in the wiki page, I'm not seeking to lead this long-term, so there are actually two lead positions open. Let me know your thoughts in this thread. Colleen From colleen at gazlene.net Mon Nov 25 22:56:56 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 25 Nov 2019 14:56:56 -0800 Subject: [keystone] Upcoming team meetings and PTGs In-Reply-To: <06575513-3416-40cd-9a5e-8e7bc3a1044e@www.fastmail.com> References: <06575513-3416-40cd-9a5e-8e7bc3a1044e@www.fastmail.com> Message-ID: <87a19607-a233-47a7-b2da-7bac51f27a62@www.fastmail.com> Reminder - no meeting tomorrow (November 26) due to the US holiday week. Next week (December 3) we'll hold our regular meeting and are also planning to have a sync-up meeting regarding use of the Patrole policy testing framework following the regular team meeting. 
Colleen From gmann at ghanshyammann.com Tue Nov 26 01:04:44 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 25 Nov 2019 19:04:44 -0600 Subject: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal for policy popup team In-Reply-To: References: Message-ID: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> ---- On Mon, 25 Nov 2019 16:48:23 -0600 Colleen Murphy wrote ---- > Hi, > > At the Shanghai forum, we discussed forming a popup team around reforming default policies for several projects. I've formally proposed this team here: > > https://review.opendev.org/695993 > > I've also created a wiki page to document the effort and coordinate the work: > > https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team > > This is the best way I can think of to organize a cross-project initiative like this, since I think the openstack specs repo is kind of defunct? I'm open to other ideas for coordinating this. > > I've gone ahead and named project-specific liaisons based on discussions at the forum, and also created a member list based on the list of volunteers in the forum etherpad. If you weren't at the forum but would like to be involved, please go ahead and add your name to the list on the wiki page. > > To form the popup team, we need a second co-lead: please let me know if you're interested. As hinted in the wiki page, I'm not seeking to lead this long-term, so there are actually two lead positions open. Thanks Colleen for composing this. I can co-lead and help you with this. I can serve as TC liaison also. -gmann > > Let me know your thoughts in this thread. > > Colleen > > From zhipengh512 at gmail.com Tue Nov 26 02:12:49 2019 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Tue, 26 Nov 2019 10:12:49 +0800 Subject: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal for policy popup team In-Reply-To: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> References: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> Message-ID: Thanks Colleen, As I shared with Cyborg team earlier, I think as a new project it would be beneficial that cyborg also participate in the popup team to get policy right from the get go On Tue, Nov 26, 2019 at 9:12 AM Ghanshyam Mann wrote: > > ---- On Mon, 25 Nov 2019 16:48:23 -0600 Colleen Murphy < > colleen at gazlene.net> wrote ---- > > Hi, > > > > At the Shanghai forum, we discussed forming a popup team around > reforming default policies for several projects. I've formally proposed > this team here: > > > > https://review.opendev.org/695993 > > > > I've also created a wiki page to document the effort and coordinate the > work: > > > > > https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team > > > > This is the best way I can think of to organize a cross-project > initiative like this, since I think the openstack specs repo is kind of > defunct? I'm open to other ideas for coordinating this. > > > > I've gone ahead and named project-specific liaisons based on > discussions at the forum, and also created a member list based on the list > of volunteers in the forum etherpad. If you weren't at the forum but would > like to be involved, please go ahead and add your name to the list on the > wiki page. > > > > To form the popup team, we need a second co-lead: please let me know if > you're interested. As hinted in the wiki page, I'm not seeking to lead > this long-term, so there are actually two lead positions open. 
> > Thanks Colleen for composing this. I can co-lead and help you with this. I > can serve as TC liaison also. > > -gmann > > > > > Let me know your thoughts in this thread. > > > > Colleen > > > > > > > -- Zhipeng (Howard) Huang Principle Engineer OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C -------------- next part -------------- An HTML attachment was scrubbed... URL: From glongwave at gmail.com Tue Nov 26 03:45:53 2019 From: glongwave at gmail.com (ChangBo Guo) Date: Tue, 26 Nov 2019 11:45:53 +0800 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: <20191122040518.GA10388@thor.bakeyournoodle.com> <4bf2c65c956f46e58cbef5528b46b3ef@AUSX13MPS308.AMER.DELL.COM> Message-ID: Glad to see that we plan to form a SIG to better support ARM64 and other architecture CPUs, I think we can do something in the SIG, functional tests, performance data share, share best practices. BTW, we also have slides for the forum , find it at https://docs.google.com/presentation/d/1TyKYqDTnuncQQpx6DFKNAIg1buCgjeGH1RsIMY14idI/edit?usp=sharing Jeremy Freudberg 于2019年11月23日周六 上午12:46写道: > CI > building stuff (packages, containers, images) > fixing deployment tooling -- some tools might barf if you have a mix of > archs > a bit of evangelism > > probably more... > > On Fri, Nov 22, 2019 at 11:16 AM wrote: > > > > Sounds interesting.. > > But not clear what is the goal for that SIG? > > Since OpenStack claims to have support for different HW. > > > > Are there any additional reqs? Clearly CI ones. > > > > Thanks, > > Arkady > > > > -----Original Message----- > > From: Tony Breeds > > Sent: Thursday, November 21, 2019 10:05 PM > > To: Rico Lin > > Cc: OpenStack Discuss > > Subject: Re: [meta-sig][multi-arch] propose forming a Multi-arch SIG > > > > On Wed, Nov 20, 2019 at 06:03:03PM +0800, Rico Lin wrote: > > > Dear all > > > In summit, there's a forum for ARM support [1] in Summit which many > > > people show they're interested in ARM support for OpenStack. > > > And since also we have Linaro shows interest in donating servers to > > > OpenStack infra. It's time for community to think about what we should > > > deal with those ARM servers once we have them in community > infrastructure. > > > > > > One thing we should do as a community is to gather people for this > topic. > > > So I propose we create a Multi-arch SIG and aim to support ARM > > > architecture as very first step. > > > I had the idea to call it ARM SIG before, but since there might be > > > high overlap knowledge between support ARM 64 and other architectures. > > > I propose we go for Multi-arch instead. > > > > > > This SIG will be a nice place to collect all the documents, gate jobs, > > > and to trace tasks. > > > > > > If you're also interested in that group, please reply to this email, > > > introduce yourself and tell us what you would like the group scope and > > > objectives to be, and what you can contribute to the group. > > > > Pick me Pick me :) > > > > I've been around OpenStack for about 5 years now, the last couple have > been focused on brining multi-arch support (albeit ppc64le) into tripleo > building on the enablement work that others have done. > > > > I'm keen to work with the SIG, to build out the ARM support and at the > same time ensure we don't make it hard for other architectures to do the > same > > > > Yours Tony. 
> > -- ChangBo Guo(gcb) Community Director @EasyStack -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongbin034 at gmail.com Tue Nov 26 04:11:29 2019 From: hongbin034 at gmail.com (Hongbin Lu) Date: Mon, 25 Nov 2019 23:11:29 -0500 Subject: [zun][zun-ui][horizon] Request code review for a breaking fix Message-ID: Hi Horizon folks, We have an issue that needs to be fixed at horizon side. Please check this bug: https://bugs.launchpad.net/zun-ui/+bug/1847889 We propose a fix on Horizon but the patch hasn't been moved forward for a while. Would I ask for a code review for the patch https://review.opendev.org/#/c/688290/ ? Without the fix, our horizon plugin couldn't work correctly. Best regards, Hongbin -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Tue Nov 26 04:30:24 2019 From: satish.txt at gmail.com (Satish Patel) Date: Mon, 25 Nov 2019 23:30:24 -0500 Subject: openrc issue with keystone LDAP integration Message-ID: I am running openstack stein and very strange issue going on, life was good until today when i finish my keystone + LDAP integration with multi-domain setup and all role assignment in SQL. when today one of user complained that his openrc isn't working correctly, look like something openrc doesn't like about LDAP integration, but same user can access everything from Horizon. my ldap domain is "eng" and here is my openrc file. # COMMON OPENSTACK ENVS export OS_ENDPOINT_TYPE=internalURL export OS_INTERFACE=internalURL export OS_USERNAME=spatel export OS_PASSWORD='MyLDAPPassword123' export OS_PROJECT_NAME=eng export OS_TENANT_NAME=eng export OS_AUTH_TYPE=password export OS_AUTH_URL=http://172.28.16.9:5000/v3 export OS_NO_CACHE=1 export OS_USER_DOMAIN_NAME=eng export OS_PROJECT_DOMAIN_NAME=eng export OS_REGION_NAME=RegionOne # For openstackclient export OS_IDENTITY_API_VERSION=3 export OS_AUTH_VERSION=3 [root at openstack ~]# source spatel.rc [root at openstack ~]# nova list ERROR (Unauthorized): The request you have made requires authentication. (HTTP 401) (Request-ID: req-5877deee-b8be-4b21-9ff6-855ae43e268e) but if i take same openrc file and add "admin" account it and "default" domain then it works so don't know why it doesn't like LDAP creds? From sungn2 at lenovo.com Tue Nov 26 09:28:01 2019 From: sungn2 at lenovo.com (Guannan GN2 Sun) Date: Tue, 26 Nov 2019 09:28:01 +0000 Subject: Communication problem between ironic-python-agent and CI server. Message-ID: <6815c663fa6647f6bab938e8d4b751e6@lenovo.com> Hi team, I'm now trying to use ironic deployed with devstack to manage baremetal machine. However when it run into deploying stage, I open the BM server terminal and see it successfully load ramdisk and boot into it. It get the ip I assigned and I can ping it from CI server side. But it then deploy failed just about 2 minutes later. When I check ironic-conductor log with command "sudo journalctl -a --unit devstack at ir-cond" and found error like this: ERROR ironic.drivers.modules.agent_client [None req-de37bc21-8d62-41db-8983-c06789939818 None None] Failed to connect to the agent running on node ea88ba26-756d-4d32-89f4-7ff086fa8868 for invoking command iscsi.start_iscsi_target. Error: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. 
(connect timeout=60)')): ConnectTimeout: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')) I can ping it from CI server side, so it is strange why the connection time out between ironic-python-agent and CI server. Does anyone meet similar problem or have idea about it? Thank you! Best Regards, Guannan -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Tue Nov 26 09:47:47 2019 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 26 Nov 2019 10:47:47 +0100 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <6e7cff0a-a00b-b94d-fc14-5528ee30993e@redhat.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> <6e7cff0a-a00b-b94d-fc14-5528ee30993e@redhat.com> Message-ID: <98b748ba-f7eb-051d-5d91-929cbe6974f3@openstack.org> Zane Bitter wrote: > On 20/11/19 9:21 am, Matt Riedemann wrote: >> >> This is assuming that each project has a stable core team already, >> which a lot don't, that's why we get a lot of "hi I'm the PTL du jour >> on project X now please make me stable core even though I've never >> reviewed any stable branch changes before". > > Correct, what I'm suggesting is a middle-ground position so that in the > cases where there is no project-specific stable team, that team has to > be bootstrapped by the global stable-maint team in the same way that > they do already. That sounds like a reasonable compromise. -- Thierry Carrez (ttx) From mnaser at vexxhost.com Tue Nov 26 09:48:51 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 26 Nov 2019 04:48:51 -0500 Subject: Communication problem between ironic-python-agent and CI server. In-Reply-To: <6815c663fa6647f6bab938e8d4b751e6@lenovo.com> References: <6815c663fa6647f6bab938e8d4b751e6@lenovo.com> Message-ID: Sent from my iPhone > On Nov 26, 2019, at 4:34 AM, Guannan GN2 Sun wrote: > >  > Hi team, > > > > I'm now trying to use ironic deployed with devstack to manage baremetal machine. > > However when it run into deploying stage, I open the BM server terminal and see it successfully load ramdisk and boot into it. It get the ip I assigned and I can ping it from CI server side. But it then deploy failed just about 2 minutes later. > > > > When I check ironic-conductor log with command "sudo journalctl -a --unit devstack at ir-cond" and found error like this: > > ERROR ironic.drivers.modules.agent_client [None req-de37bc21-8d62-41db-8983-c06789939818 None None] Failed to connect to the agent running on node ea88ba26-756d-4d32-89f4-7ff086fa8868 for invoking command iscsi.start_iscsi_target. Error: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')): ConnectTimeout: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')) > > I can ping it from CI server side, so it is strange why the connection time out between ironic-python-agent and CI server. 
Does anyone meet similar problem or have idea about it? You can ping it but can you make an HTTP request to port 9999 via something like curl? > Thank you! > > > Best Regards, > > Guannan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue Nov 26 09:59:22 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 26 Nov 2019 09:59:22 +0000 Subject: [kolla][tripleo] Infra style images In-Reply-To: <4e05834b-9654-0889-68f8-249ac869f902@linaro.org> References: <4e05834b-9654-0889-68f8-249ac869f902@linaro.org> Message-ID: Thanks for kicking off this conversation Marcin. On Mon, 25 Nov 2019 at 13:44, Marcin Juszkiewicz wrote: > > One of things we have on a list of things to do during Ussuri cycle is > implementation of 'infra' images. > > BP: https://blueprints.launchpad.net/kolla/+spec/infra-images > > # What are infra images? > > Images that are always built from binary packages (or are Java > monsters). We have about 70 such ones from quick check. All those Ceph, > Prometheus, MariaDB, cron, chrony, storm, sensu etc ones. There are some e.g. skydive where we e.g. pull go binaries from github. Another loose definition is that they are everything other than python OpenStack projects. > > ## libvirt image > > There is 'nova-libvirt' image. Contains libvirt daemon (with qemu and > all required packages) so it would get renamed to 'libvirt'. > Not a hard requirement as it depends on base rather than nova-base, but it makes sense. > > # Building > > The idea is that 'infra' will be a new build type (like we have > 'binary', 'source' etc). With all source base images marked as > unbuildable so there will be no 'debian-infra-nova-compute' one. > > On the other hand building of 'binary'/'source' type images would lock > out all 'infra' ones to not get images with names like > 'debian-source-ceph-mon'. > We need to think about the image hierarchy. The base images are always large, and if we don't share them between source, binary and infra, we could end up increasing the overall storage size rather than reducing it. Currently there are a few places where the base image depends on install_type: * we set the KOLLA_INSTALL_TYPE environment variable. This could be moved to openstack-base * RHEL binary images don't install EPEL. We're trying to get rid of EPEL, or at least only install it where necessary * there are some minor differences in the packages installed on CentOS. We could easily unify this There might be an argument in favour of dropping the type from the name for infra images, so that we just have debian-base, debian-cron etc. > > # Pros > > - Clean split between OpenStack components (binary/source) and > infrastructure needed to get them running (infra). > > - Less images to publish on CI. Infra ones can be built weekly > as they do not change much. It would be interesting to know how much storage we'll actually save through this effort. And also how much we'd save through squashing. > > - No more questions how did we built ceph-mon from source ;D > > > # Cons > > - We need to change kolla-ansible, tripleo and maybe some other > projects' code to use new type of images. > > - Migration from previous releases would be more complicated > due to image renames. > No longer possible to build all necessary images with one kolla-build command. 
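Coming back to the ironic-python-agent question above: a quick way to turn
the curl suggestion into a concrete check from the devstack/CI host, using
the same endpoint that appears in the conductor error:

    # Any HTTP response at all proves TCP reachability of the agent; a hang or
    # timeout here while ping works points at packet filtering (iptables/firewalld,
    # security groups, switch ACLs on the provisioning network) rather than the agent.
    curl -v http://10.0.0.25:9999/v1/commands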
From mark at stackhpc.com Tue Nov 26 10:16:55 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 26 Nov 2019 10:16:55 +0000 Subject: [kolla][openstack-ansible][tripleo] CentOS 8 host migration In-Reply-To: References: Message-ID: Thanks for responding Alex. On Mon, 25 Nov 2019 at 18:04, Alex Schultz wrote: > > > > On Mon, Nov 25, 2019 at 10:54 AM Mark Goddard wrote: >> >> Hi, >> >> During the recent kolla PTG discussions [1], we covered CentOS 8 in >> some detail. For kolla there are several aspects to CentOS 8 support: >> >> * kolla-build execution host >> * container images >> * kolla-ansible execution host >> * kolla-ansible remote hosts >> >> In this context I am discussing the last of these - kolla-ansible >> remote hosts, i.e. the OS on the controllers and computes etc. Given >> that this could be quite a thorny problem, I'd like to propose that we >> collaborate between deployment projects. >> >> It is my understanding that upgrades from CentOS 7 to 8 will not be >> supported, and a reinstall is required. We set out some goals for the >> migration: >> >> * Migrate hosts from CentOS 7 to CentOS 8 >> * Upgrade from Train CentOS 7 containers to Ussuri CentOS 8 containers >> * Note: this means Ussuri containers (CentOS 8) don't land on CentOS 7 >> * Decouple the CentOS 7 to 8 and OpenStack upgrades, to avoid operator pain >> * Avoid excessive downtime >> >> There is a Red Hat policy [2] that suggests that if you mix host and >> container OS versions, it is safer for the host to be ahead of the >> container. This makes sense if you think about userland accessing >> kernel features. >> >> This leads us to the following migration path: >> >> * Start with Train release, CentOS 7 containers >> * Redeploy (rolling) hosts with CentOS 8 >> * Provision >> * Configure host OS >> * Redeploy Train containers with CentOS 7 >> >> * Upgrade to Ussuri release, CentOS 8 containers >> >> The logic we used to come up with this procedure is as follows: >> >> * Host must be upgraded to CentOS 7 before containers (going on policy) >> * Train containers don't support CentOS 8 base due to RDO > > > I believe the plan is to have a Train version on CentOS8 after all the things get bootstrapped. Unfortunately the current target is trying to get master on centos8 with the time frame currently TBD. I'm personally hoping really soon. Hopefully it won't require too many changes in kolla to support it. Based on this we could modify our plan to use Train containers with a CentOS 8 base when deploying on CentOS 8 hosts. We'd need to support building and publishing both CentOS 7 and 8 containers for the Train cycle. That would avoid the host/container mismatch. > >> >> * Ussuri containers don't support CentOS 7 base due to RDO >> * Want to separate CentOS 7 -> 8 migration from OpenStack upgrade >> >> This leads us to the conclusion that at least one release of >> kolla-ansible must bridge these worlds, since we will be deploying >> Train containers on CentOS 8. We could do either of: >> >> 1. Train k-a supports CentOS 8 hosts >> 2. Ussuri k-a supports deploying Train containers >> >> While 1. may require us to backport more than we would like, >> supporting deployment of multiple OpenStack releases could be >> challenging, so we expect to go with 1 here. >> >> There are some risks involved, including: >> >> * some Train services may not work on CentOS 8 >> * ceph? >> * OVS & neutron agents? >> * libvirt? >> * Ansible minimum version 2.8 for CentOS 8 (ref?) 
> > > These should be fine as we're testing these with actual rhel8 over in TripleO (at least builds/running) for Train+. You'll run into other issues with pacemaker if you currently use that and try to mix versions. I think with a CentOS 8 based container I'd be less worried here. Also if Tripleo needs this to work :) > > Bigger risk: lack of docker packaging in the upstream which also means no docker-distribution for local repositories. Currently for testing CentOS 8 we are using the CentOS 7 Docker upstream repo, which requires module_hotfixes to work. https://review.opendev.org/#/c/692794/4/tools/setup_RedHat.sh > > >> >> >> Hopefully we can save ourselves some effort and solve this together. >> >> [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg >> [2] https://access.redhat.com/support/policy/rhel-container-compatibility >> >> Thanks, >> Mark >> From skaplons at redhat.com Tue Nov 26 11:12:18 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 26 Nov 2019 12:12:18 +0100 Subject: [neutron][scientific-sig] SIG help with Linuxbridge ML2 maintenance? In-Reply-To: References: Message-ID: <20191126111218.4n3ixijv7iqbxblm@skaplons-mac> Hi, Sorry for late reply but I missed this email somehow. I can join today's meeting. So please add this item to the agenda if it's still possible. On Thu, Nov 21, 2019 at 06:31:54PM +0000, Stig Telfer wrote: > Hi all - > > Following this discussion [1] around the Linuxbridge ML2 driver, I’m aware that a number of members of the Scientific SIG use this driver and appreciate its performance and simplicity. > > Would anyone from the Neutron project involved in this issue be interested in joining a Scientific SIG meeting to discuss how SIG members can help with keeping this driver maintained? Our next meeting is Tuesday 26th November at 2100 UTC. If that’s possible, please let me know and we’ll put it on the agenda. > > Many thanks, > Stig > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010761.html -- Slawek Kaplonski Senior software engineer Red Hat From jonathan.rosser at rd.bbc.co.uk Tue Nov 26 11:29:21 2019 From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser) Date: Tue, 26 Nov 2019 11:29:21 +0000 Subject: [kolla][openstack-ansible][tripleo] CentOS 8 host migration In-Reply-To: References: Message-ID: On 25/11/2019 17:48, Mark Goddard wrote: > Hi, > > During the recent kolla PTG discussions [1], we covered CentOS 8 in > some detail. For kolla there are several aspects to CentOS 8 support: > > * kolla-build execution host > * container images > * kolla-ansible execution host > * kolla-ansible remote hosts > Hi Mark, Whilst the container approach in openstack-ansible is different (LXC) and presents it's own challenges for Centos8, there is no doubt a common set of issues. We also do a container-less bare-metal deploy mode which would be the first to try to get working. I have a WIP patch for OSA for Centos-8 for a while now which really does not get so far. https://review.opendev.org/#/c/689629 I've not yet managed to create dummy interfaces with network-manager which leads to a horrible hack setting up the networking for CI. Also failing was building of dbus-python which is required for the Ansible nmcli module, due to an incompatibility with autotools 1.16. There are a also few pieces we take from EPEL (lsyncd) which are missing. Then there is/was the absence of an obvious source of RDO packages. I am concerned about the difficulty of creating a tractable upgrade path which separates OS from OpenStack upgrades. 
For openstack-ansible, Rocky was the transition release which supported both Xenial and Bionic. Achieving the same transition release for Centos 7 to 8 upgrades looks challenging. Jon. From jonathan.rosser at rd.bbc.co.uk Tue Nov 26 11:33:16 2019 From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser) Date: Tue, 26 Nov 2019 11:33:16 +0000 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: <20191121001509.GB976114@fedora19.localdomain> References: <20191121001509.GB976114@fedora19.localdomain> Message-ID: On 21/11/2019 00:15, Ian Wienand wrote: > On Wed, Nov 20, 2019 at 06:03:03PM +0800, Rico Lin wrote: >> If you're also interested in that group, please reply to this email, >> introduce yourself and tell us what you would like the group scope and >> objectives to be, and what you can contribute to the group. > > I have worked with some upstream people such as hrw to get > diskimage-builder working with EFI and ARM64, and setup the infra to > build ARM64 nodes on Linaro's donated resources. Although infra is > all an open book in terms of configuration, etc. and anyone can > contribute, I'll be happy to help with issues or mentor anyone on > using the gate resources we have available. > openstack-ansible is ready to go on arm CI but in order to make the jobs run in a reasonable time and not simply timeout a source of pre-built arm python wheels is needed. It would be a shame to let the work that got contributed to OSA for arm just rot. WIP patch here https://review.opendev.org/#/c/618305/ From gr at ham.ie Tue Nov 26 12:55:28 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 26 Nov 2019 12:55:28 +0000 Subject: [tc][board][all] - Adding OpenStack community support to the savedotorg campaign Message-ID: Hey All, I am not sure if this has been seen by everyone or not - but there is a change in how the .org top level domain is being ran in the works, in a way that may not be in the best interests of the non profits that it was created to facilitate. [1] A lot of well known non profits have already joined, and as a community that has an interest in the internet as a whole, and uses a .org domain, I think we should add our voice in support. What do people think? Are we happy to have the TC use its voice on behalf of the OpenStack project, or do we think the board should use its voice on behalf of the entire foundation? - Graham 1 - https://savedotorg.org/ From thierry at openstack.org Tue Nov 26 12:59:55 2019 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 26 Nov 2019 13:59:55 +0100 Subject: [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: References: Message-ID: <810c06fc-8d62-415e-b9cd-dd323ecd47ff@openstack.org> Graham Hayes wrote: > I am not sure if this has been seen by everyone or not - but there is > a change in how the .org top level domain is being ran in the works, in > a way that may not be in the best interests of the non profits that it > was created to facilitate. [1] > > A lot of well known non profits have already joined, and as a > community that has an interest in the internet as a whole, and > uses a .org domain, I think we should add our voice in support. > > What do people think? Are we happy to have the TC use its voice on > behalf of the OpenStack project, or do we think the board should use > its voice on behalf of the entire foundation? It probably makes more sense (and has more weight) at the Foundation-level. 
-- Thierry Carrez From amotoki at gmail.com Tue Nov 26 13:22:20 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 26 Nov 2019 22:22:20 +0900 Subject: [zun][zun-ui][horizon] Request code review for a breaking fix In-Reply-To: References: Message-ID: Sorry for late. While the patch has been approved by other horizon cores, I have a question which blocks me for long. The proposed code is related to the serial console support and AFAIK it was introduced to support the serial console of nova servers. Perhaps Hongbin confirms it works with zun instance, but how can we test it with nova servers with serial consoles (ironic instances?)? Many developers add features to horizon but they don't leave enough information on how to test them, so the current horizon team is struggling to know how to test :-( That's one reason that reviews for non-popular areas tend to take time for long.... I wonder how we can improve this situation..... Thanks, Akihiro On Tue, Nov 26, 2019 at 1:13 PM Hongbin Lu wrote: > > Hi Horizon folks, > > We have an issue that needs to be fixed at horizon side. Please check this bug: > > https://bugs.launchpad.net/zun-ui/+bug/1847889 > > We propose a fix on Horizon but the patch hasn't been moved forward for a while. Would I ask for a code review for the patch https://review.opendev.org/#/c/688290/ ? Without the fix, our horizon plugin couldn't work correctly. > > Best regards, > Hongbin From C-Ramakrishna.Bhupathi at charter.com Tue Nov 26 14:11:19 2019 From: C-Ramakrishna.Bhupathi at charter.com (Bhupathi, Ramakrishna) Date: Tue, 26 Nov 2019 14:11:19 +0000 Subject: [Kolla] FAILED - RETRYING: wait for MariaDB to be available via HAProxy (10 retries left). Message-ID: Mark, Thanks for the tip. Now I get a failure (as below) during deploy (I have all-in-one) config. I am on stein. It appears to me that this has been presumably fixed . But I still running into this error. I do not see anything explicit in the HAProxy/MariaDB logs. TASK [mariadb : wait for MariaDB to be available via HAProxy] ************************************************************************************************** FAILED - RETRYING: wait for MariaDB to be available via HAProxy (10 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (9 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (8 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (7 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (6 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (5 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (4 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (3 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (2 retries left). FAILED - RETRYING: wait for MariaDB to be available via HAProxy (1 retries left). fatal: [localhost]: FAILED! 
=> {"attempts": 10, "changed": false, "elapsed": 60, "msg": "Timeout when waiting for search string MariaDB in 10.1.0.250:3306"} NO MORE HOSTS LEFT ********************************************************************************************************************************************* PLAY RECAP ***************************************************************************************************************************************************** localhost : ok=76 changed=0 unreachable=0 failed=1 skipped=74 rescued=0 ignored=0 Command failed ansible-playbook -i ../../all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=deploy /home/cloud-user/kolla-ansible/ansible/site.yml --RamaK -----Original Message----- From: Mark Goddard [mailto:mark at stackhpc.com] Sent: Monday, November 25, 2019 11:08 AM To: Bhupathi, Ramakrishna Cc: openstack-discuss at lists.openstack.org Subject: Re: [Kolla] RabbitMQ failure during deploy (Openstack with Kolla) On Mon, 25 Nov 2019 at 13:54, Bhupathi, Ramakrishna wrote: > > Hey , I am evaluating Openstack with Kolla (have the latest ) and following the steps. I see that rabbitmq fails to start up and (keeps restarting). Essentially the script fails in deploy. I have a single node all in one config. > > > > Can someone tell me what the cause for this failure? > > > > > > RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first > node] > ********************************************************************** > ****************************************** > > fatal: [localhost]: FAILED! => {"changed": true, "cmd": "docker exec > rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", > "delta": "0:00:07.732346", "end": "2019-11-25 13:33:12.102342", "msg": > "non-zero return code", "rc": 137, "start": "2019-11-25 > 13:33:04.369996", "stderr": "", "stderr_lines": [], "stdout": "Waiting > for 'rabbit at kolla-ubuntu'\npid is 6", "stdout_lines": ["Waiting for > 'rabbit at kolla-ubuntu'", "pid is 6"]} > > > > RUNNING HANDLER [rabbitmq : Restart rabbitmq container (rest of > nodes)] > ********************************************************************** > ******************************************* > > > > NO MORE HOSTS LEFT > ********************************************************************** > ********************************************************************** > ************************** > > > > PLAY RECAP > ********************************************************************** > ********************************************************************** > ********************************** > > localhost : ok=95 changed=2 unreachable=0 failed=1 skipped=78 rescued=0 ignored=0 > > > > Command failed ansible-playbook -i ./all-in-one -e > @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e > CONFIG_DIR=/etc/kolla -e kolla_action=deploy > /home/ubuntu/venv/share/kolla-ansible/ansible/site.yml > > > > > > Accessing the container logs for rabbitmq .. this is what I see > > > > BOOT FAILED > > =========== > > > > Error description: > > {could_not_start,rabbitmq_management, > > {rabbitmq_management, > > {bad_return, > > {{rabbit_mgmt_app,start,[normal,[]]}, > > {'EXIT', > > {{could_not_start_listener, > > [{port,15672}], > > {shutdown, > > {failed_to_start_child,ranch_acceptors_sup, > > > {listen_error,rabbit_web_dispatch_sup_15672,eaddrinuse}}}}, Hi RamaK, here is your issue. RabbitMQ management fails to start up due to the port already being in use (eaddrinuse). 
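A quick way to confirm what is holding those ports (a rough sketch - it assumes the iproute2 ss tool and systemd are available on the host):

sudo ss -tlnp | grep -E ':(5672|15672)'    # 5672 is AMQP, 15672 is the management UI
systemctl status rabbitmq-server           # check for a host-level RabbitMQ service
docker ps -a --filter name=rabbitmq        # check for an old/duplicate rabbitmq container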
Perhaps RabbitMQ is running on the host already? > > {gen_server,call, > > [rabbit_web_dispatch_registry, > > {add,rabbit_mgmt, > > [{port,15672}], > > #Fun, > > > > --RamaK > > The contents of this e-mail message and any attachments are intended > solely for the > addressee(s) and may contain confidential and/or legally privileged > information. If you are not the intended recipient of this message or > if this message has been addressed to you in error, please immediately > alert the sender by reply e-mail and then delete this message and any > attachments. If you are not the intended recipient, you are notified > that any use, dissemination, distribution, copying, or storage of this > message or any attachment is strictly prohibited. E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. From sean.mcginnis at gmail.com Tue Nov 26 14:14:29 2019 From: sean.mcginnis at gmail.com (Sean McGinnis) Date: Tue, 26 Nov 2019 08:14:29 -0600 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: References: Message-ID: On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: > > Hey All, > > I am not sure if this has been seen by everyone or not - but there is > a change in how the .org top level domain is being ran in the works, in > a way that may not be in the best interests of the non profits that it > was created to facilitate. [1] > > A lot of well known non profits have already joined, and as a > community that has an interest in the internet as a whole, and > uses a .org domain, I think we should add our voice in support. > > What do people think? Are we happy to have the TC use its voice on > behalf of the OpenStack project, or do we think the board should use > its voice on behalf of the entire foundation? > > - Graham > > > 1 - https://savedotorg.org/ > I agree this is a Foundation level thing (though individuals can add their names to the petition as well). I would support adding OSF to the list of organizations. Sean From gr at ham.ie Tue Nov 26 14:33:10 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 26 Nov 2019 14:33:10 +0000 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: <21b14ae74ca92eeae7da448e2f5076d2297310fe.camel@evrard.me> References: <21b14ae74ca92eeae7da448e2f5076d2297310fe.camel@evrard.me> Message-ID: On 26/11/2019 14:25, Jean-Philippe Evrard wrote: > Hello, > > I agree with others: IMO, it is the Foundation who should act on this. > I would love to see the OpenStack Foundation logo out there. > > Thank Graham for showing us this petition! > > Regards, > JP > I am not sure of the procedure, I am more than willing to put this on the board agenda for Dec 10th, if that is allowed. It should be a simple board resolution along the lines of: The Board supports the aims of the savedotorg campaign, and resolves to add its name to the list of supporting organizations. 
Unless someone shouts I will propose it on the agenda [1] by EOW. 1 - https://wiki.openstack.org/wiki/Governance/Foundation/10December2019BoardMeeting > > > _______________________________________________ > Foundation mailing list > Foundation at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation > From gr at ham.ie Tue Nov 26 14:34:09 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 26 Nov 2019 14:34:09 +0000 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: References: Message-ID: <47a312e7-fc13-c9df-16ce-1117fbc7633e@ham.ie> On 26/11/2019 14:14, Sean McGinnis wrote: > On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: >> >> Hey All, >> >> I am not sure if this has been seen by everyone or not - but there is >> a change in how the .org top level domain is being ran in the works, in >> a way that may not be in the best interests of the non profits that it >> was created to facilitate. [1] >> >> A lot of well known non profits have already joined, and as a >> community that has an interest in the internet as a whole, and >> uses a .org domain, I think we should add our voice in support. >> >> What do people think? Are we happy to have the TC use its voice on >> behalf of the OpenStack project, or do we think the board should use >> its voice on behalf of the entire foundation? >> >> - Graham >> >> >> 1 - https://savedotorg.org/ >> > > I agree this is a Foundation level thing (though individuals can add their > names to the petition as well). I would support adding OSF to the list of > organizations. Yeap - that is a good point - I would encourage people who agree with the petition to sign as individuals as well. > Sean > From amy at demarco.com Tue Nov 26 14:50:02 2019 From: amy at demarco.com (Amy) Date: Tue, 26 Nov 2019 08:50:02 -0600 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: References: Message-ID: <057BD0AC-451C-4E64-9D30-6CC6A20DFA0F@demarco.com> I agree with the OSF submitting on behalf of OpenStack and the other projects. I also like the idea of individuals signing as contributors. Amy Marrich (spotz) > On Nov 26, 2019, at 8:17 AM, Sean McGinnis wrote: > > On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: >> >> Hey All, >> >> I am not sure if this has been seen by everyone or not - but there is >> a change in how the .org top level domain is being ran in the works, in >> a way that may not be in the best interests of the non profits that it >> was created to facilitate. [1] >> >> A lot of well known non profits have already joined, and as a >> community that has an interest in the internet as a whole, and >> uses a .org domain, I think we should add our voice in support. >> >> What do people think? Are we happy to have the TC use its voice on >> behalf of the OpenStack project, or do we think the board should use >> its voice on behalf of the entire foundation? >> >> - Graham >> >> >> 1 - https://savedotorg.org/ >> > > I agree this is a Foundation level thing (though individuals can add their > names to the petition as well). I would support adding OSF to the list of > organizations. > > Sean > From mark at stackhpc.com Tue Nov 26 15:04:36 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 26 Nov 2019 15:04:36 +0000 Subject: [Kolla] FAILED - RETRYING: wait for MariaDB to be available via HAProxy (10 retries left). 
In-Reply-To: References: Message-ID: On Tue, 26 Nov 2019 at 14:12, Bhupathi, Ramakrishna wrote: > > Mark, > Thanks for the tip. > > Now I get a failure (as below) during deploy (I have all-in-one) config. I am on stein. > It appears to me that this has been presumably fixed . But I still running into this error. > > I do not see anything explicit in the HAProxy/MariaDB logs. Are both the haproxy and mariadb containers up and not restarting? Does your api_interface have the VIP activated? Is haproxy listening on 10.1.0.250:3306 (check ss)? > > > > TASK [mariadb : wait for MariaDB to be available via HAProxy] ************************************************************************************************** > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (10 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (9 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (8 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (7 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (6 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (5 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (4 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (3 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (2 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (1 retries left). > fatal: [localhost]: FAILED! => {"attempts": 10, "changed": false, "elapsed": 60, "msg": "Timeout when waiting for search string MariaDB in 10.1.0.250:3306"} > > NO MORE HOSTS LEFT ********************************************************************************************************************************************* > > PLAY RECAP ***************************************************************************************************************************************************** > localhost : ok=76 changed=0 unreachable=0 failed=1 skipped=74 rescued=0 ignored=0 > > Command failed ansible-playbook -i ../../all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=deploy /home/cloud-user/kolla-ansible/ansible/site.yml > > --RamaK > > > > > > -----Original Message----- > From: Mark Goddard [mailto:mark at stackhpc.com] > Sent: Monday, November 25, 2019 11:08 AM > To: Bhupathi, Ramakrishna > Cc: openstack-discuss at lists.openstack.org > Subject: Re: [Kolla] RabbitMQ failure during deploy (Openstack with Kolla) > > On Mon, 25 Nov 2019 at 13:54, Bhupathi, Ramakrishna wrote: > > > > Hey , I am evaluating Openstack with Kolla (have the latest ) and following the steps. I see that rabbitmq fails to start up and (keeps restarting). Essentially the script fails in deploy. I have a single node all in one config. > > > > > > > > Can someone tell me what the cause for this failure? > > > > > > > > > > > > RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first > > node] > > ********************************************************************** > > ****************************************** > > > > fatal: [localhost]: FAILED! 
=> {"changed": true, "cmd": "docker exec > > rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", > > "delta": "0:00:07.732346", "end": "2019-11-25 13:33:12.102342", "msg": > > "non-zero return code", "rc": 137, "start": "2019-11-25 > > 13:33:04.369996", "stderr": "", "stderr_lines": [], "stdout": "Waiting > > for 'rabbit at kolla-ubuntu'\npid is 6", "stdout_lines": ["Waiting for > > 'rabbit at kolla-ubuntu'", "pid is 6"]} > > > > > > > > RUNNING HANDLER [rabbitmq : Restart rabbitmq container (rest of > > nodes)] > > ********************************************************************** > > ******************************************* > > > > > > > > NO MORE HOSTS LEFT > > ********************************************************************** > > ********************************************************************** > > ************************** > > > > > > > > PLAY RECAP > > ********************************************************************** > > ********************************************************************** > > ********************************** > > > > localhost : ok=95 changed=2 unreachable=0 failed=1 skipped=78 rescued=0 ignored=0 > > > > > > > > Command failed ansible-playbook -i ./all-in-one -e > > @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e > > CONFIG_DIR=/etc/kolla -e kolla_action=deploy > > /home/ubuntu/venv/share/kolla-ansible/ansible/site.yml > > > > > > > > > > > > Accessing the container logs for rabbitmq .. this is what I see > > > > > > > > BOOT FAILED > > > > =========== > > > > > > > > Error description: > > > > {could_not_start,rabbitmq_management, > > > > {rabbitmq_management, > > > > {bad_return, > > > > {{rabbit_mgmt_app,start,[normal,[]]}, > > > > {'EXIT', > > > > {{could_not_start_listener, > > > > [{port,15672}], > > > > {shutdown, > > > > {failed_to_start_child,ranch_acceptors_sup, > > > > > > {listen_error,rabbit_web_dispatch_sup_15672,eaddrinuse}}}}, > > Hi RamaK, here is your issue. RabbitMQ management fails to start up due to the port already being in use (eaddrinuse). Perhaps RabbitMQ is running on the host already? > > > > > {gen_server,call, > > > > [rabbit_web_dispatch_registry, > > > > {add,rabbit_mgmt, > > > > [{port,15672}], > > > > #Fun, > > > > > > > > --RamaK > > > > The contents of this e-mail message and any attachments are intended > > solely for the > > addressee(s) and may contain confidential and/or legally privileged > > information. If you are not the intended recipient of this message or > > if this message has been addressed to you in error, please immediately > > alert the sender by reply e-mail and then delete this message and any > > attachments. If you are not the intended recipient, you are notified > > that any use, dissemination, distribution, copying, or storage of this > > message or any attachment is strictly prohibited. > E-MAIL CONFIDENTIALITY NOTICE: > The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. 
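To make the checks above concrete, something along these lines on the controller should answer them quickly (a rough sketch - 10.1.0.250 is the VIP from this deployment's globals.yml, and the container names are the kolla defaults):

docker ps -a --filter name=haproxy --filter name=mariadb   # both should be Up, not restarting
ip addr | grep -B2 10.1.0.250                              # the VIP should be bound to your api_interface
sudo ss -tlnp | grep ':3306'                               # haproxy should be listening on the VIP port
docker logs --tail 100 haproxy
docker logs --tail 100 mariadb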
From mark at stackhpc.com Tue Nov 26 15:11:34 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 26 Nov 2019 15:11:34 +0000 Subject: [kolla][openstack-ansible][tripleo] CentOS 8 host migration In-Reply-To: References: Message-ID: On Tue, 26 Nov 2019 at 11:30, Jonathan Rosser wrote: > > > > On 25/11/2019 17:48, Mark Goddard wrote: > > Hi, > > > > During the recent kolla PTG discussions [1], we covered CentOS 8 in > > some detail. For kolla there are several aspects to CentOS 8 support: > > > > * kolla-build execution host > > * container images > > * kolla-ansible execution host > > * kolla-ansible remote hosts > > > > Hi Mark, > > Whilst the container approach in openstack-ansible is different (LXC) > and presents it's own challenges for Centos8, there is no doubt a common > set of issues. We also do a container-less bare-metal deploy mode which > would be the first to try to get working. > > I have a WIP patch for OSA for Centos-8 for a while now which really > does not get so far. > > https://review.opendev.org/#/c/689629 > > I've not yet managed to create dummy interfaces with network-manager > which leads to a horrible hack setting up the networking for CI. > > Also failing was building of dbus-python which is required for the > Ansible nmcli module, due to an incompatibility with autotools 1.16. > > There are a also few pieces we take from EPEL (lsyncd) which are missing. > > Then there is/was the absence of an obvious source of RDO packages. > > I am concerned about the difficulty of creating a tractable upgrade path > which separates OS from OpenStack upgrades. For openstack-ansible, Rocky > was the transition release which supported both Xenial and Bionic. > Achieving the same transition release for Centos 7 to 8 upgrades looks > challenging. > > Jon. Thanks Jon, there's some useful information in there. Given that RDO Ussuri will not support CentOS 7, I think you will also need to do the 7 to 8 transition on Train. The timing's not ideal here - I can see all the deployment tools either needing to backport large changes to stable/train, or delay their releases (if they haven't released yet). Let's keep this thread going as we each hit different roadblocks. From hberaud at redhat.com Tue Nov 26 15:18:34 2019 From: hberaud at redhat.com (Herve Beraud) Date: Tue, 26 Nov 2019 16:18:34 +0100 Subject: [olso][pbr][i18n] Using setup.cfg [files] data_files to install localization files In-Reply-To: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> References: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> Message-ID: Hello, Good question, I think that both solutions can do the job, and it depends on your needs. If you choose the second option I think you also need to edit the `MANIFEST.in` to recursively grab your files in your package, then that will look like something like: ``` # MANIFEST.in recursive-include translations/ * ``` ``` # setup.cfg [files] packages = yourproject data_files = etc/yourproject/translation = translations/* ``` Where do you want to use this? I mean, do you have a specific project behind this question? Le lun. 25 nov. 2019 à 16:18, info at dantalion.nl a écrit : > Hello everyone :), > > I was wondering what the preferred method to install localization files > is. I can think of some probably solutions such as: > > 1: Including the locale files as part of a package for the target > systems package manager (pacman, yum, apt, etc). > 2: adding the locale files to the [files] directive in setup.cfg: > > I hope someone can answer my question. 
> > Kind regards, > Corne Lukken > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsitlani03 at gmail.com Tue Nov 26 15:33:35 2019 From: nsitlani03 at gmail.com (Namrata Sitlani) Date: Tue, 26 Nov 2019 21:03:35 +0530 Subject: [magnum] Kubernetes, multiple versions: kube-system pods in pending status on newly spun-up clusters (since last Thursday) Message-ID: Hello folks, As of last week (14 Nov), our Magnum (Rocky) environment has stopped spinning up working Kubernetes clusters. To be precise, Magnum does report the cluster status as CREATE_COMPLETE, but once it is up all its Kubernetes pods are stuck in the Pending state. We use the following commands to create the Kubernetes clusters: http://paste.openstack.org/show/786348/. All pods show pending status which shows cluster could not select a minion node for them. The deployment fails with the following output " 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate" For reference: http://paste.openstack.org/show/786717/. For reference output of kubectl get all -A http://paste.openstack.org/show/786729/ If we manually remove NoSchedule taint from the minion nodes we get all the pods running For reference: http://paste.openstack.org/show/786718/ But after the manual fix too openstack-cloud-controller-manager pods are missing so any interaction from the Kubernetes control plane to OpenStack services is non-functional. We are assuming that the missing openstack cloud controller manager pod is also the reason for the taint issue which we are encountering. For the node taint issue, https://ask.openstack.org/en/question/120442/magnum-kubernetes-noschedule-taint/ suggests to add [trust] cluster_user_trust = true to magnum.conf. But there is OSA variable named magnum_cluster_user_trust that can be set to true for this purpose. However, the default for this variable has been True and we have confirmed we have the parameter in our environment cluster_user_trust=True as well. Note: The kubernetes cluster deployed here uses kube_tag 1.14.8 but we are getting same result with other kube_tag versions v1.13.12, v1.14.8 also. Ideally, we should not remove taints manually so can you please confirm our findings and help us find a way forward We can provide more logs if needed Thanks Namrata Sitlani -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amotoki at gmail.com Tue Nov 26 15:47:16 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 27 Nov 2019 00:47:16 +0900 Subject: [olso][pbr][i18n] Using setup.cfg [files] data_files to install localization files In-Reply-To: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> References: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> Message-ID: Hi, I think localization files are automatically included in a python package even if setup.cfg has no explicit entry. Here is an example of the case of the horizon repository: http://paste.openstack.org/show/786732/ I can see localized files after installing a horizon wheel package into a virtualenv. > 1: Including the locale files as part of a package for the target > systems package manager (pacman, yum, apt, etc). These package managers just do the similar thing in a different way. > 2: adding the locale files to the [files] directive in setup.cfg: I don't think we need to specify them in setup.cfg. Am I missing something? Thanks, Akihiro Motoki (amotoki) On Tue, Nov 26, 2019 at 12:17 AM info at dantalion.nl wrote: > > Hello everyone :), > > I was wondering what the preferred method to install localization files > is. I can think of some probably solutions such as: > > 1: Including the locale files as part of a package for the target > systems package manager (pacman, yum, apt, etc). > 2: adding the locale files to the [files] directive in setup.cfg: > > I hope someone can answer my question. > > Kind regards, > Corne Lukken > From bence.romsics at gmail.com Tue Nov 26 15:57:13 2019 From: bence.romsics at gmail.com (Bence Romsics) Date: Tue, 26 Nov 2019 16:57:13 +0100 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> Message-ID: Hi All, After gibi opened the neutron bug [1] I assigned it to myself and now I got to summarize my plan to fix it. Please comment either in a reply to this or in the bug report if you see problems, have questions, etc. (1) Extend ovs and sriov agents config with 'resource_provider_hypervisors' for example: ml2_conf.ini [ovs] bridge_mappings = physnet0:br-physnet0,... resource_provider_bandwidths = br-physnet0:10000000:10000000,... resource_provider_hypervisors = physnet0:hypervisor0,... # this is new, values default to socket.gethostname() for each key in resource_provider_bandwidths sriov_agent.ini [sriov_nic] physical_device_mappings = physnet1:ens5,... resource_provider_bandwidths = ens5:40000000:40000000,... resource_provider_hypervisors = physnet1:hypervisor1,... # this is new, defaults as above The values for resource_provider_hypervisors are opaque identifiers for neutron. Since each physical device can have its own hypervisor associated possible features like ovs-superagent (for smartnics) could be supported. Use of socket.gethostname() is only hardcoded as a default, so non-libvirt hypervisors are taken care of. 
(2) Extend the report_state message's configurations field alike: { 'bridge_mappings': {'physnet0': 'br-physnet0'}, 'resource_provider_bandwidths': {'br-physnet0': {'egress': 10000000, 'ingress': 10000000}}, 'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'}, 'resource_provider_inventory_defaults': {'allocation_ratio': 1.0, 'min_unit': 1, 'step_size': 1, 'reserved': 0} } Do not touch the host field of the same message. Since we always treated the configurations field as free format, IMO changes to it should be backportable. Let me know if you think otherwise. (3) In neutron-server report_state.host is used in binding as now - no change here. report_state.configurations.resource_provider_hypervisors.PHYSDEV to be used in selecting parent resource provider for agent and physdev RP-tree. When not available in the message still fall back to using report_state.host as today. (4) At the moment I don't see the need to use the proposed new nova API to query hypervisors managed by a nova-compute since as soon as it returns 1+ hypervisors neutron cannot do anything with the result. Cheers, Bence [1] https://bugs.launchpad.net/neutron/+bug/1853840 From jimmy at openstack.org Tue Nov 26 16:16:14 2019 From: jimmy at openstack.org (Jimmy McArthur) Date: Tue, 26 Nov 2019 10:16:14 -0600 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: <057BD0AC-451C-4E64-9D30-6CC6A20DFA0F@demarco.com> References: <057BD0AC-451C-4E64-9D30-6CC6A20DFA0F@demarco.com> Message-ID: <1595EA09-2B8C-45ED-B836-5751FC79DBA1@openstack.org> From my reading of the situation, it’s a done deal. It seems like ICAAN just crammed it through. Is there new info out that could reverse it? Thanks, Jimmy > On Nov 26, 2019, at 8:50 AM, Amy wrote: > > I agree with the OSF submitting on behalf of OpenStack and the other projects. I also like the idea of individuals signing as contributors. > > Amy Marrich (spotz) > >>> On Nov 26, 2019, at 8:17 AM, Sean McGinnis wrote: >>> >>> On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: >>> >>> Hey All, >>> >>> I am not sure if this has been seen by everyone or not - but there is >>> a change in how the .org top level domain is being ran in the works, in >>> a way that may not be in the best interests of the non profits that it >>> was created to facilitate. [1] >>> >>> A lot of well known non profits have already joined, and as a >>> community that has an interest in the internet as a whole, and >>> uses a .org domain, I think we should add our voice in support. >>> >>> What do people think? Are we happy to have the TC use its voice on >>> behalf of the OpenStack project, or do we think the board should use >>> its voice on behalf of the entire foundation? >>> >>> - Graham >>> >>> >>> 1 - https://savedotorg.org/ >>> >> >> I agree this is a Foundation level thing (though individuals can add their >> names to the petition as well). I would support adding OSF to the list of >> organizations. 
>> >> Sean >> > > _______________________________________________ > Foundation mailing list > Foundation at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation From mriedemos at gmail.com Tue Nov 26 16:30:41 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 26 Nov 2019 10:30:41 -0600 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> Message-ID: On 11/26/2019 9:57 AM, Bence Romsics wrote: > (4) At the moment I don't see the need to use the proposed new nova > API to query hypervisors managed by a nova-compute since as soon as it > returns 1+ hypervisors neutron cannot do anything with the result. The API would only return more than one hypervisor for a given compute service host if that host is managing ironic nodes, which won't be the case if you're looking for nodes managed by a KVM node that supports QoS ports. The point of the API change is to let neutron do everything automatically and avoid additional configuration, right? The configuration on the neutron side was meant to be a workaround. -- Thanks, Matt From gr at ham.ie Tue Nov 26 16:34:42 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 26 Nov 2019 16:34:42 +0000 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: <1595EA09-2B8C-45ED-B836-5751FC79DBA1@openstack.org> References: <057BD0AC-451C-4E64-9D30-6CC6A20DFA0F@demarco.com> <1595EA09-2B8C-45ED-B836-5751FC79DBA1@openstack.org> Message-ID: On 26/11/2019 16:16, Jimmy McArthur wrote: > From my reading of the situation, it’s a done deal. It seems like ICAAN just crammed it through. Is there new info out that could reverse it? > > Thanks, > Jimmy My understanding is it looks bleak, but there is a chance to shame the owners into halting the sale. I think ICAAN can still block the same (until December 13th, 30 days after they were notified), and through pure poor marketing, and raising our voices, this could help convince the current owners of PIR (who is ISOC) to re-consider the idea. >> On Nov 26, 2019, at 8:50 AM, Amy wrote: >> >> I agree with the OSF submitting on behalf of OpenStack and the other projects. I also like the idea of individuals signing as contributors. >> >> Amy Marrich (spotz) >> >>>> On Nov 26, 2019, at 8:17 AM, Sean McGinnis wrote: >>>> >>>> On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: >>>> >>>> Hey All, >>>> >>>> I am not sure if this has been seen by everyone or not - but there is >>>> a change in how the .org top level domain is being ran in the works, in >>>> a way that may not be in the best interests of the non profits that it >>>> was created to facilitate. [1] >>>> >>>> A lot of well known non profits have already joined, and as a >>>> community that has an interest in the internet as a whole, and >>>> uses a .org domain, I think we should add our voice in support. >>>> >>>> What do people think? Are we happy to have the TC use its voice on >>>> behalf of the OpenStack project, or do we think the board should use >>>> its voice on behalf of the entire foundation? >>>> >>>> - Graham >>>> >>>> >>>> 1 - https://savedotorg.org/ >>>> >>> >>> I agree this is a Foundation level thing (though individuals can add their >>> names to the petition as well). 
I would support adding OSF to the list of >>> organizations. >>> >>> Sean >>> >> >> _______________________________________________ >> Foundation mailing list >> Foundation at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation > From gr at ham.ie Tue Nov 26 16:37:06 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 26 Nov 2019 16:37:06 +0000 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <98b748ba-f7eb-051d-5d91-929cbe6974f3@openstack.org> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> <8f83c8c3-e51b-d505-01cb-cffcad37a022@redhat.com> <1e5284e4-43d5-4977-ac97-5c56d7cef9a9@gmail.com> <6e7cff0a-a00b-b94d-fc14-5528ee30993e@redhat.com> <98b748ba-f7eb-051d-5d91-929cbe6974f3@openstack.org> Message-ID: <9c81956f-092c-459a-fef1-8f56d76f2520@ham.ie> On 26/11/2019 09:47, Thierry Carrez wrote: > Zane Bitter wrote: >> On 20/11/19 9:21 am, Matt Riedemann wrote: >>> >>> This is assuming that each project has a stable core team already, >>> which a lot don't, that's why we get a lot of "hi I'm the PTL du jour >>> on project X now please make me stable core even though I've never >>> reviewed any stable branch changes before". >> >> Correct, what I'm suggesting is a middle-ground position so that in >> the cases where there is no project-specific stable team, that team >> has to be bootstrapped by the global stable-maint team in the same way >> that they do already. > > That sounds like a reasonable compromise. > Yeah, I like this idea - it allows more flexibility while not leaving the door wide open. From smooney at redhat.com Tue Nov 26 16:44:51 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 26 Nov 2019 16:44:51 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> Message-ID: <1460799ac1551da5817996ba09f430d731c36ab1.camel@redhat.com> On Tue, 2019-11-26 at 16:57 +0100, Bence Romsics wrote: > Hi All, > > After gibi opened the neutron bug [1] I assigned it to myself and now > I got to summarize my plan to fix it. Please comment either in a reply > to this or in the bug report if you see problems, have questions, etc. > > (1) Extend ovs and sriov agents config with > 'resource_provider_hypervisors' for example: > > ml2_conf.ini > [ovs] > bridge_mappings = physnet0:br-physnet0,... > resource_provider_bandwidths = br-physnet0:10000000:10000000,... > resource_provider_hypervisors = physnet0:hypervisor0,... # this is I'm guessing you are adding this for the ironic smart NIC use case, but I don't think this makes sense. In the non-ironic smart NIC case, where OVS is deployed locally, there will be only one hypervisor listed for all physnets on the host. In the ironic case I don't think it makes sense either, as physnets span multiple hypervisors and we don't use the physnet today as part of the lookup, so even in this case mapping physnets to a list of hypervisors does not make sense. What would make sense would be a mapping between CONF.host and the hypervisor hostname, but I honestly think we should not require a config option at all.
If we simply report the hypervisor hostname by calling socket.gethostname, that should be sufficient. The nova API will be extended so you can get the compute node UUID by using the value in CONF.host, so this new hypervisor hostname is only needed for backport reasons, to have a way to work around the lack of query support in the /os-hypervisors API in Stein and Train. The other config option that might make sense would be the root resource provider UUID, but we don't want to require that nova is installed and the compute node started before you can generate the neutron agent conf, so I don't think that is a good approach. > new, values default to socket.gethostname() for each key in > resource_provider_bandwidths > > sriov_agent.ini > [sriov_nic] > physical_device_mappings = physnet1:ens5,... > resource_provider_bandwidths = ens5:40000000:40000000,... > resource_provider_hypervisors = physnet1:hypervisor1,... # this is > new, defaults as above > > The values for resource_provider_hypervisors are opaque identifiers > for neutron. Since each physical device can have its own hypervisor > associated possible features like ovs-superagent (for smartnics) could > be supported. It could be, but I think this is a rather poor way to facilitate that. The physnet is not a correct key to use for resource_provider_hypervisors. > Use of socket.gethostname() is only hardcoded as a > default, so non-libvirt hypervisors are taken care of. The requirement that CONF.host match on both nova and neutron is not libvirt specific. Nova and neutron need to agree on the name of the host for the port binding extension to work correctly. > > (2) Extend the report_state message's configurations field alike: > > { > 'bridge_mappings': {'physnet0': 'br-physnet0'}, > 'resource_provider_bandwidths': {'br-physnet0': {'egress': 10000000, > 'ingress': 10000000}}, > 'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'}, Again, I think this should be 'resource_provider_name': 'hypervisor0', or 'resource_provider_hypervisors': {"value set in CONF.host": 'hypervisor0', ... }, > 'resource_provider_inventory_defaults': {'allocation_ratio': 1.0, > 'min_unit': 1, 'step_size': 1, 'reserved': 0} > } > > Do not touch the host field of the same message. > > Since we always treated the configurations field as free format, IMO > changes to it should be backportable. Let me know if you think > otherwise. Yes, I think they should be backportable. > > (3) In neutron-server > > report_state.host is used in binding as now - no change here. +1 > > report_state.configurations.resource_provider_hypervisors.PHYSDEV For a given host, all physnets or network interfaces will have the same hypervisor hostname. If we had a resource_provider_name config, we would only need to use it if its value did not match the existing host value, so we should just do this: name = report_state.configurations.get('resource_provider_name', report_state.host) > to > be used in selecting parent resource provider for agent and physdev > RP-tree. When not available in the message still fall back to using > report_state.host as today. > > (4) At the moment I don't see the need to use the proposed new nova > API to query hypervisors managed by a nova-compute since as soon as it > returns 1+ hypervisors neutron cannot do anything with the result. It would only do that for ironic, and there is no current usage of placement via neutron with ironic. I don't think the config option proposed above is a better solution, so I think we need to continue to discuss this.
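To make the fallback I mentioned above concrete, a rough sketch of the server-side lookup (illustrative names only, not actual neutron code) would be:

    # Illustrative only: pick the placement parent name for one of the agent's
    # physical devices, falling back to the agent's host when no mapping was reported.
    def resolve_rp_parent_name(agent_state, physdev):
        configurations = agent_state.get('configurations', {})
        hypervisors = configurations.get('resource_provider_hypervisors', {})
        return hypervisors.get(physdev, agent_state['host'])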
> > Cheers, > Bence > > [1] https://bugs.launchpad.net/neutron/+bug/1853840 > From colleen at gazlene.net Tue Nov 26 16:50:03 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 26 Nov 2019 08:50:03 -0800 Subject: =?UTF-8?Q?Re:_[keystone][nova][barbican][neutron][cinder][tc][policy]_Pr?= =?UTF-8?Q?oposal_for_policy_popup_team?= In-Reply-To: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> References: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> Message-ID: <9eab1103-79a7-454a-aa40-c43a9cd8e5dc@www.fastmail.com> On Mon, Nov 25, 2019, at 17:04, Ghanshyam Mann wrote: > > ---- On Mon, 25 Nov 2019 16:48:23 -0600 Colleen Murphy > wrote ---- > > Hi, > > > > At the Shanghai forum, we discussed forming a popup team around > reforming default policies for several projects. I've formally proposed > this team here: > > > > https://review.opendev.org/695993 > > > > I've also created a wiki page to document the effort and coordinate > the work: > > > > > https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team > > > > This is the best way I can think of to organize a cross-project > initiative like this, since I think the openstack specs repo is kind of > defunct? I'm open to other ideas for coordinating this. > > > > I've gone ahead and named project-specific liaisons based on > discussions at the forum, and also created a member list based on the > list of volunteers in the forum etherpad. If you weren't at the forum > but would like to be involved, please go ahead and add your name to the > list on the wiki page. > > > > To form the popup team, we need a second co-lead: please let me know > if you're interested. As hinted in the wiki page, I'm not seeking to > lead this long-term, so there are actually two lead positions open. > > Thanks Colleen for composing this. I can co-lead and help you with > this. I can serve as TC liaison also. Thanks Ghanshyam, I've added you as a lead in the wiki page and will add you as lead and TC liaison in the governance change. Colleen > > -gmann > > > > > Let me know your thoughts in this thread. > > > > Colleen > > > > > > From colleen at gazlene.net Tue Nov 26 16:51:08 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 26 Nov 2019 08:51:08 -0800 Subject: =?UTF-8?Q?Re:_[keystone][nova][barbican][neutron][cinder][tc][policy]_Pr?= =?UTF-8?Q?oposal_for_policy_popup_team?= In-Reply-To: References: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> Message-ID: <87abcd15-9376-421b-96d2-3597b7a1e356@www.fastmail.com> On Mon, Nov 25, 2019, at 18:12, Zhipeng Huang wrote: > Thanks Colleen, > > As I shared with Cyborg team earlier, I think as a new project it would > be beneficial that cyborg also participate in the popup team to get > policy right from the get go Thanks Howard, I've added cyborg to the page and added you as liaison. Colleen > > On Tue, Nov 26, 2019 at 9:12 AM Ghanshyam Mann wrote: > > > > ---- On Mon, 25 Nov 2019 16:48:23 -0600 Colleen Murphy wrote ---- > > > Hi, > > > > > > At the Shanghai forum, we discussed forming a popup team around reforming default policies for several projects. 
I've formally proposed this team here: > > > > > > https://review.opendev.org/695993 > > > > > > I've also created a wiki page to document the effort and coordinate the work: > > > > > > https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team > > > > > > This is the best way I can think of to organize a cross-project initiative like this, since I think the openstack specs repo is kind of defunct? I'm open to other ideas for coordinating this. > > > > > > I've gone ahead and named project-specific liaisons based on discussions at the forum, and also created a member list based on the list of volunteers in the forum etherpad. If you weren't at the forum but would like to be involved, please go ahead and add your name to the list on the wiki page. > > > > > > To form the popup team, we need a second co-lead: please let me know if you're interested. As hinted in the wiki page, I'm not seeking to lead this long-term, so there are actually two lead positions open. > > > > Thanks Colleen for composing this. I can co-lead and help you with this. I can serve as TC liaison also. > > > > -gmann > > > > > > > > Let me know your thoughts in this thread. > > > > > > Colleen > > > > > > > > > > > > > -- > Zhipeng (Howard) Huang > > Principle Engineer > OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open > Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C > From smooney at redhat.com Tue Nov 26 16:53:05 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 26 Nov 2019 16:53:05 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> Message-ID: <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com> On Tue, 2019-11-26 at 10:30 -0600, Matt Riedemann wrote: > On 11/26/2019 9:57 AM, Bence Romsics wrote: > > (4) At the moment I don't see the need to use the proposed new nova > > API to query hypervisors managed by a nova-compute since as soon as it > > returns 1+ hypervisors neutron cannot do anything with the result. > > The API would only return more than one hypervisor for a given compute > service host if that host is managing ironic nodes, which won't be the > case if you're looking for nodes managed by a KVM node that supports QoS > ports. > > The point of the API change is to let neutron do everything > automatically and avoid additional configuration, right? The > configuration on the neutron side was meant to be a workaround. yes this ^ > From cboylan at sapwetik.org Tue Nov 26 16:56:12 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 26 Nov 2019 08:56:12 -0800 Subject: =?UTF-8?Q?Re:_[Kolla]_FAILED_-_RETRYING:_wait_for_MariaDB_to_be_availabl?= =?UTF-8?Q?e_via_HAProxy_(10_retries_left).?= In-Reply-To: References: Message-ID: <8a19640f-1258-4476-a4f2-d728f590f74b@www.fastmail.com> On Tue, Nov 26, 2019, at 7:04 AM, Mark Goddard wrote: > On Tue, 26 Nov 2019 at 14:12, Bhupathi, Ramakrishna > wrote: > > > > Mark, > > Thanks for the tip. > > > > Now I get a failure (as below) during deploy (I have all-in-one) config. I am on stein. > > It appears to me that this has been presumably fixed . But I still running into this error. > > > > I do not see anything explicit in the HAProxy/MariaDB logs. 
> > Are both the haproxy and mariadb containers up and not restarting? > Does your api_interface have the VIP activated? Is haproxy listening > on 10.1.0.250:3306 (check ss)? > We ran into mariadb startup slowness recently in Zuul CI jobs. Basically they generate timezone data tables on first startup which can be very slow depending on your disk IO. Not sure if this is the issue, but the workaround is to set an env var to disable the timezone data table generation (if you don't use those tables). https://github.com/docker-library/mariadb/issues/261 https://github.com/docker-library/mariadb/issues/262 From fsbiz at yahoo.com Tue Nov 26 17:04:22 2019 From: fsbiz at yahoo.com (fsbiz at yahoo.com) Date: Tue, 26 Nov 2019 17:04:22 +0000 (UTC) Subject: [ironic]: Timeout reached while waiting for callback for node In-Reply-To: <1A784864-3B47-4DE4-AD1B-F4614FB6ADC9@cern.ch> References: <1530284401.3551200.1572301601958.ref@mail.yahoo.com> <1530284401.3551200.1572301601958@mail.yahoo.com> <1A784864-3B47-4DE4-AD1B-F4614FB6ADC9@cern.ch> Message-ID: <1837232093.5649028.1574787862334@mail.yahoo.com> Thanks Arne and Julia with the great suggestions on scaling ironic nodes. We are currently trying to root cause an issue (it has occured twice) where a large number of nodes(but not all the nodes) suddenly migrate from one IC to another. E.g.69 nodes moved from sc-ironic04 and sc-ironic05 tosc-ironic06 from 21:07 to 21:10 on nov. 23rd. [root at sc-ironic06 nova]# grep "moving from" /var/log/nova/nova-compute.log-20191124 2019-11-23 21:07:46.606 210241 INFOnova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - --] ComputeNode 1cb9ef2e-aa7d-4e25-8878-14669a3ead7a moving fromsc-ironic05.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com 2019-11-23 21:08:17.518 210241 INFOnova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - --] ComputeNode 56e58642-12ac-4455-bc95-2a328198f845 moving fromsc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com 2019-11-23 21:08:35.843 210241 INFOnova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - --] ComputeNode e0b9b94c-2ea3-4324-a85f-645d572e370b moving fromsc-ironic05.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com 2019-11-23 21:08:42.264 210241 INFOnova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - --] ComputeNode 1c7d461c-2de7-4d9a-beff-dcb490c7b2e4 moving fromsc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com 2019-11-23 21:08:43.819 210241 INFO nova.compute.resource_tracker[req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode73ed8bd4-23c2-46bc-b748-e6f5ab6fa932 moving from sc-ironic05.nvc.nvidia.com tosc-ironic06.nvc.nvidia.com 2019-11-23 21:08:45.651 210241 INFO nova.compute.resource_tracker[req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode51da1570-5666-4a21-a46f-4b7510d28415 moving from sc-ironic05.nvc.nvidia.com tosc-ironic06.nvc.nvidia.com 2019-11-23 21:08:46.905 210241 INFOnova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - --] ComputeNode 38b41797-4b97-405b-bbd5-fccc61d237c3 moving fromsc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com 2019-11-23 21:08:49.065 210241 INFOnova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - --] ComputeNode c5c89749-a11c-4eb8-b159-e8d47ecfcbb9 moving fromsc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com Restarting nova-compute and ironic-conductor services on the IC seems to have fixed the issue but we are still in the root cause analysis phase and seem 
to have hit a wall narrowing this down.  Any suggestions are welcome. Thanks,Fred. On Wednesday, October 30, 2019, 02:02:42 PM PDT, Arne Wiebalck wrote: Hi Fred, To confirm what Julia said: We currently have ~3700 physical nodes in Ironic, managed by 3 controllers (16GB VMs running httpd, conductor, and inspector). We recently moved to l -------------- next part -------------- An HTML attachment was scrubbed... URL: From info at dantalion.nl Tue Nov 26 17:27:09 2019 From: info at dantalion.nl (info at dantalion.nl) Date: Tue, 26 Nov 2019 18:27:09 +0100 Subject: [olso][pbr][i18n] Using setup.cfg [files] data_files to install localization files In-Reply-To: References: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> Message-ID: Hello, I don't think we need to specify them in setup.cfg. Am I missing something? I thought so to but I recently checked on Watcher and noticed that executing 'python setup.py install' does not build + move the .mo files into the required locale directories. I proceeded to test on other OpenStack projects and saw the same behavior. This was verified by listing the available languages with: `oslo_i18n.get_available_languages(DOMAIN)` which than only returns `['en_US']` Maybe `python setup.py install` is not the correct command to do this? Kind regards. Corne Lukken On 26-11-19 16:47, Akihiro Motoki wrote: > Hi, > > I think localization files are automatically included in a python > package even if setup.cfg has no explicit entry. > > Here is an example of the case of the horizon repository: > http://paste.openstack.org/show/786732/ > I can see localized files after installing a horizon wheel package > into a virtualenv. > >> 1: Including the locale files as part of a package for the target >> systems package manager (pacman, yum, apt, etc). > These package managers just do the similar thing in a different way. > >> 2: adding the locale files to the [files] directive in setup.cfg: > I don't think we need to specify them in setup.cfg. > Am I missing something? > > Thanks, > Akihiro Motoki (amotoki) > > On Tue, Nov 26, 2019 at 12:17 AM info at dantalion.nl wrote: >> Hello everyone :), >> >> I was wondering what the preferred method to install localization files >> is. I can think of some probably solutions such as: >> >> 1: Including the locale files as part of a package for the target >> systems package manager (pacman, yum, apt, etc). >> 2: adding the locale files to the [files] directive in setup.cfg: >> >> I hope someone can answer my question. >> >> Kind regards, >> Corne Lukken >> From jeremyfreudberg at gmail.com Tue Nov 26 17:32:47 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Tue, 26 Nov 2019 12:32:47 -0500 Subject: [sahara] Cancelling Sahara meeting November 28, and looking ahead to December 12 Message-ID: Hi all, Due to the Thanksgiving holiday in the US there will be no Sahara meeting 2019-11-28. Additionally I will be on PTO the date of the following meeting 2019-12-12. I may establish a special meeting for 2019-12-05 depending on interest. Thanks, Jeremy From gr at ham.ie Tue Nov 26 17:33:23 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 26 Nov 2019 17:33:23 +0000 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: References: <057BD0AC-451C-4E64-9D30-6CC6A20DFA0F@demarco.com> <1595EA09-2B8C-45ED-B836-5751FC79DBA1@openstack.org> Message-ID: <2a19b6c1-c769-7725-7bb9-3998cdfe34c5@ham.ie> On 26/11/2019 16:44, Alan Clark wrote: > Who owns savedotorg.org? 
I see EFF references on the page but when I look it up the domain it appears to all be "REDACTED". >From https://savedotorg.org/index.php/about/ - it is ran by NTEN (https://www.nten.org/) It is supported by the EFF (among others) > >> -----Original Message----- >> From: Graham Hayes [mailto:gr at ham.ie] >> Sent: Tuesday, November 26, 2019 9:35 AM >> To: Jimmy McArthur >> Cc: openstack-discuss at lists.openstack.org; foundation- >> board at lists.openstack.org; foundation at lists.openstack.org >> Subject: Re: [OpenStack Foundation] [tc][board][all] - Adding OpenStack >> community support to the savedotorg campaign >> >> On 26/11/2019 16:16, Jimmy McArthur wrote: >>> From my reading of the situation, it’s a done deal. It seems like ICAAN just >> crammed it through. Is there new info out that could reverse it? >>> >>> Thanks, >>> Jimmy >> >> My understanding is it looks bleak, but there is a chance to shame the owners >> into halting the sale. >> >> I think ICAAN can still block the same (until December 13th, 30 days after they >> were notified), and through pure poor marketing, and raising our voices, this >> could help convince the current owners of PIR (who is ISOC) to re-consider the >> idea. >> >> >>>> On Nov 26, 2019, at 8:50 AM, Amy wrote: >>>> >>>> I agree with the OSF submitting on behalf of OpenStack and the other projects. >> I also like the idea of individuals signing as contributors. >>>> >>>> Amy Marrich (spotz) >>>> >>>>>> On Nov 26, 2019, at 8:17 AM, Sean McGinnis >> wrote: >>>>>> >>>>>> On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: >>>>>> >>>>>> Hey All, >>>>>> >>>>>> I am not sure if this has been seen by everyone or not - but there >>>>>> is a change in how the .org top level domain is being ran in the >>>>>> works, in a way that may not be in the best interests of the non >>>>>> profits that it was created to facilitate. [1] >>>>>> >>>>>> A lot of well known non profits have already joined, and as a >>>>>> community that has an interest in the internet as a whole, and uses >>>>>> a .org domain, I think we should add our voice in support. >>>>>> >>>>>> What do people think? Are we happy to have the TC use its voice on >>>>>> behalf of the OpenStack project, or do we think the board should >>>>>> use its voice on behalf of the entire foundation? >>>>>> >>>>>> - Graham >>>>>> >>>>>> >>>>>> 1 - https://savedotorg.org/ >>>>>> >>>>> >>>>> I agree this is a Foundation level thing (though individuals can add >>>>> their names to the petition as well). I would support adding OSF to >>>>> the list of organizations. >>>>> >>>>> Sean >>>>> >>>> >>>> _______________________________________________ >>>> Foundation mailing list >>>> Foundation at lists.openstack.org >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation >>> >> >> >> _______________________________________________ >> Foundation mailing list >> Foundation at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation From info at dantalion.nl Tue Nov 26 17:35:48 2019 From: info at dantalion.nl (info at dantalion.nl) Date: Tue, 26 Nov 2019 18:35:48 +0100 Subject: [olso][pbr][i18n] Using setup.cfg [files] data_files to install localization files In-Reply-To: References: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> Message-ID: <4fcec795-07c1-8b48-14c2-da8495678098@dantalion.nl> Hello, This questions comes from noticing locales aren't detected on Watcher and many other OpenStack projects when `python setup.py install` is executed. 
The behavior of installing and detecting locales seems to be the same both inside and without a virtualenv. On  Watcher we do not have a MANIFEST.in file in the repository, what would such a file look like do you perhaps have a reference? Currently I was thinking of doing something like this: #setup.cfg[files] data_files =     usr/share/locale/nl/LC_MESSAGES =         watcher/locale/nl/LC_MESSAGES/watcher.mo This, however, feels clunky and I would expect locale installation and detection to be more automatic. you can test if locales are detected for a project using: `oslo_i18n.get_available_languages(DOMAIN)` In the case of Watcher this only returns `['en_US']` which is inserted as default. Maybe `python setup.py install` is not the right command to install locales? Kind regards, Corne Lukken On 26-11-19 16:18, Herve Beraud wrote: > Hello, > > Good question, I think that both solutions can do the job, and it depends > on your needs. > > If you choose the second option I think you also need to edit the > `MANIFEST.in` to recursively grab your files in your package, then that > will look like something like: > ``` > # MANIFEST.in > recursive-include translations/ * > ``` > ``` > # setup.cfg > [files] > packages = > yourproject > data_files = > etc/yourproject/translation = translations/* > ``` > > Where do you want to use this? I mean, do you have a specific project > behind this question? > > Le lun. 25 nov. 2019 à 16:18, info at dantalion.nl a > écrit : > >> Hello everyone :), >> >> I was wondering what the preferred method to install localization files >> is. I can think of some probably solutions such as: >> >> 1: Including the locale files as part of a package for the target >> systems package manager (pacman, yum, apt, etc). >> 2: adding the locale files to the [files] directive in setup.cfg: >> >> I hope someone can answer my question. >> >> Kind regards, >> Corne Lukken >> >> From senrique at redhat.com Tue Nov 26 17:45:41 2019 From: senrique at redhat.com (Sofia Enriquez) Date: Tue, 26 Nov 2019 14:45:41 -0300 Subject: [cinder] Anastasiya accepted for Outreachy Message-ID: Hi Cinder team, I'd like to announce that Anastasiya will be working with us improving the Tempest coverage this round. The internship schedule starts on Dec. 3, 2019, to March 3, 2020. Feel free to reach her on IRC *as anastzhyr* if something comes up. On the other hand, If you have any suggestions for tempest test scenarios or possible ideas, please let me know. Regards, Sofi -- L. Sofía Enriquez she/her Associate Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From surya.seetharaman9 at gmail.com Tue Nov 26 18:08:15 2019 From: surya.seetharaman9 at gmail.com (Surya Seetharaman) Date: Tue, 26 Nov 2019 19:08:15 +0100 Subject: [nova][api] Message-ID: Hello everyone, We came across this bug [1] in nova recently and wanted to know what people think is the best (relatively) way to fix this. In the past, the project id validation was added as a best effort to prevent users from being able to enter random values into the database. When this validation is used from the os flavor set/unset admin apis [2], there are chances that keystone returns a 403 which gets silently ignored by nova [3] allowing the user to enter the provided project_id/name without validation or warning or remove an existing flavor-project mapping. 
There were a couple of options discussed on IRC [4] to fix this behaviour out of which the practically reasonable ones are: 1) close the bug as invalid - tweak your config (we could add docs, idk if that would be found or help) to do what you need to avoid the 403 from keystone 2) change the 403 case as an error and raise it back to the compute api caller - maybe enough time has passed to not worry about backward compat with the old non-validating behavior Option 2 seems better than option 1 for most of us, however what we cannot agree upon is if this change should be accompanied by a microversion bump or not. [1] https://bugs.launchpad.net/nova/+bug/1854053 [2] https://github.com/openstack/nova/blob/fd67f69cfdaf04620f2e8a5f1fbf5737096965d8/nova/api/openstack/compute/flavor_access.py#L64 [3] https://github.com/openstack/nova/blob/d621914442855ce67ce0b99003f7e69e8ee515e6/nova/api/openstack/identity.py#L61 [4] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-26.log.html#t2019-11-26T16:20:24 Cheers, Surya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig.openstack at telfer.org Tue Nov 26 18:16:36 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Tue, 26 Nov 2019 18:16:36 +0000 Subject: [scientific-sig] Meeting today, 2100UTC - Supercomputing, networking, monitoring, and more Message-ID: <361459EF-81FC-4155-B915-AD2ADA6D75AE@telfer.org> Hi All - We have an IRC meeting coming up at 2100 UTC (about 2.5 hours time) in channel #openstack-meeting. Everyone is welcome. Today’s agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_November_29th_2019 It’s pretty packed today, we have a wash-up from SC to discuss, plus an update on Linuxbridge/ML2, plus an update on mixed virt and baremetal, plus a discussion on telemetry after Gnocchi. Loads to cover! Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Tue Nov 26 18:30:46 2019 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Tue, 26 Nov 2019 10:30:46 -0800 Subject: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal for policy popup team In-Reply-To: <87abcd15-9376-421b-96d2-3597b7a1e356@www.fastmail.com> References: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> <87abcd15-9376-421b-96d2-3597b7a1e356@www.fastmail.com> Message-ID: On Tue, Nov 26, 2019 at 8:54 AM Colleen Murphy wrote: > On Mon, Nov 25, 2019, at 18:12, Zhipeng Huang wrote: > > Thanks Colleen, > > > > As I shared with Cyborg team earlier, I think as a new project it would > > be beneficial that cyborg also participate in the popup team to get > > policy right from the get go > > Thanks Howard, I've added cyborg to the page and added you as liaison. > > Colleen > Hi Colleen, I couldn't participate in person in Shanghai - but we have been talking about this in the manila team in the past. I'd like to be added to these discussions so I can coordinate the efforts around improving our API policies as well. Thank you, Goutham > > > > > On Tue, Nov 26, 2019 at 9:12 AM Ghanshyam Mann > wrote: > > > > > > ---- On Mon, 25 Nov 2019 16:48:23 -0600 Colleen Murphy < > colleen at gazlene.net> wrote ---- > > > > Hi, > > > > > > > > At the Shanghai forum, we discussed forming a popup team around > reforming default policies for several projects. 
I've formally proposed > this team here: > > > > > > > > https://review.opendev.org/695993 > > > > > > > > I've also created a wiki page to document the effort and coordinate > the work: > > > > > > > > > https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team > > > > > > > > This is the best way I can think of to organize a cross-project > initiative like this, since I think the openstack specs repo is kind of > defunct? I'm open to other ideas for coordinating this. > > > > > > > > I've gone ahead and named project-specific liaisons based on > discussions at the forum, and also created a member list based on the list > of volunteers in the forum etherpad. If you weren't at the forum but would > like to be involved, please go ahead and add your name to the list on the > wiki page. > > > > > > > > To form the popup team, we need a second co-lead: please let me > know if you're interested. As hinted in the wiki page, I'm not seeking to > lead this long-term, so there are actually two lead positions open. > > > > > > Thanks Colleen for composing this. I can co-lead and help you with > this. I can serve as TC liaison also. > > > > > > -gmann > > > > > > > > > > > Let me know your thoughts in this thread. > > > > > > > > Colleen > > > > > > > > > > > > > > > > > > > > -- > > Zhipeng (Howard) Huang > > > > Principle Engineer > > OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open > > Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Tue Nov 26 18:37:09 2019 From: amy at demarco.com (Amy) Date: Tue, 26 Nov 2019 12:37:09 -0600 Subject: [cinder] Anastasiya accepted for Outreachy In-Reply-To: References: Message-ID: This is great news! Good luck, have fun, and learn lots! Amy (spotz) > On Nov 26, 2019, at 11:48 AM, Sofia Enriquez wrote: > >  > Hi Cinder team, > > I'd like to announce that Anastasiya will be working with us improving the Tempest coverage this round. The internship schedule starts on Dec. 3, 2019, to March 3, 2020. Feel free to reach her on IRC as anastzhyr if something comes up. > > On the other hand, If you have any suggestions for tempest test scenarios or possible ideas, please let me know. > > Regards, > Sofi > > -- > L. Sofía Enriquez > She/Her > Associate Software Engineer > Red Hat PnT > IRC: @enriquetaso > @RedHat Red Hat Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at tipit.net Tue Nov 26 16:49:21 2019 From: jimmy at tipit.net (Jimmy Mcarthur) Date: Tue, 26 Nov 2019 10:49:21 -0600 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: References: <057BD0AC-451C-4E64-9D30-6CC6A20DFA0F@demarco.com> <1595EA09-2B8C-45ED-B836-5751FC79DBA1@openstack.org> Message-ID: <5DDD5791.8050506@tipit.net> > Graham Hayes > November 26, 2019 at 10:34 AM > > > My understanding is it looks bleak, but there is a chance to shame the > owners into halting the sale. > > I think ICAAN can still block the same (until December 13th, 30 days > after they were notified), and through pure poor marketing, and > raising our voices, this could help convince the current owners of > PIR (who is ISOC) to re-consider the idea. Great news! Thanks for the clarification. I already added my name to the list and definitely agree about the Foundation support. I'll raise it on our ML as well. 
> > >>> On Nov 26, 2019, at 8:50 AM, Amy wrote: >>> >>> I agree with the OSF submitting on behalf of OpenStack and the other projects. I also like the idea of individuals signing as contributors. >>> >>> Amy Marrich (spotz) >>> >>>>> On Nov 26, 2019, at 8:17 AM, Sean McGinnis wrote: >>>>> >>>>> On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: >>>>> >>>>> Hey All, >>>>> >>>>> I am not sure if this has been seen by everyone or not - but there is >>>>> a change in how the .org top level domain is being ran in the works, in >>>>> a way that may not be in the best interests of the non profits that it >>>>> was created to facilitate. [1] >>>>> >>>>> A lot of well known non profits have already joined, and as a >>>>> community that has an interest in the internet as a whole, and >>>>> uses a .org domain, I think we should add our voice in support. >>>>> >>>>> What do people think? Are we happy to have the TC use its voice on >>>>> behalf of the OpenStack project, or do we think the board should use >>>>> its voice on behalf of the entire foundation? >>>>> >>>>> - Graham >>>>> >>>>> >>>>> 1 - https://savedotorg.org/ >>>>> >>>> I agree this is a Foundation level thing (though individuals can add their >>>> names to the petition as well). I would support adding OSF to the list of >>>> organizations. >>>> >>>> Sean >>>> >>> _______________________________________________ >>> Foundation mailing list >>> Foundation at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation > > > _______________________________________________ > Foundation mailing list > Foundation at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation > Jimmy McArthur > November 26, 2019 at 10:16 AM > From my reading of the situation, it’s a done deal. It seems like > ICAAN just crammed it through. Is there new info out that could > reverse it? > > Thanks, > Jimmy > > > > > _______________________________________________ > Foundation mailing list > Foundation at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation > Amy > November 26, 2019 at 8:50 AM > I agree with the OSF submitting on behalf of OpenStack and the other > projects. I also like the idea of individuals signing as contributors. > > Amy Marrich (spotz) > > > _______________________________________________ > Foundation mailing list > Foundation at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Nov 26 19:49:04 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 26 Nov 2019 14:49:04 -0500 Subject: [cinder] Anastasiya accepted for Outreachy In-Reply-To: References: Message-ID: On 11/26/19 12:45 PM, Sofia Enriquez wrote: > Hi Cinder team, > > I'd like to announce that Anastasiya will be working with us improving > the Tempest coverage this round. The internship schedule starts on Dec. > 3, 2019, to March 3, 2020. Feel free to reach her on IRC /as anastzhyr/ > if something comes up. Congratulations, Anastasiya! Improving tempest coverage is one of our priorities for Ussuri, so I'm really glad you'll be working on this topic. Also, special thanks to you, Sofi, for acting as Anastasiya's mentor. > > On the other hand, If you have any suggestions for tempest test > scenarios or possible ideas, please let me know. > > Regards, > Sofi > > -- > > L. 
Sofía Enriquez > > she/her > > Associate Software Engineer > > Red Hat PnT > > IRC: @enriquetaso > > @RedHat Red Hat > Red Hat > > > > From juliaashleykreger at gmail.com Tue Nov 26 20:05:31 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 26 Nov 2019 12:05:31 -0800 Subject: Communication problem between ironic-python-agent and CI server. In-Reply-To: References: <6815c663fa6647f6bab938e8d4b751e6@lenovo.com> Message-ID: Jay reached out to me and in some discussion it seems like the following is occurring: * Ramdisk is loading from tftp_server * Conductor is not able to reach the 10.0.0.0/24 subnet where the ironic-python-agent is running * There appears to be a lack of a route inside the CI host that the conductor is operating on telling the host kernel to direct packets for IPA to the neutron router. Ramdisk loading would still work if egress traffic is being NAT translated, but ingress traffic would appear like this, ironic being unable to send packets because the conductor is communicating from the context of the CI host, and any namespaces created by neutron may not be directly reachable. -Julia On Tue, Nov 26, 2019 at 1:48 AM Mohammed Naser wrote: > > > > Sent from my iPhone > > On Nov 26, 2019, at 4:34 AM, Guannan GN2 Sun wrote: > >  > > Hi team, > > > I'm now trying to use ironic deployed with devstack to manage baremetal machine. > > However when it run into deploying stage, I open the BM server terminal and see it successfully load ramdisk and boot into it. It get the ip I assigned and I can ping it from CI server side. But it then deploy failed just about 2 minutes later. > > > When I check ironic-conductor log with command "sudo journalctl -a --unit devstack at ir-cond" and found error like this: > > ERROR ironic.drivers.modules.agent_client [None req-de37bc21-8d62-41db-8983-c06789939818 None None] Failed to connect to the agent running on node ea88ba26-756d-4d32-89f4-7ff086fa8868 for invoking command iscsi.start_iscsi_target. Error: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')): ConnectTimeout: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')) > > I can ping it from CI server side, so it is strange why the connection time out between ironic-python-agent and CI server. Does anyone meet similar problem or have idea about it? > > > You can ping it but can you make an HTTP request to port 9999 via something like curl? > > Thank you! > > > Best Regards, > > Guannan From mriedemos at gmail.com Tue Nov 26 20:24:36 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 26 Nov 2019 14:24:36 -0600 Subject: [nova][api] In-Reply-To: References: Message-ID: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> I'd recommend adding a subject title in the future. On 11/26/2019 12:08 PM, Surya Seetharaman wrote: > 2) change the 403 case as an error and raise it back to the compute api > caller - maybe enough time has passed to not worry about backward compat > with the old non-validating behavior Note that the APIs that would change are admin-only by default. So in this case nova is configured with a service user to check if the requested project_id exists on behalf of the (admin) user making the compute API request to add/remove flavor access (or update quota values for a project). 
The service user does not have enough permissions in keystone to check if the project exists. Option 1 is give that service user more authority. Option 2 is basically re-raise that error to the compute (admin) user to let them know they basically need to fix their deployment (option 1 again). > > Option 2 seems better than option 1 for most of us, however what we > cannot agree upon is if this change should be accompanied by a > microversion bump or not. As noted above, I don't think option 2 precludes option 1. The compute API (admin) user will just get a 403 rather than perhaps silently wrong 200 response. If they get a 403 they likely need to fix things which is option 1. I don't think a microversion is necessary for this if we go with option 2 since the admin user shouldn't have to opt into non-broken behavior. Yes the project_id validation stuff was added awhile ago but it was added without a microversion in its own right as a bug fix - and we used to get a *lot* of duplicate bugs about being able to use these APIs with garbage project IDs since we previously didn't validate. Here are a couple of other examples of (non-admin) APIs which changed from a bogus success response to a failure without a microversion: * Trying to attach an SR-IOV port to an instance. * Trying to rebuild a volume-backed server with a new image. -- Thanks, Matt From sundar.nadathur at intel.com Tue Nov 26 22:00:15 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Tue, 26 Nov 2019 22:00:15 +0000 Subject: [nova] [cyborg] Impact of moving bind to compute Message-ID: Hi, We had a thread [1] on this subject from May of this year. The preference was that "ARQ creation happens in conductor and binding happens in compute" [2]. The ARQ binding involves device preparation and FPGA programming, which may take a while. So, it is done asynchronously. It is desirable to kickstart the binding ASAP, to maximize the overlap with other tasks needed for VM creation. We wound up doing all of binding in the compute for the following reason. If we call Cyborg to initiate ARQ binding and then wait for the notification event, we may miss the event if it comes in the window in between. So we had to call wait_for_instance_event() and, within its scope, call Cyborg for binding. This logic moved everything to compute. But now we are close to having an improved wait_for_instance_event() [3]. So I propose to: A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks. B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner as today. C. Call Cyborg to get the ARQs in the virt driver, like today. Please LMK if you have any objections. [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006541.html [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006979.html [3] https://review.opendev.org/#/c/695985/ Regards, Sundar -------------- next part -------------- An HTML attachment was scrubbed... 
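
As an aside on the race described in the proposal above, the essence of the
fix is simply to register the waiter before the binding work is started, so a
notification that fires immediately cannot be dropped. The snippet below is
illustrative Python only -- it is not nova's actual wait_for_instance_event()
or external-event plumbing, and the names are made up for the sketch:

    import threading

    _waiters = {}   # event name -> threading.Event (toy stand-in for a waiter registry)

    def notify(name):
        # Delivery side: a notification with no registered waiter is dropped.
        waiter = _waiters.get(name)
        if waiter is not None:
            waiter.set()

    class wait_for(object):
        # Register the waiter on entry, deregister on exit.
        def __init__(self, name):
            self.name = name
            self.event = threading.Event()

        def __enter__(self):
            _waiters[self.name] = self.event
            return self.event

        def __exit__(self, *exc):
            _waiters.pop(self.name, None)

    def bind_arq_async(name):
        # Stand-in for asking Cyborg to bind an ARQ; it may finish immediately.
        threading.Thread(target=notify, args=(name,)).start()

    with wait_for('arq-bound') as done:
        bind_arq_async('arq-bound')    # kicked off only after the waiter exists
        if not done.wait(timeout=60):
            raise RuntimeError('timed out waiting for the ARQ bind notification')

The proposal above relies on the improved wait_for_instance_event() in [3] to
keep this guarantee even when the binding itself is started earlier, in the
conductor.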
URL: From sean.mcginnis at gmx.com Tue Nov 26 22:10:30 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 26 Nov 2019 16:10:30 -0600 Subject: [cinder] Anastasiya accepted for Outreachy In-Reply-To: References: Message-ID: <20191126221030.GA107149@sm-workstation> On Tue, Nov 26, 2019 at 02:49:04PM -0500, Brian Rosmaita wrote: > On 11/26/19 12:45 PM, Sofia Enriquez wrote: > > Hi Cinder team, > > > > I'd like to announce that Anastasiya will be working with us improving > > the Tempest coverage this round. The internship schedule starts on Dec. > > 3, 2019, to March 3, 2020. Feel free to reach her on IRC /as anastzhyr/ > > if something comes up. > > Congratulations, Anastasiya! Improving tempest coverage is one of our > priorities for Ussuri, so I'm really glad you'll be working on this topic. > > Also, special thanks to you, Sofi, for acting as Anastasiya's mentor. > Totally agree! Welcome Anastasiya and thank you Sofi! Sean From openstack at fried.cc Tue Nov 26 22:35:13 2019 From: openstack at fried.cc (Eric Fried) Date: Tue, 26 Nov 2019 16:35:13 -0600 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: References: Message-ID: <738df881-cf5f-619f-9ab3-d27404cef665@fried.cc> > A.      Start the binding in the conductor. This gets maximum > concurrency between binding and other tasks. > > B.      Wait for the binding notification in the compute manager > (without losing the event). In fact, we can wait inside > _build_resources, which is where Neutron/Cinder resources are gathered > as well. That will allow for doing the cleanup in a consistent manner as > today. +many From dms at danplanet.com Tue Nov 26 23:06:07 2019 From: dms at danplanet.com (Dan Smith) Date: Tue, 26 Nov 2019 15:06:07 -0800 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: (Sundar Nadathur's message of "Tue, 26 Nov 2019 22:00:15 +0000") References: Message-ID: > But now we are close to having an improved wait_for_instance_event() [3]. So I propose to: > > A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks. > > B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where > Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner as today. > > C. Call Cyborg to get the ARQs in the virt driver, like today. We actually collect the neutron event in the virt driver. We kick off some of the early stuff in _build_resources(), but those are things that we want to be able to do from conductor. I'd ideally like to move the wait further down into the stack purely so we overlap with the image fetch. That's the thing that will take the longest on the compute node. If the system is unloaded, the conductor->compute->virt stuff could happen pretty quick, and if we wait a minute (for example) for programming to finish before we start spawn(), that's enough time that we could have potentially already finished the image fetch. This is also time where we're holding a spot in the parallel build limit queue, but we're not doing anything useful. That said, things can move around inside the compute manager and virt driver without affecting upgrades, so if it's easier to do it in _build_resources() now, we can see about optimizing later. It should, however, happen as the last step in _build_resources() so that we overlap with all the network and block stuff that happens there already. 
--Dan From dms at danplanet.com Tue Nov 26 23:16:47 2019 From: dms at danplanet.com (Dan Smith) Date: Tue, 26 Nov 2019 15:16:47 -0800 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: (Sundar Nadathur's message of "Tue, 26 Nov 2019 22:00:15 +0000") References: Message-ID: > But now we are close to having an improved wait_for_instance_event() [3]. So I propose to: > > A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks. > > B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where > Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner as today. > > C. Call Cyborg to get the ARQs in the virt driver, like today. Sorry, I missed this. No, I don't think this is reasonable. I'm -5 on where you have it today. However, there is zero point in calling to cyborg in _build_resources() and then calling it again in the virt driver just a couple stack frames away. The point of _build_resources() is to collect resources that we need to clean up if we fail, and yield them to the build process. Store your ARQs there, pass them to the virt driver, and roll them back if you fail. --Dan From ces.eduardo98 at gmail.com Tue Nov 26 23:33:06 2019 From: ces.eduardo98 at gmail.com (Carlos Eduardo) Date: Tue, 26 Nov 2019 21:33:06 -0200 Subject: manila share group replication support In-Reply-To: References: Message-ID: Hi, Ding Dong! Currently, promoting several shares by their group is not implemented in Manila. I've realized you have searched for some references in the share groups documentation and this may not be the thing you're searching for. I've talked with Goutham today about this case and we have a possible solution for this. You can achieve this behavior by implementing an extra-spec for groups, in order to allow groups to be replicated. Then, introduce a "promote" API which will make you able to promote your groups. In this way, if the driver implements group replication, it will be able to request the creation of a new share server on the share network subnet for each new group of shares. So replicas will ble able to failover together when the share group replica get promoted. To implement it, we think that a good starting point is to write down a spec with the proposed changes. You can bring this discussion up in our virtual PTG as well. If you want to, please add a topic in [1]. [1] https://etherpad.openstack.org/p/manila-shanghai-ptg-planning Regards, Carlos Silva. Em seg., 25 de nov. de 2019 às 15:56, escreveu: > Hi, experts, > > > > I'm trying to implement the share replication feature for EMC Unity Manila > driver. > > > > Unity isn’t capable to promote a single share. The share must be promoted > together with its share server. > > The problem is that we have several shares exported from one single > server. So, promoting a share will cause all shares being promoted. > > > > My question is that is there any solution to promote several shares as > they are in a group? > > > > I was trying to find something useful about ‘manila group replication’ > ,only find Ocata Doc mentioned it, but no detail information: > > > https://specs.openstack.org/openstack/manila-specs/specs/ocata/manila-share-groups.html > > > > And there is no code or commit history matches ‘group replication’ in > Manila. > > > > Do you have any suggestions for our situation? 
> > > > Thanks, > > Ding Dong > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sundar.nadathur at intel.com Wed Nov 27 01:29:35 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Wed, 27 Nov 2019 01:29:35 +0000 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: References: Message-ID: > From: Dan Smith > Sent: Tuesday, November 26, 2019 3:17 PM > Subject: Re: [nova] [cyborg] Impact of moving bind to compute > > > But now we are close to having an improved wait_for_instance_event() [3]. > So I propose to: > > > > A. Start the binding in the conductor. This gets maximum concurrency > between binding and other tasks. > > > > B. Wait for the binding notification in the compute manager (without > losing the event). In fact, we can wait inside _build_resources, which is where > > Neutron/Cinder resources are gathered as well. That will allow for doing the > cleanup in a consistent manner as today. > > > > C. Call Cyborg to get the ARQs in the virt driver, like today. > > Sorry, I missed this. No, I don't think this is reasonable. I'm -5 on where you > have it today. However, there is zero point in calling to cyborg in > _build_resources() and then calling it again in the virt driver just a couple > stack frames away. The point of _build_resources() is to collect resources that > we need to clean up if we fail, and yield them to the build process. Store your > ARQs there, pass them to the virt driver, and roll them back if you fail. Agreed, thanks. > --Dan Regards, Sundar From iwienand at redhat.com Wed Nov 27 02:38:37 2019 From: iwienand at redhat.com (Ian Wienand) Date: Wed, 27 Nov 2019 13:38:37 +1100 Subject: [meta-sig][multi-arch] propose forming a Multi-arch SIG In-Reply-To: References: <20191121001509.GB976114@fedora19.localdomain> Message-ID: <20191127023837.GA1261300@fedora19.localdomain> On Tue, Nov 26, 2019 at 11:33:16AM +0000, Jonathan Rosser wrote: > openstack-ansible is ready to go on arm CI but in order to make the jobs run > in a reasonable time and not simply timeout a source of pre-built arm python > wheels is needed. It would be a shame to let the work that got contributed > to OSA for arm just rot. Yeah, for reference we started a story with this [1] and Paul even had a WIP change [2]. Another part of this is having local mirrors setup, which I've had a play with in [3]. But I've noticed that getting two nodes for testing here doesn't often work (see test history). At the moment we only have Linaro London in the rotation and my contacts there have moved on. I think before we go too far with building jobs, etc. we need to make sure we're consistently booting test instances there. I had a quick look at the nodepool logs and it seems the requests just time out -- without access to the backend logs I can't tell much else. I can certainly extract vm id's etc if we know a way to investigate. -i [1] https://storyboard.openstack.org/#!/story/2005353 [2] https://review.opendev.org/552700 [3] https://review.opendev.org/690798 From dangtrinhnt at gmail.com Wed Nov 27 03:09:21 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Wed, 27 Nov 2019 12:09:21 +0900 Subject: [auto-scaling][self-healing] Discussion to merge two SIG to one In-Reply-To: References: Message-ID: Hi Rico and others, Please let me know what I could do to help this process. Bests, On Fri, Nov 22, 2019 at 10:23 AM Duc Truong wrote: > +1 from me. 
> > On Thu, Nov 21, 2019 at 3:21 PM Trinh Nguyen > wrote: > >> Hi Rico, >> >> +1 >> That is a very good idea. Coincidentally, I'm working on some research >> projects that focus on autoscaling and self-healing at the same time. And >> the combined group would be a very good idea because I don't have to switch >> back and forth between the groups for discussion. >> >> Thanks, >> >> >> On Thu, Nov 21, 2019 at 12:57 AM Rico Lin >> wrote: >> >>> Dear all >>> >>> As we discussed in PTG about merge two SIG to one. >>> I would like to continue the discussion on ML. >>> >>> In PTG, Eric proposes the idea to merge two SIG due to the high >>> overlapping of domains and tasks. >>> >>> I think this is a great idea since, over the last 6 months, most of the >>> discussions in both SIG are overlapped. So I'm onboard with this idea. >>> >>> Here's how I think we can continue this idea: >>> >>> 1. Create new SIG (maybe 'Automation SIG'? feel free to propose name >>> which can cover both interest.) >>> 2. Redirect docs and wiki to new SIG. And rework on index so there >>> will be no confusion >>> 3. Move repos from both SIGs to new SIG >>> 4. Mark auto-scaling SIG and self-healing SIG as inactive. >>> 5. remove auto-scaling SIG and self-healing SIG after a >>> reasonable waiting time >>> >>> >>> Let us know what you think about this. Otherwise, we definitely expect >>> this to happen soon. >>> >>> >>> >>> -- >>> May The Force of OpenStack Be With You, >>> >>> *Rico Lin*irc: ricolin >>> >>> >> >> -- >> *Trinh Nguyen* >> *www.edlab.xyz * >> >> -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From sundar.nadathur at intel.com Wed Nov 27 05:56:47 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Wed, 27 Nov 2019 05:56:47 +0000 Subject: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal for policy popup team In-Reply-To: <87abcd15-9376-421b-96d2-3597b7a1e356@www.fastmail.com> References: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> <87abcd15-9376-421b-96d2-3597b7a1e356@www.fastmail.com> Message-ID: > From: Colleen Murphy > Sent: Tuesday, November 26, 2019 8:51 AM > Subject: Re: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal > for policy popup team > > On Mon, Nov 25, 2019, at 18:12, Zhipeng Huang wrote: > > Thanks Colleen, > > > > As I shared with Cyborg team earlier, I think as a new project it > > would be beneficial that cyborg also participate in the popup team to > > get policy right from the get go > > Thanks Howard, I've added cyborg to the page and added you as liaison. > > Colleen Please nominate Yumeng Bao (yumeng_bao at yahoo.com) as the liaison with Cyborg team. She will contribute the spec. We have informed Howard of this. Thanks to Howard for bringing up Cyborg in this context. Regards, Sundar From balazs.gibizer at est.tech Wed Nov 27 08:26:42 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Wed, 27 Nov 2019 08:26:42 +0000 Subject: [nova][api] In-Reply-To: References: Message-ID: <1574843198.31688.7@est.tech> On Tue, Nov 26, 2019 at 19:08, Surya Seetharaman wrote: > Hello everyone, > > We came across this bug [1] in nova recently and wanted to know what > people think is the best (relatively) way to fix this. > > In the past, the project id validation was added as a best effort to > prevent users from being able to enter random values into the > database. 
When this validation is used from the os flavor set/unset > admin apis [2], there are chances that keystone returns a 403 which > gets silently ignored by nova [3] allowing the user to enter the > provided project_id/name without validation or warning or remove an > existing flavor-project mapping. There were a couple of options > discussed on IRC [4] to fix this behaviour out of which the > practically reasonable ones are: > > 1) close the bug as invalid - tweak your config (we could add docs, > idk if that would be found or help) to do what you need to avoid the > 403 from keystone > 2) change the 403 case as an error and raise it back to the compute > api caller - maybe enough time has passed to not worry about backward > compat with the old non-validating behavior > > Option 2 seems better than option 1 for most of us, however what we > cannot agree upon is if this change should be accompanied by a > microversion bump or not. My 2 cents: Make the problem explicit by raising the error back to the caller (which is the admin by default), enhance our docs to help the admin to fix the nova service user's permissions to avoid the 403. gibi > > [1] https://bugs.launchpad.net/nova/+bug/1854053 > [2] > https://github.com/openstack/nova/blob/fd67f69cfdaf04620f2e8a5f1fbf5737096965d8/nova/api/openstack/compute/flavor_access.py#L64 > [3] > https://github.com/openstack/nova/blob/d621914442855ce67ce0b99003f7e69e8ee515e6/nova/api/openstack/identity.py#L61 > [4] > http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-26.log.html#t2019-11-26T16:20:24 > > Cheers, > Surya. From surya.seetharaman9 at gmail.com Wed Nov 27 08:45:50 2019 From: surya.seetharaman9 at gmail.com (Surya Seetharaman) Date: Wed, 27 Nov 2019 09:45:50 +0100 Subject: [nova][api] Behaviour of project_id validation In-Reply-To: References: Message-ID: Apologies, like Matt pointed out I sort of forgot to add the title in my original email. On Tue, Nov 26, 2019 at 7:08 PM Surya Seetharaman < surya.seetharaman9 at gmail.com> wrote: > Hello everyone, > > We came across this bug [1] in nova recently and wanted to know what > people think is the best (relatively) way to fix this. > > In the past, the project id validation was added as a best effort to > prevent users from being able to enter random values into the database. > When this validation is used from the os flavor set/unset admin apis [2], > there are chances that keystone returns a 403 which gets silently ignored > by nova [3] allowing the user to enter the provided project_id/name without > validation or warning or remove an existing flavor-project mapping. There > were a couple of options discussed on IRC [4] to fix this behaviour out of > which the practically reasonable ones are: > > 1) close the bug as invalid - tweak your config (we could add docs, idk if > that would be found or help) to do what you need to avoid the 403 from > keystone > 2) change the 403 case as an error and raise it back to the compute api > caller - maybe enough time has passed to not worry about backward compat > with the old non-validating behavior > > Option 2 seems better than option 1 for most of us, however what we cannot > agree upon is if this change should be accompanied by a microversion bump > or not. 
> > [1] https://bugs.launchpad.net/nova/+bug/1854053 > [2] > https://github.com/openstack/nova/blob/fd67f69cfdaf04620f2e8a5f1fbf5737096965d8/nova/api/openstack/compute/flavor_access.py#L64 > [3] > https://github.com/openstack/nova/blob/d621914442855ce67ce0b99003f7e69e8ee515e6/nova/api/openstack/identity.py#L61 > [4] > http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-11-26.log.html#t2019-11-26T16:20:24 > > Cheers, > Surya. > -- Regards, Surya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From surya.seetharaman9 at gmail.com Wed Nov 27 09:35:43 2019 From: surya.seetharaman9 at gmail.com (Surya Seetharaman) Date: Wed, 27 Nov 2019 10:35:43 +0100 Subject: [nova][api] Behaviour of project_id validation In-Reply-To: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> References: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> Message-ID: On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann wrote: > Note that the APIs that would change are admin-only by default. So in > this case nova is configured with a service user to check if the > requested project_id exists on behalf of the (admin) user making the > compute API request to add/remove flavor access (or update quota values > for a project). The service user does not have enough permissions in > keystone to check if the project exists. Option 1 is give that service > user more authority. Option 2 is basically re-raise that error to the > compute (admin) user to let them know they basically need to fix their > deployment (option 1 again). > > > A combo of both solutions where we raise the error to the user and amend our docs to help them fix it seems good to me. > > I don't think a microversion is necessary for this ++ ---------- Cheers, Surya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From C-Ramakrishna.Bhupathi at charter.com Wed Nov 27 13:07:05 2019 From: C-Ramakrishna.Bhupathi at charter.com (Bhupathi, Ramakrishna) Date: Wed, 27 Nov 2019 13:07:05 +0000 Subject: [Kolla] FAILED - RETRYING: wait for MariaDB to be available via HAProxy (10 retries left). In-Reply-To: References: Message-ID: <93cc6a74eef6487ea01cfeba46f8b0ac@NCEMEXGP032.CORP.CHARTERCOM.com> Thanks Mark. The deploy worked after a reboot of my server. Appreciate your help. --RamaK -----Original Message----- From: Mark Goddard [mailto:mark at stackhpc.com] Sent: Tuesday, November 26, 2019 10:05 AM To: Bhupathi, Ramakrishna Cc: openstack-discuss at lists.openstack.org Subject: Re: [Kolla] FAILED - RETRYING: wait for MariaDB to be available via HAProxy (10 retries left). On Tue, 26 Nov 2019 at 14:12, Bhupathi, Ramakrishna wrote: > > Mark, > Thanks for the tip. > > Now I get a failure (as below) during deploy (I have all-in-one) config. I am on stein. > It appears to me that this has been presumably fixed . But I still running into this error. > > I do not see anything explicit in the HAProxy/MariaDB logs. Are both the haproxy and mariadb containers up and not restarting? Does your api_interface have the VIP activated? Is haproxy listening on 10.1.0.250:3306 (check ss)? > > > > TASK [mariadb : wait for MariaDB to be available via HAProxy] > ********************************************************************** > **************************** FAILED - RETRYING: wait for MariaDB to be > available via HAProxy (10 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (9 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (8 retries left). 
> FAILED - RETRYING: wait for MariaDB to be available via HAProxy (7 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (6 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (5 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (4 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (3 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (2 retries left). > FAILED - RETRYING: wait for MariaDB to be available via HAProxy (1 retries left). > fatal: [localhost]: FAILED! => {"attempts": 10, "changed": false, > "elapsed": 60, "msg": "Timeout when waiting for search string MariaDB > in 10.1.0.250:3306"} > > NO MORE HOSTS LEFT > ********************************************************************** > ********************************************************************** > * > > PLAY RECAP ***************************************************************************************************************************************************** > localhost : ok=76 changed=0 unreachable=0 failed=1 skipped=74 rescued=0 ignored=0 > > Command failed ansible-playbook -i ../../all-in-one -e > @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e > CONFIG_DIR=/etc/kolla -e kolla_action=deploy > /home/cloud-user/kolla-ansible/ansible/site.yml > > --RamaK > > > > > > -----Original Message----- > From: Mark Goddard [mailto:mark at stackhpc.com] > Sent: Monday, November 25, 2019 11:08 AM > To: Bhupathi, Ramakrishna > Cc: openstack-discuss at lists.openstack.org > Subject: Re: [Kolla] RabbitMQ failure during deploy (Openstack with > Kolla) > > On Mon, 25 Nov 2019 at 13:54, Bhupathi, Ramakrishna wrote: > > > > Hey , I am evaluating Openstack with Kolla (have the latest ) and following the steps. I see that rabbitmq fails to start up and (keeps restarting). Essentially the script fails in deploy. I have a single node all in one config. > > > > > > > > Can someone tell me what the cause for this failure? > > > > > > > > > > > > RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first > > node] > > ******************************************************************** > > ** > > ****************************************** > > > > fatal: [localhost]: FAILED! 
=> {"changed": true, "cmd": "docker exec > > rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", > > "delta": "0:00:07.732346", "end": "2019-11-25 13:33:12.102342", "msg": > > "non-zero return code", "rc": 137, "start": "2019-11-25 > > 13:33:04.369996", "stderr": "", "stderr_lines": [], "stdout": > > "Waiting for 'rabbit at kolla-ubuntu'\npid is 6", "stdout_lines": > > ["Waiting for 'rabbit at kolla-ubuntu'", "pid is 6"]} > > > > > > > > RUNNING HANDLER [rabbitmq : Restart rabbitmq container (rest of > > nodes)] > > ******************************************************************** > > ** > > ******************************************* > > > > > > > > NO MORE HOSTS LEFT > > ******************************************************************** > > ** > > ******************************************************************** > > ** > > ************************** > > > > > > > > PLAY RECAP > > ******************************************************************** > > ** > > ******************************************************************** > > ** > > ********************************** > > > > localhost : ok=95 changed=2 unreachable=0 failed=1 skipped=78 rescued=0 ignored=0 > > > > > > > > Command failed ansible-playbook -i ./all-in-one -e > > @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e > > CONFIG_DIR=/etc/kolla -e kolla_action=deploy > > /home/ubuntu/venv/share/kolla-ansible/ansible/site.yml > > > > > > > > > > > > Accessing the container logs for rabbitmq .. this is what I see > > > > > > > > BOOT FAILED > > > > =========== > > > > > > > > Error description: > > > > {could_not_start,rabbitmq_management, > > > > {rabbitmq_management, > > > > {bad_return, > > > > {{rabbit_mgmt_app,start,[normal,[]]}, > > > > {'EXIT', > > > > {{could_not_start_listener, > > > > [{port,15672}], > > > > {shutdown, > > > > {failed_to_start_child,ranch_acceptors_sup, > > > > > > {listen_error,rabbit_web_dispatch_sup_15672,eaddrinuse}}}}, > > Hi RamaK, here is your issue. RabbitMQ management fails to start up due to the port already being in use (eaddrinuse). Perhaps RabbitMQ is running on the host already? > > > > > {gen_server,call, > > > > [rabbit_web_dispatch_registry, > > > > {add,rabbit_mgmt, > > > > [{port,15672}], > > > > #Fun, > > > > > > > > --RamaK > > > > The contents of this e-mail message and any attachments are intended > > solely for the > > addressee(s) and may contain confidential and/or legally privileged > > information. If you are not the intended recipient of this message > > or if this message has been addressed to you in error, please > > immediately alert the sender by reply e-mail and then delete this > > message and any attachments. If you are not the intended recipient, > > you are notified that any use, dissemination, distribution, copying, > > or storage of this message or any attachment is strictly prohibited. > E-MAIL CONFIDENTIALITY NOTICE: > The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. 
E-MAIL CONFIDENTIALITY NOTICE: The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited. From thierry at openstack.org Wed Nov 27 13:22:56 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 27 Nov 2019 14:22:56 +0100 Subject: [largescale-sig] First meeting summary and next actions Message-ID: <334d7d42-b745-ef64-df84-6a215d3e53e5@openstack.org> Hi everyone, As announced in [1], the Large Scale SIG had its first IRC meeting today. You can access summary and logs at: http://eavesdrop.openstack.org/meetings/large_scale_sig/2019/large_scale_sig.2019-11-27-09.00.html This meeting was mostly focused on agreeing on team logistics, and an early discussion on short-term objectives for the group. As a result of this meeting I created: - Reference information about the SIG at https://wiki.openstack.org/wiki/Large_Scale_SIG - A change formally proposing the SIG at https://review.opendev.org/696302 - Two etherpads to further refine our two initial ideas for SIG short-term goals: Scaling within one cluster, and instrumentation of the bottlenecks: https://etherpad.openstack.org/p/large-scale-sig-cluster-scaling Document large scale configuration and tips &tricks: https://etherpad.openstack.org/p/large-scale-sig-documentation Between now and next meeting, the idea is to validate those initial documents, get the SIG formally set up, and refine those goals on those two etherpads, ideally splitting them into early objectives and initial steps. Current proposal is to have our next meeting on Dec 18. Belmiro is unfortunately not available on that week, but the week before amorin and myself are not available, and the week(s) after are the end-of-year holidays for some of us, so that is still probably the best date before next year. Cheers, [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011062.html -- Thierry Carrez (ttx) From rico.lin.guanyu at gmail.com Wed Nov 27 13:43:34 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Wed, 27 Nov 2019 21:43:34 +0800 Subject: [meta-sig] Add Thierry Carrez to core reviewer Message-ID: I would like to propose adding Thierry to core reviewer for governance-sigs I don't think anyone will doubt his knowledge of maintaining governance-sigs repo. He will be able to do core review on all patches. To create SIG patch, it will require both UC and TC to give their blessing. That means besides Amy's +2, we only need Thierry or mine +2 to approve the workflow. Let me know what you think if you have other thoughts, otherwise, I will add him in core think week. -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bence.romsics at gmail.com Wed Nov 27 14:28:53 2019 From: bence.romsics at gmail.com (Bence Romsics) Date: Wed, 27 Nov 2019 15:28:53 +0100 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com> Message-ID: Hi, Matt wrote: > The point of the API change is to let neutron do everything > automatically and avoid additional configuration, right? The > configuration on the neutron side was meant to be a workaround. I had the following reasons to propose using configuration: (1) Using a nova API from neutron-server means we'd have a startup ordering dependency. Neutron would have to wait for nova to start up. So far it had to wait for placement - there we did not have an alternative. Here we have. (2) We cannot use a new nova API for fixing this bug in stein and train. If the backportable fix works on master too, do we want to create a different fix for master? Sean wrote: > > (1) Extend ovs and sriov agents config with > > 'resource_provider_hypervisors' for example: > > > > ml2_conf.ini > > [ovs] > > bridge_mappings = physnet0:br-physnet0,... > > resource_provider_bandwidths = br-physnet0:10000000:10000000,... > > resource_provider_hypervisors = physnet0:hypervisor0,... # this is > im guessing you are adding this for the ironic smart nic usecasue but i dont think this makes sense. That's right. I made a mistake there. Thanks for catching it. I meant keying resource_provider_hypervisors by the physdevs (the values in mappings) not the physnets. For example: resource_provider_hypervisors = br-physnet0:hypervisor0,... I understand we don't have use for this right now (neutron with ironic does not use placement), but if we don't allow the 1:N mapping here, then we prohibit the future use of it too. Cheers, Bence From smooney at redhat.com Wed Nov 27 14:59:26 2019 From: smooney at redhat.com (Sean Mooney) Date: Wed, 27 Nov 2019 14:59:26 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com> Message-ID: On Wed, 2019-11-27 at 15:28 +0100, Bence Romsics wrote: > Hi, > > Matt wrote: > > The point of the API change is to let neutron do everything > > automatically and avoid additional configuration, right? The > > configuration on the neutron side was meant to be a workaround. > > I had the following reasons to propose using configuration: > > (1) Using a nova API from neutron-server means we'd have a startup > ordering dependency. Neutron would have to wait for nova to start up. > So far it had to wait for placement - there we did not have an > alternative. Here we have. > > (2) We cannot use a new nova API for fixing this bug in stein and > train. If the backportable fix works on master too, do we want to > create a different fix for master? 
> > Sean wrote: > > > (1) Extend ovs and sriov agents config with > > > 'resource_provider_hypervisors' for example: > > > > > > ml2_conf.ini > > > [ovs] > > > bridge_mappings = physnet0:br-physnet0,... > > > resource_provider_bandwidths = br-physnet0:10000000:10000000,... > > > resource_provider_hypervisors = physnet0:hypervisor0,... # this is > > > > im guessing you are adding this for the ironic smart nic usecasue but i dont think this makes sense. > > That's right. I made a mistake there. Thanks for catching it. I meant > keying resource_provider_hypervisors by the physdevs (the values in > mappings) not the physnets. For example: > > resource_provider_hypervisors = br-physnet0:hypervisor0,... this also wont work as the same bridge name will exists on multipel hosts > > I understand we don't have use for this right now (neutron with ironic > does not use placement), but if we don't allow the 1:N mapping here, > then we prohibit the future use of it too. > > Cheers, > Bence > From bence.romsics at gmail.com Wed Nov 27 15:20:01 2019 From: bence.romsics at gmail.com (Bence Romsics) Date: Wed, 27 Nov 2019 16:20:01 +0100 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com> Message-ID: > > resource_provider_hypervisors = br-physnet0:hypervisor0,... > this also wont work as the same bridge name will exists on multipel hosts Of course the same bridge/nic name can exist on multiple hosts. And each report_state message is clearly belonging to a single agent and the configurations field is persisted per agent, so there won't be a collision ever. From i at liuyulong.me Wed Nov 27 15:33:02 2019 From: i at liuyulong.me (=?utf-8?B?TElVIFl1bG9uZw==?=) Date: Wed, 27 Nov 2019 23:33:02 +0800 Subject: [neutron][L3][OVN] add a time slot for OVN L3 functionality in the L3 meeting Message-ID: Hi all, Neutron team recently merged the convergence ML2+OVS and OVN [1], it was also discussed during the PTG in Shanghai [2]. And more about that is the train is now getting started [3][4]. So during the L3 meeting, we may want to add a new section in L3 meeting to talk about some OVN L3 related topics, such as feature gaps, bugs, new ideas or anything needs to draw the attention of the neutron team. OVN is also a long-developed project with various functions. So, an independent L3 meeting time looks appropriate, we cannot cover all the OVN stuff in one meeting. So if you are interested in OVN and some L3 stuff, do not be hesitant, please come to the L3 meeting with your thoughts, ideas or questions. The Neutron L3 Sub-team Meeting [5] will be held weekly on Wednesday at 14:00 UTC in irc channel #openstack-meeting. [1] http://specs.openstack.org/openstack/neutron-specs/specs/ussuri/ml2ovs-ovn-convergence.html [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010702.html [3] https://blueprints.launchpad.net/neutron/+spec/neutron-ovn-merge [4] https://review.opendev.org/#/q/topic:bp/neutron-ovn-merge [5] http://eavesdrop.openstack.org/#Neutron_L3_Sub-team_Meeting Regards, LIU Yulong -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From elod.illes at est.tech Wed Nov 27 17:02:38 2019 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Wed, 27 Nov 2019 17:02:38 +0000 Subject: [cinder][ops][extended-maintenance-sig][public-cloud-sig][enterprise-wg] Cinder to EOL some branches In-Reply-To: <3c5fd6e6-8ae4-d300-71a7-97b22431cb3b@gmail.com> References: <3c5fd6e6-8ae4-d300-71a7-97b22431cb3b@gmail.com> Message-ID: <7e8b4158-89b2-6207-6f06-215782e0b126@est.tech> Hi, First of all, sorry for the late response. About EOLing Ocata and Pike: Extended Maintenance was formed just to have a common place for interested parties, vendors, operators, to push bugfixes to. Currently these branches are in a good shape, check / gate jobs work and as far as I see there are a couple of backports, too (not too many, though). So hopefully it's not a waste of efforts. Why don't we keep them open as long as the CI works and there are patches? Of course, whenever e.g. Ocata branch / CI becomes unmaintainable (zuul v3 incompatibilities) or there aren't anyone who fixes the issues there, we can put it to EOL then. I write this, because my employer supports Extended Maintenance, and I usually try to fix CI issues on stable branches and reviewing patches there. So maybe I can be some help here, too. Please consider leaving branches in Extended Maintenance open as long as they are in a good shape and there are bugfixes coming. Thanks, Előd On 2019. 11. 25. 20:21, Brian Rosmaita wrote: > This is a courtesy notice that having received no responses to my > email of 28 October [0] proposing to EOL some currently open Cinder > branches, and following the policy articulated in [1], at today's > Virtual PTG meeting the Cinder project team has decided to put the > following stable branches into the End of Life state: >   driverfixes/mitaka >   driverfixes/newton >   stable/ocata >   stable/pike > > I will submit the paperwork to get this process moving one week from > today (2 December 2019). > > > cheers, > brian > > [0] > http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010385.html > [1] > https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life > From gmann at ghanshyammann.com Wed Nov 27 17:02:49 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 27 Nov 2019 11:02:49 -0600 Subject: [cinder] Anastasiya accepted for Outreachy In-Reply-To: <20191126221030.GA107149@sm-workstation> References: <20191126221030.GA107149@sm-workstation> Message-ID: <16eadd0c11e.10a571b60328058.1259845006172962881@ghanshyammann.com> ---- On Tue, 26 Nov 2019 16:10:30 -0600 Sean McGinnis wrote ---- > On Tue, Nov 26, 2019 at 02:49:04PM -0500, Brian Rosmaita wrote: > > On 11/26/19 12:45 PM, Sofia Enriquez wrote: > > > Hi Cinder team, > > > > > > I'd like to announce that Anastasiya will be working with us improving > > > the Tempest coverage this round. The internship schedule starts on Dec. > > > 3, 2019, to March 3, 2020. Feel free to reach her on IRC /as anastzhyr/ > > > if something comes up. > > > > Congratulations, Anastasiya! Improving tempest coverage is one of our > > priorities for Ussuri, so I'm really glad you'll be working on this topic. > > > > Also, special thanks to you, Sofi, for acting as Anastasiya's mentor. > > > > Totally agree! Welcome Anastasiya and thank you Sofi! Thanks. One question on the scope of Tempest test coverage. Is it for overall Tempest coverage for all projects or just Cinder Tempest tests? 
-gmann

 > 
 > Sean
 > 

From smooney at redhat.com Wed Nov 27 17:03:16 2019
From: smooney at redhat.com (Sean Mooney)
Date: Wed, 27 Nov 2019 17:03:16 +0000
Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set
In-Reply-To: 
References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com>
Message-ID: 

On Wed, 2019-11-27 at 16:20 +0100, Bence Romsics wrote:
> > > resource_provider_hypervisors = br-physnet0:hypervisor0,...
> >
> > this also wont work as the same bridge name will exists on multipel hosts
>
> Of course the same bridge/nic name can exist on multiple hosts. And
> each report_state message is clearly belonging to a single agent and
> the configurations field is persisted per agent, so there won't be a
> collision ever.
>
That is in the non-ironic smart NIC case. In the ironic smart NIC case with the OVS super agent, which is the only case where there would be multiple hypervisors managed by the same agent, the agent will be remote. So in the non-ironic case it does not need to be a list. In the smart NIC case it might need to be a list, but a mapping keyed on bridge or physnet won't be unique, and an agent hostname (CONF.host) to hypervisor host mapping would be 1:N, so it's not clear how you would select from the N RPs if all you know from nova is the binding host, which is the service host, not the hypervisor hostname.

From gmann at ghanshyammann.com Wed Nov 27 17:35:35 2019
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Wed, 27 Nov 2019 11:35:35 -0600
Subject: [meta][K8s][API][Extended Maintenance][Operation Docs] Change SIG to Advisory status
In-Reply-To: 
References: 
Message-ID: <16eadeec01a.10ce8ff56329123.2937780973121264558@ghanshyammann.com>

 ---- On Fri, 22 Nov 2019 01:03:47 -0600 Rico Lin wrote ----
 > Dear SIG members and chairs
 > To follow the discussion in the Meta SIG PTG room [1], I would like to propose changing the following SIGs to Advisory status [2], to represent that the SIG stays around to provide help, make sure everything stays working and provide advice when needed:
 > K8s SIG
 > API SIG
 > Extended Maintenance SIG
 > Operation Docs SIG
 > If you think your SIG should not belong to `advisory` status, please advise from the following statuses:
 > active: SIG reaches out for discussion and events, has plans for the current cycle, hosts meetings or sends ML out regularly.
 > forming: SIG still setting up.
 > advisory: SIG stays around for help, makes sure everything stays working and provides advice when needed.

Thanks a lot Rico for all your effort on the SIG help and management.

+1. I like the 'advisory' status which will clear out the difference between inactive and "active but with on-demand services only".
One question on this status: does this still include updates to the repo/guidance docs etc.? For example, if I want to add a few more guidelines or changes to the current repo/doc-sites, then this SIG will still go with the usual discussion and review process, and is not saying 'we are in an advisory role so we have closed any repo/doc update'.

It will be good if you can explain those statuses in detail with their scope of activity in the sig-guidelines doc.

 > API SIG

Is it ok to move api-sig to 'advisory'? I think Michael mentioned some work to finish on triaging the current open issues/todos etc. Should we wait for that work to be finished?
May be Michael or Dmitry can update on the latest status. > complete: SIG completes its mission. Can we include some status for the inactive SIG who has not completed its mission? something like 'On-Hold' or 'Need-help' etc. It will help if anyone looking for that SIG can help or manage. It is more like backlogs for history and reference if same type of problem comes and someone wants to form a SIG. [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-August/008683.html -gmann > > If that sounds correct, I need at least one chair from each SIG +1 on [2], so we can make sure it's what SIGs agreed on. > [1] https://etherpad.openstack.org/p/PVG-meta-sig[2] https://review.opendev.org/#/c/695625/ > > -- > May The Force of OpenStack Be With You, > Rico Lin > irc: ricolin > > > > From Dong.Ding at dell.com Wed Nov 27 02:45:49 2019 From: Dong.Ding at dell.com (Dong.Ding at dell.com) Date: Wed, 27 Nov 2019 02:45:49 +0000 Subject: manila share group replication support In-Reply-To: References: Message-ID: Hi, Carlos, Thanks for your quick reply. I have write the topic at the end of note. https://etherpad.openstack.org/p/manila-shanghai-ptg-planning See if it’s ready to review. Thanks, Ding Dong From: Carlos Eduardo Sent: Wednesday, November 27, 2019 7:33 AM To: Ding, Dong Cc: openstack-discuss at lists.openstack.org; Liang, Ryan; Sun, Hao; Huang, Yong Subject: Re: manila share group replication support [EXTERNAL EMAIL] Hi, Ding Dong! Currently, promoting several shares by their group is not implemented in Manila. I've realized you have searched for some references in the share groups documentation and this may not be the thing you're searching for. I've talked with Goutham today about this case and we have a possible solution for this. You can achieve this behavior by implementing an extra-spec for groups, in order to allow groups to be replicated. Then, introduce a "promote" API which will make you able to promote your groups. In this way, if the driver implements group replication, it will be able to request the creation of a new share server on the share network subnet for each new group of shares. So replicas will ble able to failover together when the share group replica get promoted. To implement it, we think that a good starting point is to write down a spec with the proposed changes. You can bring this discussion up in our virtual PTG as well. If you want to, please add a topic in [1]. [1] https://etherpad.openstack.org/p/manila-shanghai-ptg-planning Regards, Carlos Silva. Em seg., 25 de nov. de 2019 às 15:56, > escreveu: Hi, experts, I'm trying to implement the share replication feature for EMC Unity Manila driver. Unity isn’t capable to promote a single share. The share must be promoted together with its share server. The problem is that we have several shares exported from one single server. So, promoting a share will cause all shares being promoted. My question is that is there any solution to promote several shares as they are in a group? I was trying to find something useful about ‘manila group replication’ ,only find Ocata Doc mentioned it, but no detail information: https://specs.openstack.org/openstack/manila-specs/specs/ocata/manila-share-groups.html And there is no code or commit history matches ‘group replication’ in Manila. Do you have any suggestions for our situation? Thanks, Ding Dong -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From deepa.kr at fingent.com Wed Nov 27 05:19:23 2019 From: deepa.kr at fingent.com (Deepa) Date: Wed, 27 Nov 2019 10:49:23 +0530 Subject: Freezer Project Update In-Reply-To: <8CAA057D-CC0B-4803-BA20-7A252EABD600@dkrz.de> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> <005401d59f9f$025a81c0$070f8540$@fingent.com> <8CAA057D-CC0B-4803-BA20-7A252EABD600@dkrz.de> Message-ID: <000401d5a4e2$3d12bce0$b73836a0$@fingent.com> Hello Amjad Thanks for the tips. We were able to fix freezer-agent/freezer-scheduler in VM’s also fixed errors at freezer-webui end .Now able to create action(backup ) , schedule job etc. Restoration part not yet successful. We are working on it .But the main drawback we can see is its not listing the backups under backup tab inside Disaster management nor with freezer backup-list command. Hence while trying to restore missing the backup id/name . Were able to see few bugs related to it ?Any idea how this works ? https://bugs.launchpad.net/freezer/+bug/1602253 https://bugs.launchpad.net/freezer/+bug/1603097 https://bugs.launchpad.net/freezer/+bug/1614453 Regards, Deepa K R From: Amjad Kotobi Sent: Saturday, November 23, 2019 11:48 PM To: Deepa Cc: James Page ; OpenStack Development Mailing List (not for usage questions) Subject: Re: Freezer Project Update Hi, I’ve tried with latest version but didn’t go for production with webui. I deployed and patched it in standalone way which Freezer-api runs in VM. You are able to run freezer-agent without having scheduler|api running, you only need to export admin-rc. On 20. Nov 2019, at 13:35, Deepa > wrote: Hello Amjad/James We tried installing Freezer .Freezer-scheduler and Freezer-agent in a VM which need to be backed up and Freezer-api and freezer-webui on controller node.The version of Openstack is Train. Unfortunately nothing worked out ☹ Most likely [keystone] section of freezer-api not configured properly which those configuration not coming from installation package. I will paste you later the configuration structure. Getting below error on VM (client to be backed up) when I run freezer-agent --action info Critical Error: Authorization Failure. Authorization Failed: Not Found (HTTP 404) (Request-ID: req-0c71d8b4-ef1a-4c8d-8d12-26df763f5085) And getting error in Dashboard when we enabled Freezer-api and freezer-webui in dashboard During handling of the above exception ([*] Error 401: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}), another exception occurred: Doubt the issue is with keystone versioning v2/v3.It will be great if you can share or tell freeze.env file for Freezer-agent (For client VM) and freezer-api.conf file parameters. Also what should be admin.rc file for freezer-webui. Correct it’s not api version thing. freezer-scheduler --config-file /etc/freezer/scheduler.conf start Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Not Found (HTTP 404) (Request-ID: req-2adb597d-7ad5-45a8-9888-6b552c8e55cc) Any guidance is highly appreciated . 
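For anyone hitting the same symptoms: "Not Found (HTTP 404)" together with "Could not find versioned identity endpoints" usually means the auth URL does not point at a valid Keystone v3 endpoint. A minimal sketch of the two pieces being asked about, where every host, name and password below is a placeholder that has to match the local deployment:

Environment exported on the client VM before running freezer-agent / freezer-scheduler:

export OS_AUTH_URL=http://CONTROLLER_IP:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_PROJECT_NAME=admin
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USERNAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PASSWORD=ADMIN_PASSWORD

[keystone_authtoken] section of freezer-api.conf on the controller (the standard keystonemiddleware options, shown here only as a sketch):

[keystone_authtoken]
www_authenticate_uri = http://CONTROLLER_IP:5000
auth_url = http://CONTROLLER_IP:5000/v3
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = freezer
password = FREEZER_SERVICE_PASSWORD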
Thanks a lot Regards, Deepa K R Amjad From: James Page > Sent: Tuesday, November 19, 2019 10:19 PM To: Amjad Kotobi > Cc: Deepa >; OpenStack Development Mailing List (not for usage questions) > Subject: Re: Freezer Project Update Hello On Fri, Nov 15, 2019 at 7:43 PM Amjad Kotobi < kotobi at dkrz.de> wrote: Hi, This project is pretty much in production state, from last summit it got active again from developer ends, we are using it for backup solution too. Great to hear that Freezer is getting some increased developer focus! Documentation side isn’t that bright, very soon gonna get updated, anyhow you are able to install as standalone project in instance, I did it manually, didn’t use any provision tools. Let me know for specific part of deployment that is not clear. Amjad On 14. Nov 2019, at 06:53, Deepa < deepa.kr at fingent.com> wrote: Hello Team Good Day I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see Freezer Project. But couldn’t find any charms for it in juju charms. Also there isn’t a clear documentation on how to install freezer . https://docs.openstack.org/releasenotes/freezer/train.html. No proper release notes in the latest version as well. Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. Freezer is not currently on the plan for OpenStack Charms for Ussuri. Better install documentation and support from Linux distros would be a good first step in the right direction. Cheers James -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 29286 bytes Desc: not available URL: From kurt at garloff.de Wed Nov 27 05:33:26 2019 From: kurt at garloff.de (Kurt Garloff) Date: Wed, 27 Nov 2019 06:33:26 +0100 Subject: [OpenStack Foundation] [tc][board][all] - Adding OpenStack community support to the savedotorg campaign In-Reply-To: References: Message-ID: <29A4C161-3519-463C-A568-5A25651002C1@garloff.de> Hi, On 26 November 2019 15:14:29 CET, Sean McGinnis wrote: >On Tue, Nov 26, 2019 at 7:00 AM Graham Hayes wrote: >>[...] >> What do people think? Are we happy to have the TC use its voice on >> behalf of the OpenStack project, or do we think the board should use >> its voice on behalf of the entire foundation? >> >> - Graham >> >> >> 1 - https://savedotorg.org/ >> > >I agree this is a Foundation level thing (though individuals can add >their names to the petition as well). I would support adding OSF to the list >of organizations. +1 -- Kurt Garloff , Cologne, Germany (Sent from Mobile with K9.) From sungn2 at lenovo.com Wed Nov 27 08:42:47 2019 From: sungn2 at lenovo.com (Guannan GN2 Sun) Date: Wed, 27 Nov 2019 08:42:47 +0000 Subject: =?utf-8?B?562U5aSNOiBbRXh0ZXJuYWxdICBSZTogQ29tbXVuaWNhdGlvbiBwcm9ibGVt?= =?utf-8?Q?_between_ironic-python-agent_and_CI_server.?= In-Reply-To: References: <6815c663fa6647f6bab938e8d4b751e6@lenovo.com> , Message-ID: Thank you Julia and Mohammed, I guess there may have something wrong with my network configuration. Because CI server is not directly connect with BM node. As our physical network is designed like this: [cid:a7c29f59-740a-4fd7-8a84-c6a28a27a0c5] So I use neutron to create "br-ens9" between ens9 and br-int when I deploy devstack. 
So that it can ping to ip I assigned to eno2 on BM node when deploying. However, I don't know whether ironic conductor can communiate to ironic python agent. Is that could be the root cause? I will take a look into it. Thank you! Best Regards, Guannan ________________________________ 发件人: Julia Kreger 发送时间: 2019年11月27日 4:05:31 收件人: Mohammed Naser 抄送: Guannan GN2 Sun; openstack-discuss at lists.openstack.org; Jay Bryant1 主题: [External] Re: Communication problem between ironic-python-agent and CI server. Jay reached out to me and in some discussion it seems like the following is occurring: * Ramdisk is loading from tftp_server * Conductor is not able to reach the 10.0.0.0/24 subnet where the ironic-python-agent is running * There appears to be a lack of a route inside the CI host that the conductor is operating on telling the host kernel to direct packets for IPA to the neutron router. Ramdisk loading would still work if egress traffic is being NAT translated, but ingress traffic would appear like this, ironic being unable to send packets because the conductor is communicating from the context of the CI host, and any namespaces created by neutron may not be directly reachable. -Julia On Tue, Nov 26, 2019 at 1:48 AM Mohammed Naser wrote: > > > > Sent from my iPhone > > On Nov 26, 2019, at 4:34 AM, Guannan GN2 Sun wrote: > >  > > Hi team, > > > I'm now trying to use ironic deployed with devstack to manage baremetal machine. > > However when it run into deploying stage, I open the BM server terminal and see it successfully load ramdisk and boot into it. It get the ip I assigned and I can ping it from CI server side. But it then deploy failed just about 2 minutes later. > > > When I check ironic-conductor log with command "sudo journalctl -a --unit devstack at ir-cond" and found error like this: > > ERROR ironic.drivers.modules.agent_client [None req-de37bc21-8d62-41db-8983-c06789939818 None None] Failed to connect to the agent running on node ea88ba26-756d-4d32-89f4-7ff086fa8868 for invoking command iscsi.start_iscsi_target. Error: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')): ConnectTimeout: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')) > > I can ping it from CI server side, so it is strange why the connection time out between ironic-python-agent and CI server. Does anyone meet similar problem or have idea about it? > > > You can ping it but can you make an HTTP request to port 9999 via something like curl? > > Thank you! > > > Best Regards, > > Guannan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pastedImage.png Type: image/png Size: 80094 bytes Desc: pastedImage.png URL: From amy at demarco.com Wed Nov 27 17:43:22 2019 From: amy at demarco.com (Amy Marrich) Date: Wed, 27 Nov 2019 11:43:22 -0600 Subject: [meta-sig] Add Thierry Carrez to core reviewer In-Reply-To: References: Message-ID: I wholeheartedly agree with Thierry being added as a Core to the repo. +2! 
Thanks, Amy (spotz) On Wed, Nov 27, 2019 at 7:45 AM Rico Lin wrote: > I would like to propose adding Thierry to core reviewer for > governance-sigs > I don't think anyone will doubt his knowledge of maintaining > governance-sigs repo. > He will be able to do core review on all patches. To create SIG patch, it > will require both UC and TC to give their blessing. That means besides > Amy's +2, we only need Thierry or mine +2 to approve the workflow. > > Let me know what you think if you have other thoughts, otherwise, I will > add him in core think week. > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Wed Nov 27 17:56:42 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Wed, 27 Nov 2019 12:56:42 -0500 Subject: [ops] registration now open for OpenStack Ops Metup, Bloomberg London jan 7,8 2020 Message-ID: Registration is now open for the OpenStack Ops Meetup at Bloomberg London, January 7,8 2020: https://go.bloomberg.com/attend/invite/openstack-operators-meetup-london2020/ Please sign up if attending. Agenda still under formation via normal community process (link on sign-up page). See you there! Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From sshnaidm at redhat.com Wed Nov 27 17:57:58 2019 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Wed, 27 Nov 2019 19:57:58 +0200 Subject: [tripleo][ironic][ansible][openstack-ansible] Ironic/Baremetal Ansible modules Message-ID: Hi, all in the light of finding the new home place for openstack related ansible modules [1] I'd like to discuss the best strategy to create Ironic ansible modules. Existing Ironic modules in Ansible repo don't cover even half of Ironic functionality, don't fit current needs and definitely require an additional work. There are a few topics that require attention and better be solved before modules are written to save additional work. We prepared an etherpad [2] with all these questions and if you have ideas or suggestions on how it should look you're welcome to update it. We'd like to decide the final place for them, name conventions (the most complex one!), what they should look like and how better to implement. Anybody interested in Ansible and baremetal management in Openstack, you're more than welcome to contribute. Thanks [1] https://review.opendev.org/#/c/684740/ [2] https://etherpad.openstack.org/p/ironic-ansible-modules -- Best regards Sagi Shnaidman -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Nov 27 18:26:07 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 27 Nov 2019 12:26:07 -0600 Subject: [nova][api] Behaviour of project_id validation In-Reply-To: References: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> Message-ID: <16eae1d03f7.103bacc7e330316.7506234740575974363@ghanshyammann.com> ---- On Wed, 27 Nov 2019 03:35:43 -0600 Surya Seetharaman wrote ---- > > > On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann wrote: > Note that the APIs that would change are admin-only by default. So in > this case nova is configured with a service user to check if the > requested project_id exists on behalf of the (admin) user making the > compute API request to add/remove flavor access (or update quota values > for a project). The service user does not have enough permissions in > keystone to check if the project exists. 
Option 1 is to give that service
> user more authority. Option 2 is basically to re-raise that error to the
> compute (admin) user to let them know they basically need to fix their
> deployment (option 1 again).
> >
> >
> > A combo of both solutions where we raise the error to the user and amend our docs to help them fix it seems good to me.

+1 on the solution. I like that the code tells the error to users, because people do not always read the docs.

> > I don't think a microversion is necessary for this
> ++

I disagree here. My main concern is that this is not the always-broken case.
For the case where we have completely broken behaviour, we do not need a microversion to fix that, as Matt mentioned too.

In this case: even with a 403 from keystone on GET /project, it is possible that the project exists and the request to Nova to add that project to flavor access is right. This is a success case in the current situation which will be changed to 400 after the proposed solution (option 2 or 2+1). That is a behaviour change and should be done with a microversion.

The old change where we added verify_project_id did not change the success case; it only handled the case where keystone returned 404 on GET /project, which confirms that the requested project does not exist and will break later, so nova started returning 400 instead of 200. That was clearly a broken case. Any other case where the project may exist was kept as it is, so a microversion was not needed there.

But now we are also changing the success cases to return an error and ask the user to have the GET /project permission first, otherwise nova cannot process the request. Your project might be valid, but nova cannot confirm that until you have permission to GET /project.

-gmann

> > ----------
> > Cheers, Surya.

From gagehugo at gmail.com Wed Nov 27 20:46:33 2019
From: gagehugo at gmail.com (Gage Hugo)
Date: Wed, 27 Nov 2019 14:46:33 -0600
Subject: [security] Security SIG Meeting Tomorrow Canceled
Message-ID: 

Just a reminder that the Security SIG meeting tomorrow is cancelled. Hope everyone has a great holiday weekend, and we will be back next week!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From openstack at nemebean.com Wed Nov 27 20:54:17 2019
From: openstack at nemebean.com (Ben Nemec)
Date: Wed, 27 Nov 2019 14:54:17 -0600
Subject: [oslo] Adoption of microversion-parse
In-Reply-To: 
References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com>
Message-ID: <7afe1b43-9fd9-51ee-9593-be32a690af6c@nemebean.com>

On 11/14/19 9:48 AM, Chris Dent wrote:
> On Thu, 14 Nov 2019, Ben Nemec wrote:
>> On 10/21/19 9:08 AM, Eric Fried wrote:
>>>> Makes sense. We probably want to have an independent core team for
>>>> it in
>>>> addition to oslo-core so we can add people like Chris to it.
>>>
>>> I volunteer to help maintain it, if you'll have me.
>>
>> Works for me. Any objections from the existing core team?
>
> Works for me too.
>

Since I've heard no objections, I added Eric to the core team. Thanks for volunteering!
-Ben

From colleen at gazlene.net Wed Nov 27 20:54:49 2019
From: colleen at gazlene.net (Colleen Murphy)
Date: Wed, 27 Nov 2019 12:54:49 -0800
Subject: Re: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal for policy popup team
In-Reply-To: 
References: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> <87abcd15-9376-421b-96d2-3597b7a1e356@www.fastmail.com>
Message-ID: <25695fd7-9ff4-4706-9b3b-4e96543d0d82@www.fastmail.com>

On Tue, Nov 26, 2019, at 21:56, Nadathur, Sundar wrote:
> > From: Colleen Murphy 
> > Sent: Tuesday, November 26, 2019 8:51 AM
> > Subject: Re: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal
> > for policy popup team
> >
> > On Mon, Nov 25, 2019, at 18:12, Zhipeng Huang wrote:
> > > Thanks Colleen,
> > >
> > > As I shared with Cyborg team earlier, I think as a new project it
> > > would be beneficial that cyborg also participate in the popup team to
> > > get policy right from the get go
> >
> > Thanks Howard, I've added cyborg to the page and added you as liaison.
> >
> > Colleen
>
> Please nominate Yumeng Bao (yumeng_bao at yahoo.com) as the liaison with
> Cyborg team. She will contribute the spec. We have informed Howard of
> this.

Thanks Sundar, I have replaced Howard with Yumeng as the liaison for cyborg.

(Side note - the wiki can be edited by anyone, and I am tracking changes in it, so anyone may feel free to change their project liaison or add or remove team members and I will be notified of the change.)

Colleen

>
> Thanks to Howard for bringing up Cyborg in this context.
>
> Regards,
> Sundar
>

From colleen at gazlene.net Wed Nov 27 20:55:57 2019
From: colleen at gazlene.net (Colleen Murphy)
Date: Wed, 27 Nov 2019 12:55:57 -0800
Subject: Re: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal for policy popup team
In-Reply-To: 
References: <16ea53d3e09.100373e5c247518.6873714586749636882@ghanshyammann.com> <87abcd15-9376-421b-96d2-3597b7a1e356@www.fastmail.com>
Message-ID: 

On Tue, Nov 26, 2019, at 10:30, Goutham Pacha Ravi wrote:
> Hi Colleen,
>
> I couldn't participate in person in Shanghai - but we have been talking
> about this in the manila team in the past. I'd like to be added to
> these discussions so I can coordinate the efforts around improving our
> API policies as well.

Hi Goutham,

I've added the manila team to the tracking page and added you as liaison.

Colleen

>
> Thank you,
> Goutham

From stevenrelf at aol.com Wed Nov 27 20:59:08 2019
From: stevenrelf at aol.com (Steven Relf)
Date: Wed, 27 Nov 2019 20:59:08 +0000 (UTC)
Subject: [Usage][Designate] Slave Bind servers
References: <1960112069.4062341.1574888348325.ref@mail.yahoo.com>
Message-ID: <1960112069.4062341.1574888348325@mail.yahoo.com>

Hi list....

I'm looking at making use of Designate, and have so far managed to get designate configured and working against a single bind9 server.

The issue I'm having, and this could be due to not understanding bind config well, is how do I configure other bind servers to be slaves of the server used by designate. I keep running into an issue where the slave doesn't want to do a transfer and errors with "received notify for zone 'domain': not authoritative". I'm assuming this has something to do with the slave not knowing about the zone.

So the question: do I have to have all my bind servers hooked up to designate, and managed with a mdns service, or am I missing something obvious?

Rgds
Steve.

The future has already arrived. It's just not evenly distributed yet - William Gibson
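For reference, the way this is usually wired up is that every BIND server which should serve the Designate-managed zones is registered directly in the Designate pool, so each of them transfers zones from designate-mdns (the hidden master) rather than slaving from another BIND server; a plain BIND slave of a Designate-managed server will not learn about new zones, because Designate only creates zones (via rndc addzone) on its configured targets. A rough pools.yaml sketch of that layout, where all hosts, addresses and key paths are placeholders for the actual deployment:

- name: default
  description: Default Pool
  attributes: {}
  ns_records:
    - hostname: ns1.example.org.
      priority: 1
  nameservers:
    # every BIND server that should answer for Designate-managed zones
    - host: 192.0.2.10
      port: 53
    - host: 192.0.2.11
      port: 53
  targets:
    # one target per BIND server; each pulls zone transfers from designate-mdns
    - type: bind9
      description: BIND server 1
      masters:
        - host: 192.0.2.1   # designate-mdns
          port: 5354
      options:
        host: 192.0.2.10
        port: 53
        rndc_host: 192.0.2.10
        rndc_port: 953
        rndc_key_file: /etc/designate/rndc.key
    - type: bind9
      description: BIND server 2
      masters:
        - host: 192.0.2.1
          port: 5354
      options:
        host: 192.0.2.11
        port: 53
        rndc_host: 192.0.2.11
        rndc_port: 953
        rndc_key_file: /etc/designate/rndc.key

After editing the file, designate-manage pool update pushes the new pool definition to the database.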
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From amotoki at gmail.com Wed Nov 27 22:17:15 2019
From: amotoki at gmail.com (Akihiro Motoki)
Date: Thu, 28 Nov 2019 07:17:15 +0900
Subject: [requirements][stable] Capping requirements in stable branches
Message-ID: 

Hi,

I have a question on version capping in requirements files in stable branches.

When some newer version of dependent library does not work with my project, do we accept a patch to add version capping of the library in stable branches?

For example, the horizon team received a patch to novaclient to <16 [1]. novaclient 16.0.0 was released after Train release, so there is no surprise that we don't have the version cap novaclient<16 in Train release.

My understanding is that we don't usually cap versions of libraries after the release, but I am sending this mail to check our general guideline.

Thanks,
Akihiro Motoki (irc: amotoki)

[1] https://review.opendev.org/#/c/693000/

From mthode at mthode.org Wed Nov 27 22:35:25 2019
From: mthode at mthode.org (Matthew Thode)
Date: Wed, 27 Nov 2019 16:35:25 -0600
Subject: [requirements][stable] Capping requirements in stable branches
In-Reply-To: 
References: 
Message-ID: <20191127223525.4ldrrjoliuzbt6o3@mthode.org>

On 19-11-28 07:17:15, Akihiro Motoki wrote:
> Hi,
>
> I have a question on version capping in requirements files in stable branches.
>
> When some newer version of dependent library does not work with my project,
> do we accept a patch to add version capping of the library in stable branches?
>
> For example, the horizon team received a patch to novaclient to <16 [1].
> novaclient 16.0.0 was released after Train release, so there is no surprise
> that we don't have the version cap novaclient<16 in Train release.
>
> My understanding is that we don't usually cap versions of libraries
> after the release,
> but I am sending this mail to check our general guideline.
>
> Thanks,
> Akihiro Motoki (irc: amotoki)
>
> [1] https://review.opendev.org/#/c/693000/
>

You are correct, we don't cap generally, though I wonder if the tests allow it given that our goal is to maintain co-installability (through upper-constraints.txt), not to have things be uncapped (that's a side effect of lowering the maintenance burden in master). We would not allow the cap in the reqs project, but it's possible it can work per-project.

--
Matthew Thode
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: 

From yamamoto at midokura.com Thu Nov 28 02:12:31 2019
From: yamamoto at midokura.com (Takashi Yamamoto)
Date: Thu, 28 Nov 2019 11:12:31 +0900
Subject: [neutron][OVN] Multiple mechanism drivers
In-Reply-To: <20191125075144.vhppi2bnnnfyy57s@skaplons-mac>
References: <8B3A471E-B855-4D1C-AE52-080D4B0D92A9@gmail.com> <20191125075144.vhppi2bnnnfyy57s@skaplons-mac>
Message-ID: 

hi,

On Mon, Nov 25, 2019 at 5:00 PM Slawek Kaplonski wrote:
>
> Hi,
>
> I think that this may be true that networking-ovn will not work properly
> with other drivers.
> I don't think it was tested at any time.
> Also the problem may be that when You are using networking-ovn than whole
> neutron topology is different. There are different agents for example.
>
> Please open a bug for that for networking-ovn. I think that networking-ovn team
> will take a look into that.
networking-midonet ignores networks without "midonet" type segments to avoid interfering other mechanism drivers. maybe networking-ovn can have something similar. wrt agents, last time i checked there was no problem with running midonet agent and ovs agent on the same host, sharing the kernel datapath. so i guess there's no problem with ovn either. wrt l3, unfortunately neither midonet or ovn have implemented "l3 flavor" thing yet. so you have to choose a single l3 plugin. iirc, Sam's deployment doesn't use l3 for linuxbridge, right? > > On Mon, Nov 25, 2019 at 04:32:50PM +1100, Sam Morrison wrote: > > We are looking at using OVN and are having some issues with it in our ML2 environment. > > > > We currently have 2 mechanism drivers in use: linuxbridge and midonet and these work well (midonet is the default tenant network driver for when users create a network) > > > > Adding OVN as a third mechanism driver causes the linuxbridge and midonet networks to stop working in terms of CRUD operations etc. > > It looks as if the OVN driver thinks it’s the only player and is trying to do things on ports that are in linuxbridge or midonet networks. > > > > Am I missing something here? (We’re using Stein version) > > > > > > Thanks, > > Sam > > > > > > > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From rico.lin.guanyu at gmail.com Thu Nov 28 04:53:39 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Thu, 28 Nov 2019 12:53:39 +0800 Subject: [meta-sig] Add Thierry Carrez to core reviewer In-Reply-To: References: Message-ID: With both mine and Amy's +2, I added Thierry back to meta-sig core:) On Thu, Nov 28, 2019 at 1:43 AM Amy Marrich wrote: > I wholeheartedly agree with Thierry being added as a Core to the repo. +2! > > Thanks, > > Amy (spotz) > > On Wed, Nov 27, 2019 at 7:45 AM Rico Lin > wrote: > >> I would like to propose adding Thierry to core reviewer for >> governance-sigs >> I don't think anyone will doubt his knowledge of maintaining >> governance-sigs repo. >> He will be able to do core review on all patches. To create SIG patch, it >> will require both UC and TC to give their blessing. That means besides >> Amy's +2, we only need Thierry or mine +2 to approve the workflow. >> >> Let me know what you think if you have other thoughts, otherwise, I will >> add him in core think week. >> >> -- >> May The Force of OpenStack Be With You, >> >> *Rico Lin*irc: ricolin >> >> -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Thu Nov 28 05:14:20 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Thu, 28 Nov 2019 13:14:20 +0800 Subject: [meta][K8s][API][Extended Maintenance][Operation Docs] Change SIG to Advisory status In-Reply-To: <638eb1eb-febc-3a53-9bb8-f81f610bdb14@suse.com> References: <638eb1eb-febc-3a53-9bb8-f81f610bdb14@suse.com> Message-ID: On Fri, Nov 22, 2019 at 4:21 PM Andreas Jaeger wrote: > > > What will happen with these two documents if the Operation Docs SIG > becomes advisory state? > > Should we retire the repos and delete the content? > As mentioned in patch review, I will try to get Chris or Sean to provide input to this ML > I don't know whether the other SIGS own any deliverables, where we need > to discuss what to do with them, We don't allow SIGs to own any deliverables at the current stage. as defined in [1] ``Project team is required when planning for release a deliverable service`` > > Andreas > > > [...] 
> -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 [1] https://governance.openstack.org/tc/reference/comparison-of-official-group-structures.html#produces-the-software-project-teams -- May The Force of OpenStack Be With You, Rico Lin irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Thu Nov 28 05:15:04 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Thu, 28 Nov 2019 13:15:04 +0800 Subject: [meta][K8s][API][Extended Maintenance][Operation Docs] Change SIG to Advisory status In-Reply-To: <16eadeec01a.10ce8ff56329123.2937780973121264558@ghanshyammann.com> References: <16eadeec01a.10ce8ff56329123.2937780973121264558@ghanshyammann.com> Message-ID: On Thu, Nov 28, 2019 at 1:35 AM Ghanshyam Mann wrote: > > > Thanks a lot Rico for all your effort for the SIG help and management. > > +1. I like the 'advisory' status which will clear out the difference between inactive and "active but with on-demand services only". > One question on this status- Does this still include the updates on repo/guidance doc etc ? For Example: if I want to add a few more > guidelines or changes to current repo/doc-sites etc then this SIG will still go with usual discussion and review process and not > saying that 'we are in an advisory role so we have closed any repo/doc update'. > > It will be good if you can explain those statuses in detail with their scope activity in sig-guidelines doc. Let's separate patches and add a sig-status.rst file and define each status in detail. > > > API SIG > > Is it ok to move api-sig to 'advisory'? I think Michael mentioned about some work to finish on traiging the > current open issues/todo etc. Should we wait for that work to be finished? May be Michael or Dmitry > can update on the latest status. It's always up to SIG to tell us what their real status is. And I will not merge that patch until we have chair's +1 from each SIG. Will ping Michael and Dmitry to ask for their feedback on this. > > > complete: SIG completes its mission. > > Can we include some status for the inactive SIG who has not completed its mission? something like 'On-Hold' or 'Need-help' etc. > It will help if anyone looking for that SIG can help or manage. It is more like backlogs for history and reference if same type of problem > comes and someone wants to form a SIG. Here's one interesting fact, we alway delete the SIG as soon as it's no longer required. But we make complete status because SIG might need to keep their repo. We can protentially create a backlog file for SIGs so others can looks up during working on new SIG idea. And this will make sigs.yaml file more clear (contains only forming/active/advisory SIG) -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Thu Nov 28 05:28:46 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 28 Nov 2019 14:28:46 +0900 Subject: [requirements][stable] Capping requirements in stable branches In-Reply-To: <20191127223525.4ldrrjoliuzbt6o3@mthode.org> References: <20191127223525.4ldrrjoliuzbt6o3@mthode.org> Message-ID: On Thu, Nov 28, 2019 at 7:43 AM Matthew Thode wrote: > > On 19-11-28 07:17:15, Akihiro Motoki wrote: > > Hi, > > > > I have a question on version capping in requirements files in stable branches. 
> > > > When some newer version of dependent library does not work with my project, > > do we accept a patch to add version capping of the library in stable branches? > > > > For example, the horizon team received a patch to novaclient to <16 [1]. > > novaclient 16.0.0 was released after Train release, so there is no surprise > > that we don't have the version cap novaclient<16 in Train release. > > > > My understanding is that we don't usually cap versions of libraries > > after the release, > > but I am sending this mail to check our general guideline. > > > > Thanks, > > Akihiro Motoki (irc: amotoki) > > > > [1] https://review.opendev.org/#/c/693000/ > > > > You are correct, we don't cap generally, though I wonder if the tests > allow it given that our goal is to maintian co-installability (through > upper-constraints.txt) not have things be uncapped (that's a side effect > of lowering the maintence burden in master). Let me ask in detail more. Is it true for stable branches? If the only reason of uncapping is to lower the maintenance cost in the master branch, I wonder we can allow capping in stable branches. It might help consumers who use OpenStack via PyPI directly. For example, horizon train does not work with python-novaclient >= 16.0.0. Generally speaking, capping python-novaclient to <16 in train helps consumers. > We would not allow the cap > in the reqs project, but it's possible it can work per-project. In my understanding, we cannot do that unless a library is listed in blacklist.txt. Otherwise, requirements-check complains it even in stable branches. Thanks, Akihiro From mnaser at vexxhost.com Thu Nov 28 05:38:19 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 28 Nov 2019 00:38:19 -0500 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <8184e8c7-b5aa-9801-c4a6-b1e9f8db11b2@suse.com> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> <9e3c2bd1-69d7-0492-5744-b802a4584551@matthias-runge.de> <0b6bd1f3-acd6-8b9c-7e0e-1f7d241a6aa4@debian.org> <20191121152901.xzyv6o6q6jstc36u@yuggoth.org> <20191121205356.fdwb6oaeoppmk6up@firewall> <8184e8c7-b5aa-9801-c4a6-b1e9f8db11b2@suse.com> Message-ID: On Fri, Nov 22, 2019 at 3:50 AM Witek Bedyk wrote: > > On 11/21/19 9:53 PM, Nate Johnston wrote: > > > I know several very large sites (ingesting billions of records per day) > > that run community InfluxDB and they get HA by putting influx-proxy [1] > > in front of it. I've evaluated it for large scale uses before as well, > > and with influx-proxy I found no need for the clustering option. > > Similar architecture is followed by Monasca. InfluxDB instances can be > assigned to different Kafka consumer groups and consume messages > independently from the message queue. In case one of the instances is > down all the measurements are still buffered and get persisted as soon > as the instance is available again. I'm curious on what the final decision of the Ceilometer team regarding this discussion? 
> Best greetings > Witek > From mthode at mthode.org Thu Nov 28 06:27:14 2019 From: mthode at mthode.org (Matthew Thode) Date: Thu, 28 Nov 2019 00:27:14 -0600 Subject: [requirements][stable] Capping requirements in stable branches In-Reply-To: References: <20191127223525.4ldrrjoliuzbt6o3@mthode.org> Message-ID: <20191128062714.ehtglq7jeo2dukyp@mthode.org> On 19-11-28 14:28:46, Akihiro Motoki wrote: > On Thu, Nov 28, 2019 at 7:43 AM Matthew Thode wrote: > > > > On 19-11-28 07:17:15, Akihiro Motoki wrote: > > > Hi, > > > > > > I have a question on version capping in requirements files in stable branches. > > > > > > When some newer version of dependent library does not work with my project, > > > do we accept a patch to add version capping of the library in stable branches? > > > > > > For example, the horizon team received a patch to novaclient to <16 [1]. > > > novaclient 16.0.0 was released after Train release, so there is no surprise > > > that we don't have the version cap novaclient<16 in Train release. > > > > > > My understanding is that we don't usually cap versions of libraries > > > after the release, > > > but I am sending this mail to check our general guideline. > > > > > > Thanks, > > > Akihiro Motoki (irc: amotoki) > > > > > > [1] https://review.opendev.org/#/c/693000/ > > > > > > > You are correct, we don't cap generally, though I wonder if the tests > > allow it given that our goal is to maintian co-installability (through > > upper-constraints.txt) not have things be uncapped (that's a side effect > > of lowering the maintence burden in master). > > Let me ask in detail more. > > Is it true for stable branches? If the only reason of uncapping is to lower > the maintenance cost in the master branch, I wonder we can allow capping > in stable branches. It might help consumers who use OpenStack via PyPI directly. > > For example, horizon train does not work with python-novaclient >= 16.0.0. > Generally speaking, capping python-novaclient to <16 in train helps consumers. > For stable branch issues with projects not 'requirements' I'd refer you to the stable policy / team (already tagged in the subject line). I suspect that we'd need to know what the cap for each project / version would be for each release (is it a major version bump?, minor?, etc). > > We would not allow the cap > > in the reqs project, but it's possible it can work per-project. > > In my understanding, we cannot do that unless a library is listed in > blacklist.txt. > Otherwise, requirements-check complains it even in stable branches. > > Thanks, > Akihiro > -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From amotoki at gmail.com Thu Nov 28 06:30:37 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 28 Nov 2019 15:30:37 +0900 Subject: [olso][pbr][i18n] Using setup.cfg [files] data_files to install localization files In-Reply-To: References: <864f4dd9-a78e-d969-5fa6-f0a096a4a59a@dantalion.nl> Message-ID: It seems you need to break down the thing into two parts: (1) Installing po files (2) Compiling po files into .mo files In my understanding, python setup.py (or more preferably pip install) handles only (1). (2) is not covered by the python package installation process and you need to compile mo files manually if you use pip to install OpenStack python packages. gettext (used under oslo_i18n.get_available_languages) only looks for mo files. 
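As a concrete illustration of step (2), the catalogs can be compiled by hand with the standard gettext msgfmt tool after the package is installed; the install path and translation domain below are only an example for Watcher and may need adjusting to the actual environment:

PKGDIR=$(python3 -c 'import os, watcher; print(os.path.dirname(watcher.__file__))')
for po in "$PKGDIR"/locale/*/LC_MESSAGES/watcher.po; do
    msgfmt "$po" -o "${po%.po}.mo"
done

After that, oslo_i18n.get_available_languages('watcher') should report the extra languages as well.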
That's the reason get_available_languages() did not detect languages other than en_US. In case of horizon, po files are compiled explicitly during devstack stack.sh. Most distro packages generate mo files during package creation or package installation. If you would like to enable translations in OpenStack services, I believe you need to compile po files into mo files. Akihiro On Wed, Nov 27, 2019 at 2:27 AM info at dantalion.nl wrote: > > Hello, > > I don't think we need to specify them in setup.cfg. > Am I missing something? > > I thought so to but I recently checked on Watcher and noticed that executing > 'python setup.py install' does not build + move the .mo files into the required locale > directories. I proceeded to test on other OpenStack projects and saw the same behavior. > > This was verified by listing the available languages with: > `oslo_i18n.get_available_languages(DOMAIN)` which than only returns > `['en_US']` > > Maybe `python setup.py install` is not the correct command to do this? > > Kind regards. > Corne Lukken > > On 26-11-19 16:47, Akihiro Motoki wrote: > > Hi, > > > > I think localization files are automatically included in a python > > package even if setup.cfg has no explicit entry. > > > > Here is an example of the case of the horizon repository: > > http://paste.openstack.org/show/786732/ > > I can see localized files after installing a horizon wheel package > > into a virtualenv. > > > >> 1: Including the locale files as part of a package for the target > >> systems package manager (pacman, yum, apt, etc). > > These package managers just do the similar thing in a different way. > > > >> 2: adding the locale files to the [files] directive in setup.cfg: > > I don't think we need to specify them in setup.cfg. > > Am I missing something? > > > > Thanks, > > Akihiro Motoki (amotoki) > > > > On Tue, Nov 26, 2019 at 12:17 AM info at dantalion.nl wrote: > >> Hello everyone :), > >> > >> I was wondering what the preferred method to install localization files > >> is. I can think of some probably solutions such as: > >> > >> 1: Including the locale files as part of a package for the target > >> systems package manager (pacman, yum, apt, etc). > >> 2: adding the locale files to the [files] directive in setup.cfg: > >> > >> I hope someone can answer my question. > >> > >> Kind regards, > >> Corne Lukken > >> > From witold.bedyk at suse.com Thu Nov 28 08:26:26 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Thu, 28 Nov 2019 09:26:26 +0100 Subject: [auto-scaling][self-healing] Discussion to merge two SIG to one In-Reply-To: References: Message-ID: <1b39c3fe-22a1-c84c-ed13-05fbd9360d7d@suse.com> On 11/22/19 12:20 AM, Trinh Nguyen wrote: > Hi Rico, > > +1 > That is a very good idea. Coincidentally, I'm working on some research > projects that focus on autoscaling and self-healing at the same time. > And the combined group would be a very good idea because I don't have to > switch back and forth between the groups for discussion. > > Thanks, > > > On Thu, Nov 21, 2019 at 12:57 AM Rico Lin > wrote: > > Dear all > > As we discussed in PTG about merge two SIG to one. > I would like to continue the discussion on ML. > > In PTG, Eric proposes the idea to merge two SIG due to the high > overlapping of domains and tasks. > > I think this is a great idea since, over the last 6 months, most of > the discussions in both SIG are overlapped. So I'm onboard with this > idea. > > Here's how I think we can continue this idea: > > 1. Create new SIG (maybe 'Automation SIG'? 
feel free to propose > name which can cover both interest.) > 2. Redirect docs and wiki to new SIG. And rework on index so there > will be no confusion > 3. Move repos from both SIGs to new SIG > 4. Mark auto-scaling SIG and self-healing SIG as inactive. > 5. remove auto-scaling SIG and self-healing SIG after a > reasonable waiting time Hi, how about starting with joining the SIGs meeting times and organizing the Forum and PTG events together? The repositories and wiki pages could stay as they are and refer to each other. Merging the content into one new repository definitely also has value but you should think how to structure it so that the users can find useful information quickly. I think merging is good if you have an idea how to better structure the content, and time to review the existing one and do all the formal stuff. Just gluing the documents won't help. Cheers Witek From skaplons at redhat.com Thu Nov 28 08:26:59 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 28 Nov 2019 09:26:59 +0100 Subject: [neutron] Including networking-ovn-core team into neutron-core Message-ID: Hi neutrinos, We recently merged spec [1] and now we are starting to moving networking-ovn code to be in neutron tree. Blueprint to track progress of this is on [2]. As a consequence of this merge of code, I think that we need to also include networking-ovn-core team into neutron-core to give people who are cores on networking-ovn today ability to merge ovn related patches in neutron after this migration will be done. Of course current networking-ovn cores should only approve patches related to ovn driver, and not approve patches related to different areas of neutron code. So if there will be no objections for that until the end of this week, I will include networking-ovn-core group in neutron-core. [1] https://review.opendev.org/#/c/658414/ [2] https://blueprints.launchpad.net/neutron/+spec/neutron-ovn-merge — Slawek Kaplonski Senior software engineer Red Hat From balazs.gibizer at est.tech Thu Nov 28 08:54:06 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Thu, 28 Nov 2019 08:54:06 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com> Message-ID: <1574931242.31688.9@est.tech> On Wed, Nov 27, 2019 at 17:03, Sean Mooney wrote: > On Wed, 2019-11-27 at 16:20 +0100, Bence Romsics wrote: >> > > resource_provider_hypervisors = br-physnet0:hypervisor0,... >> > >> > this also wont work as the same bridge name will exists on >> multipel hosts >> >> Of course the same bridge/nic name can exist on multiple hosts. And >> each report_state message is clearly belonging to a single agent and >> the configurations field is persisted per agent, so there won't be a >> collision ever. >> > that is in the non iroinc smart nic case. in the ironic smart nic > case with the ovs super agent > which is the only case where there would be multiple hypervisor > managed by the same > agent the agent will be remote. When you say "ironic smart nic case with the ovs super agent", do you refer to this abandoned spec [1]? > > so in the non ironic case it does not need to be a list. 
> in the smartnic case it might need to be a list In that spec the author proposes [2] not to break the 1-1 mapping between OVS agent and remote OVS. So as far as I see there is no need for a list in this case either. > but a mapping of bridge or pyshnet wont be unique > and a agent hostname (CONF.host) to hypervior host would be 1:N so > its not clear how you would select > form the N RPs if all you know form nova is the binding host which is > the service host not hypervior hostname. > Are we talking about a problem during binding here? As this feels to be a different problem than from creating device RPs under the proper compute node RP. Anyhow my simple understanding is the following: * a physical NIC or an OVS integration bridge always belongs to one single hypervisor. While a hypervisor might have more than on physical NIC or an OVS bridge * the identity (e.g. hypervisor hostname) of such hypervisor is known at deployment time * the neutron agent config can have a mapping between the device (NIC or OVS bridge) and the hypervisor identity and this mapping can be sent up to the neutron server via RPC * the neutron agent already sends up the service host name where the agent runs to the neutron server via RPC. * the neutron server knowing the service host and the device -> hypervisor identity mapping can find the compute node RP under which the device RP needs to be created. @Sean: Where does my list of reasoning breaks from your perspective? Cheers, gibi [1] https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst [2] https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst at 28 From rico.lin.guanyu at gmail.com Thu Nov 28 08:57:12 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Thu, 28 Nov 2019 16:57:12 +0800 Subject: [auto-scaling][self-healing] Discussion to merge two SIG to one In-Reply-To: <1b39c3fe-22a1-c84c-ed13-05fbd9360d7d@suse.com> References: <1b39c3fe-22a1-c84c-ed13-05fbd9360d7d@suse.com> Message-ID: On Thu, Nov 28, 2019 at 4:37 PM Witek Bedyk wrote: > > Hi, > how about starting with joining the SIGs meeting times and organizing > the Forum and PTG events together? The repositories and wiki pages could > stay as they are and refer to each other. > I think even if we merged two SIG, repositories should stay separated as they're now. IMO we can simply rename openstack/auto-scaling-sig to openstack/auto-scaling and so as to self-healing. Or just keep it the same will be fine IMO. We don't need a new repo for the new SIG (at least not for now). I do like the idea to start with joining the SIGs meeting times and organizing the Forum and PTG events together. One more proposal in my mind will be, join the channel for IRC. > > I think merging is good if you have an idea how to better structure the > content, and time to review the existing one and do all the formal > stuff. Just gluing the documents won't help. Totally agree with this point! > > Cheers > Witek > -- May The Force of OpenStack Be With You, Rico Lin irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Nov 28 08:59:32 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 28 Nov 2019 09:59:32 +0100 Subject: [neutron] Proposing Jakub Libosvar as Neutron core reviewer Message-ID: <207982A0-CBE7-47D9-A19C-CCFCACCB34EF@redhat.com> Hi neutrinos, We already started process of migrating networking-ovn driver to be one of in-tree neutron drivers. Blueprint for that is [1]. 
As part of this process I today proposed to include networking-ovn-core group into neutron-core group. Mail about it can be found at [2]. One of persons in networking-ovn-group is Jakub Libosvar who was Neutron core for very long time in the past. He knows very well not only ovn related code but also have great knowledge about all Neutron code base. So I would like to propose to Jakub as Neutron core reviewer again as he will be back working on neutron again now, after ovn will be in-tree driver. What do You think about it? I will wait for Your opinions for 1 week from now. Thx for all Your comments about it. [1] https://blueprints.launchpad.net/neutron/+spec/neutron-ovn-merge [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011240.html — Slawek Kaplonski Senior software engineer Red Hat From soulxu at gmail.com Thu Nov 28 09:02:19 2019 From: soulxu at gmail.com (Alex Xu) Date: Thu, 28 Nov 2019 17:02:19 +0800 Subject: [nova][api] Behaviour of project_id validation In-Reply-To: <16eae1d03f7.103bacc7e330316.7506234740575974363@ghanshyammann.com> References: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> <16eae1d03f7.103bacc7e330316.7506234740575974363@ghanshyammann.com> Message-ID: Ghanshyam Mann 于2019年11月28日周四 上午2:27写道: > ---- On Wed, 27 Nov 2019 03:35:43 -0600 Surya Seetharaman < > surya.seetharaman9 at gmail.com> wrote ---- > > > > > > On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann > wrote: > > Note that the APIs that would change are admin-only by default. So in > > this case nova is configured with a service user to check if the > > requested project_id exists on behalf of the (admin) user making the > > compute API request to add/remove flavor access (or update quota values > > for a project). The service user does not have enough permissions in > > keystone to check if the project exists. Option 1 is give that service > > user more authority. Option 2 is basically re-raise that error to the > > compute (admin) user to let them know they basically need to fix their > > deployment (option 1 again). > > > > > > > > A combo of both solutions where we raise the error to the user and > amend our docs to help them fix it seems good to me. > > +1 on the solution. I like that code tells the error to users because > people do not read the doc always. > > > > > I don't think a microversion is necessary for this > > ++ > > I disagree here. My main concern is that this is not the always-broken > case. > For the case where we have complete broken behaviour then we do not need > microverison to fix that as mentioned by matt too. > > In this case: Even 403 from keystone on GET /project, it may possible > that the project exists and request Nova to add that projects in > flavor access is right. This is a success case in the current situation > which will be changed to 400 after the proposed solution(option2 or 2+1). > This is behaviour change and should be done with microvesion. > That is the case we expected to be fixed by the operator, right? If the operator fix that, then there won't be any API behavior change. I guess that what Matt point. > > In old change where we added the verify_project_id did not change the > success case, that only handled the case where keystone returned > 404 on GET /project means it is confirmed that the requested project does > not exist so it will break later so nova started 400 instead of 200. > which was clearly a broken case. Any other case where projects may exist > was kept as it is so microversion was not needed there. 
> But now we are changing the success cases also to return an error and ask > the user to have the GET /project permission first otherwise > nova cannot process the request. Your project might be valid but nova > cannot conform that till you have permission to GET /project. > > -gmann > > > > > ---------- > > > > Cheers,Surya. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Thu Nov 28 09:33:21 2019 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Thu, 28 Nov 2019 10:33:21 +0100 Subject: [kolla] kolla-ceph future Message-ID: <247D790A-31AA-456F-AE8E-AF18374014AE@gmail.com> Hello, Ceph deployment functionality in Kolla-Ansible has been deprecated in Train (https://review.opendev.org/#/c/669214/1). Given the fact that Ceph deployment was (and is) important functionality in Kolla-Ansible - I’d like to propose following steps: 1) Improve external Ceph functionality in Kolla-Ansible (configurable keyrings, users, improved documentation) - in progress now 2) Replace current kolla-ceph CI jobs with an equivalent using ceph-ansible (and possibly ceph orchestrator after Octopus is out around March 2020) 3) Remove kolla-ceph functionality in Ussuri release from Kolla-Ansible code base - or (if there are volunteers) extract it as a separate repository with a separate set of maintainers/core team. 4) (OPTIONAL) Find a volunteer to work on ceph-ansible/ceph-orchestrator existing deployment takeover functionality for ceph-kolla (current core team has no spare cycles to work on that) Essentially this mail is a call for volunteers for 3) - if there is somebody that is interested in picking up the code and maintaining it - please speak up. If there are none - we will start preparing for removing the code in 2020. Thank you for your understanding in this not easy decision. Kind regards, Michal Nasiadka From mark at stackhpc.com Thu Nov 28 09:41:14 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 28 Nov 2019 09:41:14 +0000 Subject: [tripleo][ironic][ansible][openstack-ansible] Ironic/Baremetal Ansible modules In-Reply-To: References: Message-ID: On Wed, 27 Nov 2019 at 17:58, Sagi Shnaidman wrote: > > Hi, all > > in the light of finding the new home place for openstack related ansible modules [1] I'd like to discuss the best strategy to create Ironic ansible modules. Existing Ironic modules in Ansible repo don't cover even half of Ironic functionality, don't fit current needs and definitely require an additional work. There are a few topics that require attention and better be solved before modules are written to save additional work. We prepared an etherpad [2] with all these questions and if you have ideas or suggestions on how it should look you're welcome to update it. > We'd like to decide the final place for them, name conventions (the most complex one!), what they should look like and how better to implement. > Anybody interested in Ansible and baremetal management in Openstack, you're more than welcome to contribute. Thanks for raising this, we're definitely missing some key things for ironic. I added a couple of roles and modules that we developed for kayobe to the etherpad. Would be happy to contribute them to the collection. 
> > Thanks > > [1] https://review.opendev.org/#/c/684740/ > [2] https://etherpad.openstack.org/p/ironic-ansible-modules > > -- > Best regards > Sagi Shnaidman From yu.chengde at 99cloud.net Thu Nov 28 10:22:55 2019 From: yu.chengde at 99cloud.net (yu.chengde at 99cloud.net) Date: Thu, 28 Nov 2019 18:22:55 +0800 Subject: [nova] There is no screen for UEFI instance from console display Message-ID: Hi. After launching an UEFI mode instance in OpenStack, the console screen shows the message "Guest has not initialized the display (yet)". I would appreciate your help, many thanks. Based on the ubuntu cloud image listed below: https://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-uefi1.img -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at ya.ru Thu Nov 28 12:05:20 2019 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Thu, 28 Nov 2019 14:05:20 +0200 Subject: [tripleo][ironic][ansible][openstack-ansible] Ironic/Baremetal Ansible modules In-Reply-To: References: Message-ID: <826531574942720@sas8-ed615920eca2.qloud-c.yandex.net> Hi, I feel that this might be a good topic to discuss in terms of ansible SIG [1] and it can become the new home. So maybe we can plan a meeting for closer and prodictive discussion? [1] https://etherpad.openstack.org/p/ansible-sig 28.11.2019, 11:45, "Mark Goddard" : > On Wed, 27 Nov 2019 at 17:58, Sagi Shnaidman wrote: >> Hi, all >> >> in the light of finding the new home place for openstack related ansible modules [1] I'd like to discuss the best strategy to create Ironic ansible modules. Existing Ironic modules in Ansible repo don't cover even half of Ironic functionality, don't fit current needs and definitely require an additional work. There are a few topics that require attention and better be solved before modules are written to save additional work. We prepared an etherpad [2] with all these questions and if you have ideas or suggestions on how it should look you're welcome to update it. >> We'd like to decide the final place for them, name conventions (the most complex one!), what they should look like and how better to implement. >> Anybody interested in Ansible and baremetal management in Openstack, you're more than welcome to contribute. > > Thanks for raising this, we're definitely missing some key things for > ironic. I added a couple of roles and modules that we developed for > kayobe to the etherpad. Would be happy to contribute them to the > collection. > >> Thanks >> >> [1] https://review.opendev.org/#/c/684740/ >> [2] https://etherpad.openstack.org/p/ironic-ansible-modules >> >> -- >> Best regards >> Sagi Shnaidman -- Kind Regards, Dmitriy Rabotyagov From zhangbailin at inspur.com Thu Nov 28 12:11:27 2019 From: zhangbailin at inspur.com (Brin Zhang(张百林)) Date: Thu, 28 Nov 2019 12:11:27 +0000 Subject: Re: [lists.openstack.org代发][nova][ptg] Support re-configure deleted_on_termination in server Message-ID: <24bc42a9cab043c3b105cc958dd246c2@inspur.com> Hi Sylvain, Matthew: https://review.opendev.org/#/c/693828/ Here is the PoC code, please review this change, thanks.
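For context on the use case: as far as I remember, today delete_on_termination can only be set when the volume is attached (with a new enough microversion, 2.79 I believe) or in the boot request, for example something like the following, and it cannot be changed afterwards without detaching or snapshotting (the uuid below is just a placeholder):

    POST /servers/{server_id}/os-volume_attachments
    {
        "volumeAttachment": {
            "volumeId": "a26887c6-c47b-4654-abb5-dfadf7d3f803",
            "delete_on_termination": true
        }
    }

The spec and the PoC are about being able to change delete_on_termination on an existing attachment without re-attaching the volume.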
Sylvain Bauza [mailto:sbauza at redhat.com] Items: [lists.openstack.org代发][nova][ptg] Support re-configure deleted_on_termination in server Spec is https://review.opendev.org/#/c/580336/ Most people seem to think this makes sense but realize there are already other ways to do this (snapshot) and therefore it's not totally necessary. The agreement in the room was to post the code up for the change, as this will help sell people on it if it's trivial enough and document the use case (i.e. are there scenarios where this would make life 10x easier?) -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Nov 28 12:27:40 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 28 Nov 2019 12:27:40 +0000 Subject: [neutron][OVN] Multiple mechanism drivers In-Reply-To: References: <8B3A471E-B855-4D1C-AE52-080D4B0D92A9@gmail.com> <20191125075144.vhppi2bnnnfyy57s@skaplons-mac> Message-ID: On Thu, 2019-11-28 at 11:12 +0900, Takashi Yamamoto wrote: > hi, > > On Mon, Nov 25, 2019 at 5:00 PM Slawek Kaplonski wrote: > > > > Hi, > > > > I think that this may be true that networking-ovn will not work properly > > with other drivers. > > I don't think it was tested at any time. it should work with other drivers if you use vlan or flat networks. it will not form mesh tunnel networks with other drivers even if you use geneve for the other ml2 driver. > > Also the problem may be that when You are using networking-ovn than whole > > neutron topology is different. There are different agents for example. > > > > Please open a bug for that for networking-ovn. I think that networking-ovn team > > will take a look into that. > > networking-midonet ignores networks without "midonet" type segments to > avoid interfering other mechanism drivers. > maybe networking-ovn can have something similar. that is actually the opposite of how that should work. you are meant to be able to have multiple ml2 drivers share the same segmentation type and you are not meant to have a segmentation type that is specific to a mech driver. given we don't schedule based on the segmentation types supported today either (we should, by the way), it would be very fragile to use a dedicated ovn segmentation type and I would not advise doing it for midonet. ideally we would create placement aggregates or traits to track which segmentation types are supported by which hosts. traits are probably better for the segmentation types but modelling network segments themselves would be better as aggregates. if we really wanted to model the capacity of the segmentation types we would additionally create sharing resource providers with inventories of network segmentation type resource classes per physnet, with a single global rp for the tunneled types. then every time you allocated a network in neutron you would create an allocation for that network and tag ports with the appropriate aggregate request. on the nova side we could combine the segment and segmentation type aggregate requests from the port with any other aggregates from nova and pass all of them as member_of requirements to placement to ensure we land on a host that can provide the required network connectivity. today we literally just assume all nodes are connected to all networks with all segmentation types and hope for the best. that's a bit of a tangent, but just pointing out we should schedule on network connectivity and segmentation types, but we should not have backend specific segmentation types.
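to make that concrete, the placement side of tracking segmentation type support per host would be roughly the following (just a sketch, the trait name and rp uuid are made up and nothing creates these today):

    import openstack

    conn = openstack.connect(cloud='devstack-admin')  # placeholder cloud name
    sess = conn.session  # keystoneauth session
    hdrs = {'OpenStack-API-Version': 'placement 1.6'}
    filt = {'service_type': 'placement'}

    # ensure a custom trait exists for the segmentation type
    sess.put('/traits/CUSTOM_SEG_TYPE_VLAN', endpoint_filter=filt, headers=hdrs)

    # tag a compute node resource provider with it
    # (note: this PUT replaces the provider's full trait list)
    rp_uuid = 'REPLACE-WITH-COMPUTE-NODE-RP-UUID'
    rp = sess.get('/resource_providers/%s' % rp_uuid,
                  endpoint_filter=filt, headers=hdrs).json()
    sess.put('/resource_providers/%s/traits' % rp_uuid,
             endpoint_filter=filt, headers=hdrs,
             json={'resource_provider_generation': rp['generation'],
                   'traits': ['CUSTOM_SEG_TYPE_VLAN']})

a network or port could then be translated into a required trait in the placement request, similar to how the minimum bandwidth feature already adds physnet traits.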
> > wrt agents, last time i checked there was no problem with running > midonet agent and ovs agent on the same host, sharing the kernel > datapath. > so i guess there's no problem with ovn either. you can run ml2/ovn and ml2/ovs on the same cloud. just put the ml2/ovs first. it will fail to bind if a host does not have the ovs neutron agent running and will then bind with ml2/ovn instead. it might work the other way too but I have not tested that. > > wrt l3, unfortunately neither midonet or ovn have implemented "l3 > flavor" thing yet. so you have to choose a single l3 plugin. > iirc, Sam's deployment doesn't use l3 for linuxbridge, right? if you have dedicated network nodes that is not really a problem. just make sure that they are all ovn or all ovs or whatever makes sense. it's the same way that if you deploy with ml2/ovs and want to use ovs-dpdk you only install ovs-dpdk on the compute nodes and use kernel ovs on the networking nodes to avoid the terrible network performance when using network namespaces for nat and routing. if you have tunneled networks it would be an issue, but in that case you just need to ensure that at least 1 router is created on each plugin, so you would use ha routers by default and set the ha factor so that it creates routers on nodes with both mechanism drivers. again however, since the different ml2 drivers do not form a mesh, you should really only use different ml2 drivers if you are using vlan or flat networks. > > > > > > > > On Mon, Nov 25, 2019 at 04:32:50PM +1100, Sam Morrison wrote: > > > > We are looking at using OVN and are having some issues with it in our ML2 environment. > > > > > > > > We currently have 2 mechanism drivers in use: linuxbridge and midonet and these work well (midonet is the default > > > > tenant network driver for when users create a network) > > > > > > > > Adding OVN as a third mechanism driver causes the linuxbridge and midonet networks to stop working in terms of > > > > CRUD operations etc. I would try adding ovn last so it is only used if the other two cannot bind the port. the mech driver list is ordered for this reason so you can express preference. > > > > It looks as if the OVN driver thinks it’s the only player and is trying to do things on ports that are in > > > > linuxbridge or midonet networks. that would be a bug if so. > > > > > > > > Am I missing something here? (We’re using Stein version) > > > > > > > > > > > > Thanks, > > > > Sam > > > > > > > > > > > > > -- > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > > > From dtantsur at redhat.com Thu Nov 28 12:47:35 2019 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Thu, 28 Nov 2019 13:47:35 +0100 Subject: [tripleo][ironic][ansible][openstack-ansible] Ironic/Baremetal Ansible modules In-Reply-To: <826531574942720@sas8-ed615920eca2.qloud-c.yandex.net> References: <826531574942720@sas8-ed615920eca2.qloud-c.yandex.net> Message-ID: Hi, On Thu, Nov 28, 2019 at 1:07 PM Dmitriy Rabotyagov wrote: > Hi, > > I feel that this might be a good topic to discuss in terms of ansible SIG > [1] and it can become the new home. > So maybe we can plan a meeting for closer and prodictive discussion? > I think it's a great idea. Do you have any formal meetings yet or should we maybe schedule a separate one?
Dmitry > > [1] https://etherpad.openstack.org/p/ansible-sig > > > 28.11.2019, 11:45, "Mark Goddard" : > > On Wed, 27 Nov 2019 at 17:58, Sagi Shnaidman > wrote: > >> Hi, all > >> > >> in the light of finding the new home place for openstack related > ansible modules [1] I'd like to discuss the best strategy to create Ironic > ansible modules. Existing Ironic modules in Ansible repo don't cover even > half of Ironic functionality, don't fit current needs and definitely > require an additional work. There are a few topics that require attention > and better be solved before modules are written to save additional work. We > prepared an etherpad [2] with all these questions and if you have ideas or > suggestions on how it should look you're welcome to update it. > >> We'd like to decide the final place for them, name conventions (the > most complex one!), what they should look like and how better to implement. > >> Anybody interested in Ansible and baremetal management in Openstack, > you're more than welcome to contribute. > > > > Thanks for raising this, we're definitely missing some key things for > > ironic. I added a couple of roles and modules that we developed for > > kayobe to the etherpad. Would be happy to contribute them to the > > collection. > > > >> Thanks > >> > >> [1] https://review.opendev.org/#/c/684740/ > >> [2] https://etherpad.openstack.org/p/ironic-ansible-modules > >> > >> -- > >> Best regards > >> Sagi Shnaidman > > -- > Kind Regards, > Dmitriy Rabotyagov > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Nov 28 12:57:05 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 28 Nov 2019 12:57:05 +0000 Subject: [nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set In-Reply-To: <1574931242.31688.9@est.tech> References: <48b3f33d-dfd5-dbae-de65-1891285c94c6@fried.cc> <1574421416.31688.4@est.tech> <1574439261.31688.6@est.tech> <74a68d39-f46d-61a7-7ba1-6d2cd765e6df@gmail.com> <64fbaf8bb79dbb74d472f055405b0901bc51ca65.camel@redhat.com> <844b9328536d1a9138b09d2439e31c934d571754.camel@redhat.com> <1574931242.31688.9@est.tech> Message-ID: <1b165fe4198de9bf165db295d68586db5ae8c313.camel@redhat.com> On Thu, 2019-11-28 at 08:54 +0000, Balázs Gibizer wrote: > > On Wed, Nov 27, 2019 at 17:03, Sean Mooney wrote: > > On Wed, 2019-11-27 at 16:20 +0100, Bence Romsics wrote: > > > > > resource_provider_hypervisors = br-physnet0:hypervisor0,... > > > > > > > > this also wont work as the same bridge name will exists on > > > multipel hosts > > > > > > Of course the same bridge/nic name can exist on multiple hosts. And > > > each report_state message is clearly belonging to a single agent and > > > the configurations field is persisted per agent, so there won't be a > > > collision ever. > > > > > > > that is in the non iroinc smart nic case. in the ironic smart nic > > case with the ovs super agent > > which is the only case where there would be multiple hypervisor > > managed by the same > > agent the agent will be remote. > > When you say "ironic smart nic case with the ovs super agent", do you > refer to this abandoned spec [1]? yes and https://review.opendev.org/#/c/595512/2 there are two related abandoned spec that came up in train but i dont think either are progressing anymore. > > > > > so in the non ironic case it does not need to be a list. 
> > in the smartnic case it might need to be a list > > In that spec the author proposes [2] not to break the 1-1 mapping > between OVS agent and remote OVS. So as far as I see there is no need > for a list in this case either. I agree, although there was an expression of the desire to allow the agent to manage multiple hosts. it's been a while but I believe that is what https://review.opendev.org/#/c/595512/2/specs/stein/scalable-ovs-agent.rst covered. > > > > but a mapping of bridge or pyshnet wont be unique > > and a agent hostname (CONF.host) to hypervior host would be 1:N so > > its not clear how you would select > > form the N RPs if all you know form nova is the binding host which is > > the service host not hypervior hostname. > > > > Are we talking about a problem during binding here? yes and no: it would be a problem during binding as we just pass the service host in the binding host, so we would need to add a binding:hypervisor_host as well if we wanted port binding to work in that case. otherwise we would be changing the meaning of binding:host in ml2/ovs. currently it refers to the service host which is shared between nova and neutron for both the compute and networking agent. in the agentless case it is used more like the hypervisor hostname. odl and I think ovn add info to the agents table in neutron even though they don't have agents, to allow per host configuration to be expressed. the binding host is used to select that. anyway in the rp case you have a similar problem. today odl and ovn do not support minimum bandwidth; in the future if they add it they would have to create an rp per host based on the info in the agents table. if ml2/ovs was extended to have a 1:N mapping between the neutron ovs agent and multiple hosts, the service host set in CONF.host would map to the host the agent is running on, not the host the vm is being booted on, and you would need some additional mapping the same way the ironic driver works. in any case https://review.opendev.org/#/c/595512 is also abandoned so I don't think we should try to cater for that case now, especially since we want to back port this to stein. if we wanted to support 1:N mappings in the ovs agent and not require changes in nova we would actually want to change CONF.host to be a list and have all the bandwidth provider config be keyed off of the service host. you could do this in a number of ways that are not important right now, like dynamic config, but using the device or physnet are not good ways to approach this.
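for reference, the per agent config shape being discussed here is basically this (resource_provider_bandwidths already exists in the ovs agent's [ovs] section, the hypervisors mapping is only the proposal from this thread, not an existing option):

    [ovs]
    bridge_mappings = physnet0:br-physnet0
    resource_provider_bandwidths = br-physnet0:10000000:10000000
    # proposed: map each device to the hypervisor it belongs to, for the case
    # where the agent's CONF.host does not match the hypervisor host name
    resource_provider_hypervisors = br-physnet0:hypervisor0

what to default the hypervisor name to when the mapping is not set is essentially the open question in this thread.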
i think we jsut need to pass back the hypervior host name and no other info mataion that is not currently in the agent report. > * the neutron agent already sends up the service host name where the > agent runs to the neutron server via RPC. yes and neutron uses that for the same usecase that nova does to determin which host the agent run on and match to the host set in the binding_details:host field we set when binding the port. > * the neutron server knowing the service host and the device -> > hypervisor identity mapping can find the compute node RP under which > the device RP needs to be created. you dont need the the device to hypervior mapping. in a non sriov case you dont typeically have a device that the port is assocaited too just a physnet which is None in the case of tunneled ports. so in ovs or linux bridge it more typical for the prot to be assocated with a segmenation tyep that is assocaited with a bridge that may have an interface attached but its only loosely assocated with a device. > > @Sean: Where does my list of reasoning breaks from your perspective? your resoning that id does not need to be a list? if that is you assertion i agree compltely it should not be a list. the once case it could break is if the agent starts to manage multiple host the same way the ironic agent does. however to support that nova would have to change the infor it sets in the port bidnign we would have to set the hypervior host name instead of the service host name. that would be a big change and would require a new api extention in my view so i dont think we should condier it now. > > Cheers, > gibi > > [1] > https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst > [2] > https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst at 28 > > From artem.goncharov at gmail.com Thu Nov 28 13:18:37 2019 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Thu, 28 Nov 2019 14:18:37 +0100 Subject: [tripleo][ironic][ansible][openstack-ansible] Ironic/Baremetal Ansible modules In-Reply-To: References: <826531574942720@sas8-ed615920eca2.qloud-c.yandex.net> Message-ID: Hi > On 28. Nov 2019, at 13:47, Dmitry Tantsur wrote: > > Hi, > > On Thu, Nov 28, 2019 at 1:07 PM Dmitriy Rabotyagov > wrote: > Hi, > > I feel that this might be a good topic to discuss in terms of ansible SIG [1] and it can become the new home. > So maybe we can plan a meeting for closer and prodictive discussion? > > I think it's a great idea. Do you have any formal meetings yet or should we maybe schedule a separate one? > > Dmitry There is a bigger issue to be resolved first: what is the home for OS Ansible modules (https://review.opendev.org/#/q/project:openstack/ansible-collections-openstack , started by Monty recently. Somehow I struggle to find original discussion on that)? How those are treated now part of Ansible now doesn’t really look that much promising/effective and I have just a big bunch of improvements waiting for OS to overtake development. -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcin.juszkiewicz at linaro.org Thu Nov 28 13:34:58 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Thu, 28 Nov 2019 14:34:58 +0100 Subject: [nova] There is no screen for UEFI instance from console display In-Reply-To: References: Message-ID: <9dd4fb7b-787d-7d6b-e835-a50a8bfd2d90@linaro.org> W dniu 28.11.2019 o 11:22, yu.chengde at 99cloud.net pisze: > Hi. 
> After launch an UEFI mode instance in OpenStack, the console screen show the message “Guest has no initialized the displayed (yet) " > Engage your help, many thanks > > Base on ubuntu cloud image, listed as below > https://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-uefi1.img Please check with bionic image. On arm64 xenial image was unable to initialize graphics for uefi hosts. From mriedemos at gmail.com Thu Nov 28 14:34:03 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 28 Nov 2019 08:34:03 -0600 Subject: [nova][api] Behaviour of project_id validation In-Reply-To: <16eae1d03f7.103bacc7e330316.7506234740575974363@ghanshyammann.com> References: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> <16eae1d03f7.103bacc7e330316.7506234740575974363@ghanshyammann.com> Message-ID: On 11/27/2019 12:26 PM, Ghanshyam Mann wrote: > In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned > 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200. What "old change" are you referring to here? Before [1] there was *no* validation performed when adding/removing flavor access or updating quotas for a given project. Hence all of the duplicate bugs about being able to typo/fat-finger/pass garbage values to those APIs and then be confused later when flavor access and quotas aren't working as expected. There have been a few changes to the verify method since it was added, but the point is it was a non-microversion change to admin APIs which would turn previously invalid but passing requests to failures. I'm assuming 403 was handled "gracefully" more for backward compatibility than anything else but as can be seen from Surya's bug is just masking the original issue that this validation code was trying to fix. Nothing is changing in request or response schema and this should only be enforced (re-raise on 403 from keystone) in the APIs that are admin-only by default so it's not an interop concern. I just don't see why we would expect someone to need to opt into this validation actually working - and if misconfigured actually failing to indicate to the admin using the API that their deployment needs to be fixed. [1] https://review.opendev.org/#/c/435010/ -- Thanks, Matt From noonedeadpunk at ya.ru Thu Nov 28 15:36:37 2019 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Thu, 28 Nov 2019 17:36:37 +0200 Subject: [tripleo][ironic][ansible][openstack-ansible][ansible-sig] Ironic/Baremetal Ansible modules In-Reply-To: References: <826531574942720@sas8-ed615920eca2.qloud-c.yandex.net> Message-ID: <126374281574955397@iva5-2a2172cb7cff.qloud-c.yandex.net> An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Thu Nov 28 17:40:15 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 28 Nov 2019 11:40:15 -0600 Subject: [nova][api] Behaviour of project_id validation In-Reply-To: References: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> <16eae1d03f7.103bacc7e330316.7506234740575974363@ghanshyammann.com> Message-ID: <16eb3196154.c8424262368627.6558058975122736719@ghanshyammann.com> ---- On Thu, 28 Nov 2019 08:34:03 -0600 Matt Riedemann wrote ---- > On 11/27/2019 12:26 PM, Ghanshyam Mann wrote: > > In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned > > 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200. > > What "old change" are you referring to here? Before [1] there was *no* > validation performed when adding/removing flavor access or updating > quotas for a given project. Hence all of the duplicate bugs about being > able to typo/fat-finger/pass garbage values to those APIs and then be > confused later when flavor access and quotas aren't working as expected. > There have been a few changes to the verify method since it was added, > but the point is it was a non-microversion change to admin APIs which > would turn previously invalid but passing requests to failures. +1. agree on that change which is making an invalid passed request to failure so does not require microversion. > > I'm assuming 403 was handled "gracefully" more for backward > compatibility than anything else but as can be seen from Surya's bug is > just masking the original issue that this validation code was trying to fix. > > Nothing is changing in request or response schema and this should only > be enforced (re-raise on 403 from keystone) in the APIs that are > admin-only by default so it's not an interop concern. This is a good point, On rethinking on "how the operator can fix the current success case which going to be failed case after proposed solution" is via a configuration change only (change the policy permissions). Which is similar to changing the default roles for any policy. If we change the policy defaults then we follow the deprecation phase only and do not bump the microversion. If we can follow the deprecation warning (the same as a policy change case) in this proposed solution also then we can avoid the microversion bump. I mean, in Ussuri release, we log the warning saying that "this API might be successful for you but it will fail if the user does not have GET /project" permissions from V release. and in V release we raise the error for keystone's 403 or non-2xx cases also. -gmann > > I just don't see why we would expect someone to need to opt into this > validation actually working - and if misconfigured actually failing to > indicate to the admin using the API that their deployment needs to be fixed. 
> > [1] https://review.opendev.org/#/c/435010/ > > -- > > Thanks, > > Matt > > From gmann at ghanshyammann.com Thu Nov 28 17:43:51 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 28 Nov 2019 11:43:51 -0600 Subject: [nova][api] Behaviour of project_id validation In-Reply-To: References: <5be847b4-f5f6-6365-6485-79c5b547a066@gmail.com> <16eae1d03f7.103bacc7e330316.7506234740575974363@ghanshyammann.com> Message-ID: <16eb31caeed.11344b3a3368713.2929866839154291372@ghanshyammann.com> ---- On Thu, 28 Nov 2019 03:02:19 -0600 Alex Xu wrote ---- > > > Ghanshyam Mann 于2019年11月28日周四 上午2:27写道: > ---- On Wed, 27 Nov 2019 03:35:43 -0600 Surya Seetharaman wrote ---- > > > > > > On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann wrote: > > Note that the APIs that would change are admin-only by default. So in > > this case nova is configured with a service user to check if the > > requested project_id exists on behalf of the (admin) user making the > > compute API request to add/remove flavor access (or update quota values > > for a project). The service user does not have enough permissions in > > keystone to check if the project exists. Option 1 is give that service > > user more authority. Option 2 is basically re-raise that error to the > > compute (admin) user to let them know they basically need to fix their > > deployment (option 1 again). > > > > > > > > A combo of both solutions where we raise the error to the user and amend our docs to help them fix it seems good to me. > > +1 on the solution. I like that code tells the error to users because people do not read the doc always. > > > > > I don't think a microversion is necessary for this > > ++ > > I disagree here. My main concern is that this is not the always-broken case. > For the case where we have complete broken behaviour then we do not need microverison to fix that as mentioned by matt too. > > In this case: Even 403 from keystone on GET /project, it may possible that the project exists and request Nova to add that projects in > flavor access is right. This is a success case in the current situation which will be changed to 400 after the proposed solution(option2 or 2+1). > This is behaviour change and should be done with microvesion. > > That is the case we expected to be fixed by the operator, right? If the operator fix that, then there won't be any API behavior change. I guess that what Matt point. yeah, they can fix it via configuration change only so we can consider this as policy default change or extra policy checks changes which do not require microversion. -gmann > > In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned > 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200. > which was clearly a broken case. Any other case where projects may exist was kept as it is so microversion was not needed there. > But now we are changing the success cases also to return an error and ask the user to have the GET /project permission first otherwise > nova cannot process the request. Your project might be valid but nova cannot conform that till you have permission to GET /project. > > -gmann > > > > > ---------- > > > > Cheers,Surya. > > > From jungleboyj at gmail.com Thu Nov 28 17:55:25 2019 From: jungleboyj at gmail.com (Jay S. 
Bryant) Date: Thu, 28 Nov 2019 11:55:25 -0600 Subject: [cinder] Anastasiya accepted for Outreachy In-Reply-To: References: Message-ID: Welcome Anastasiya!  Really happy to have you join the team and thank you for mentoring Sofi.  All your efforts on Cinder are greatly appreciated! Jay On 11/26/2019 1:49 PM, Brian Rosmaita wrote: > On 11/26/19 12:45 PM, Sofia Enriquez wrote: >> Hi Cinder team, >> >> I'd like to announce that Anastasiya will be working with us >> improving the Tempest coverage this round. The internship schedule >> starts on Dec. 3, 2019, to March 3, 2020. Feel free to reach her on >> IRC /as anastzhyr/ if something comes up. > > Congratulations, Anastasiya!  Improving tempest coverage is one of our > priorities for Ussuri, so I'm really glad you'll be working on this > topic. > > Also, special thanks to you, Sofi, for acting as Anastasiya's mentor. > >> >> On the other hand, If you have any suggestions for tempest test >> scenarios or possible ideas, please let me know. >> >> Regards, >> Sofi >> >> -- >> >> L. Sofía Enriquez >> >> she/her >> >> Associate Software Engineer >> >> Red Hat PnT >> >> IRC: @enriquetaso >> >> @RedHat Red Hat >> Red Hat >> >> >> >> > > From miguel at mlavalle.com Thu Nov 28 20:39:49 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 28 Nov 2019 14:39:49 -0600 Subject: [neutron] Proposing Jakub Libosvar as Neutron core reviewer In-Reply-To: <207982A0-CBE7-47D9-A19C-CCFCACCB34EF@redhat.com> References: <207982A0-CBE7-47D9-A19C-CCFCACCB34EF@redhat.com> Message-ID: Glad to see Jakub back in Neutron core. +1 On Thu, Nov 28, 2019 at 3:00 AM Slawek Kaplonski wrote: > Hi neutrinos, > > We already started process of migrating networking-ovn driver to be one of > in-tree neutron drivers. Blueprint for that is [1]. > As part of this process I today proposed to include networking-ovn-core > group into neutron-core group. Mail about it can be found at [2]. > One of persons in networking-ovn-group is Jakub Libosvar who was Neutron > core for very long time in the past. He knows very well not only ovn > related code but also have great knowledge about all Neutron code base. > So I would like to propose to Jakub as Neutron core reviewer again as he > will be back working on neutron again now, after ovn will be in-tree driver. > What do You think about it? > I will wait for Your opinions for 1 week from now. Thx for all Your > comments about it. > > [1] https://blueprints.launchpad.net/neutron/+spec/neutron-ovn-merge > [2] > http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011240.html > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yamamoto at midokura.com Fri Nov 29 00:48:25 2019 From: yamamoto at midokura.com (Takashi Yamamoto) Date: Fri, 29 Nov 2019 09:48:25 +0900 Subject: [neutron][OVN] Multiple mechanism drivers In-Reply-To: References: <8B3A471E-B855-4D1C-AE52-080D4B0D92A9@gmail.com> <20191125075144.vhppi2bnnnfyy57s@skaplons-mac> Message-ID: On Thu, Nov 28, 2019 at 9:27 PM Sean Mooney wrote: > > On Thu, 2019-11-28 at 11:12 +0900, Takashi Yamamoto wrote: > > hi, > > > > On Mon, Nov 25, 2019 at 5:00 PM Slawek Kaplonski wrote: > > > > > > Hi, > > > > > > I think that this may be true that networking-ovn will not work properly > > > with other drivers. > > > I don't think it was tested at any time. > it should work with other direver if you use vlan or flat networks. 
> it will not form mesh tunnel networks with other drivers event if you use geneve for the > other ml2 driver. > > > Also the problem may be that when You are using networking-ovn than whole > > > neutron topology is different. There are different agents for example. > > > > > > Please open a bug for that for networking-ovn. I think that networking-ovn team > > > will take a look into that. > > > > networking-midonet ignores networks without "midonet" type segments to > > avoid interfering other mechanism drivers. > > maybe networking-ovn can have something similar. > that is actully the opiste of how that shoudl work. > you are ment to be able to have multiple ml2 drivers share the same segmentation type > and you are not ment to have a segmentation type that is specific to a mech driver. > give we dont scheduler based on the segmenation type supprot today either (we shoudl by the way) > it would be very fagile to use a dedicated ovn segmentation type and i woudl not advise doing it for > midio net. IMO it doesn't make sense to use the same segmentation types unless the on-wire protocol among nodes is actually compatible with the reference implementations. (in case of midonet, it isn't.) > > ideally we would create placement aggregate or triats to track which segmentation types > are supported by which hosts. traits are proably better for the segmentation types but modelling network segments > them selves would be better as aggreates. > > if we really wanted to model the capsity of the segmenation types we woudl addtionally create shareing resouce providers > with inventories of network segmenation types resouce classes per physnet with a singel gloabl rp for the tunneled > types. then every time you allocated a network in neuton you would create an allocation for that network and tag ports > with the approreate aggreate requrest. > > on the nova side wew could combine the segment and segmenation type aggreate request form the port with any > other aggreates form nova and pass all of them as member_of requriements to placment to ensure we land on a > host that can provide the required network connectivty. today we litrallly just assme all nodes are connected > to all networks with all segmenation types and hope for the best. > > thats a bit of a tangent but just pointing out we should schduler on network connectivity and segmenation types > but we shoudl not have backend specific segmenation types. > > > > > wrt agents, last time i checked there was no problem with running > > midonet agent and ovs agent on the same host, sharing the kernel > > datapath. > > so i guess there's no problem with ovn either. > you can run ml2/ovn and ml2/ovs on the same cloud. > just put the ml2/ovs first. it will fail to bind if a host dose not have > the ovs neutron agent running and will then bind with ml2/ovn instead. > > it might work the other way too but i have nto tested that. > > > > wrt l3, unfortunately neither midonet or ovn have implemented "l3 > > flavor" thing yet. so you have to choose a single l3 plugin. > > iirc, Sam's deployment doesn't use l3 for linuxbridge, right? > if you have dedicated network nodes that is not really a proable. > just make sure that they are all ovn or all ovs or whatever makes sense. > its the same way that if you deploy with ml2/ovs and want to use ovs-dpdk that > you only instlal ovs-dpdk on the comptue nodes and use kernel ovs on the networking nodes > to avoid the terible network performace when using network namespace for nat and routing. 
> > if you have tunneled networks it would be an issue butin that case you just need to ensure that at least 1 router > is created on each plugin so you would use ha routers by default and set the ha factor so that it created routers on > node with both mechinmum dirvers. again however since the different ml2/driver do not form a mesh you should > really only use different ml2 drivers if you are using vlan or flat networks. > > > > > > > > On Mon, Nov 25, 2019 at 04:32:50PM +1100, Sam Morrison wrote: > > > > We are looking at using OVN and are having some issues with it in our ML2 environment. > > > > > > > > We currently have 2 mechanism drivers in use: linuxbridge and midonet and these work well (midonet is the default > > > > tenant network driver for when users create a network) > > > > > > > > Adding OVN as a third mechanism driver causes the linuxbridge and midonet networks to stop working in terms of > > > > CRUD operations etc. > i would try adding ovn last so it is only used if the other two cannot bind the port. > the mech driver list is orderd for this reason so you can express preference. > > > > It looks as if the OVN driver thinks it’s the only player and is trying to do things on ports that are in > > > > linuxbridge or midonet networks. > that would be a bug if so. > > > > > > > > Am I missing something here? (We’re using Stein version) > > > > > > > > > > > > Thanks, > > > > Sam > > > > > > > > > > > > > > > > > > -- > > > Slawek Kaplonski > > > Senior software engineer > > > Red Hat > > > > > > > > From yumeng_bao at yahoo.com Fri Nov 29 02:40:18 2019 From: yumeng_bao at yahoo.com (yumeng bao) Date: Fri, 29 Nov 2019 10:40:18 +0800 Subject: Re: [keystone][nova][barbican][neutron][cinder][tc][policy] Proposal for policy popup team References: <1C1ED9D1-BB0C-4517-845E-1C9E0CD7CC6E.ref@yahoo.com> Message-ID: <1C1ED9D1-BB0C-4517-845E-1C9E0CD7CC6E@yahoo.com> >> Please nominate Yumeng Bao (yumeng_bao at yahoo.com) as the liaison with >> Cyborg team. She will contribute the spec. We have informed Howard of >> this. > Thanks Sundar, I have replaced Howard with Yumeng as the liaison for cyborg. > (Side note - the wiki can be edited by anyone, and I am tracking changes in it, so anyone may feel free to change their project liaison or add or remove team members and I will be notified of the change.) > Colleen Thanks Colleen for the update and the note reminder! I will update the cyborg team progress on the wiki page later. Regards, Yumeng From yu.chengde at 99cloud.net Fri Nov 29 03:14:08 2019 From: yu.chengde at 99cloud.net (YuChengDe) Date: Fri, 29 Nov 2019 11:14:08 +0800 (GMT+08:00) Subject: Re: [nova] There is no screen for UEFI instance from console Message-ID: Hi Marcin Thanks~ I could not find a uefi mode cloud image for ubuntu bionic. Instead, I used the image bionic-server-cloudimg-amd64.img with the metadata "hw_firmware_type: uefi" https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img However, the console shows no screen. Creating a CentOS image with the console enabled does not work either. GRUB_CMDLINE_LINUX = "... console=tty0 console=ttyS0,115200n8" Could you please provide some suggestions on how to get console input/output working with a uefi mode os. My appreciation~ Hi.
After launch an UEFI mode instance in OpenStack, the console screen show the message “Guest has no initialized the displayed (yet) " Engage your help, many thanks Base on ubuntu cloud image, listed as below https://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-uefi1.img // // Please check with bionic image. // // On arm64 xenial image was unable to initialize graphics for uefi hosts. -- ————————————————————————————— 九州云信息科技有限公司 99CLOUD Inc. 于成德 产品开发部 邮箱(Email): yu.chengde at 99cloud.net 手机(Mobile): 13816965096 地址(Addr): 上海市局门路427号1号楼206 Room 206, Bldg 1, No.427 JuMen Road, ShangHai, China 网址(Site): http://www.99cloud.net -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Nov 29 07:27:49 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 08:27:49 +0100 Subject: [stein] [manila-ui] error Message-ID: Hello, I just installed openstack stein on centos. Manila works fine my command line but when I click "share" in the dashboard the following error appears: Environment: Request Method: GET Request URL: http://10.102.184.190/dashboard/project/shares/ Django Version: 1.11.20 Python Version: 2.7.5 Installed Applications: ['openstack_dashboard.dashboards.project', 'neutron_lbaas_dashboard', 'heat_dashboard', 'openstack_dashboard.dashboards.admin', 'openstack_dashboard.dashboards.identity', 'openstack_dashboard.dashboards.settings', 'dashboards', 'openstack_dashboard', 'django.contrib.contenttypes', 'django.contrib.auth', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'django.contrib.humanize', 'django_pyscss', 'debreach', 'openstack_dashboard.django_pyscss_fix', 'compressor', 'horizon', 'openstack_auth'] Installed Middleware: ('openstack_auth.middleware.OpenstackAuthMonkeyPatchMiddleware', 'debreach.middleware.RandomCommentMiddleware', 'django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'horizon.middleware.OperationLogMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'horizon.middleware.HorizonMiddleware', 'horizon.themes.ThemeMiddleware', 'django.middleware.locale.LocaleMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware', 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerClientMiddleware', 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerMiddleware') Traceback: File "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py" in inner 41. response = get_response(request) File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in _get_response 187. response = self.process_exception_by_middleware(e, request) File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in _get_response 185. response = wrapped_callback(request, *callback_args, **callback_kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 36. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 52. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 36. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 113. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 84. 
return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in view 68. return self.dispatch(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in dispatch 88. return handler(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in get 223. handled = self.construct_tables() File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in construct_tables 214. handled = self.handle_table(table) File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in handle_table 123. data = self._get_data_dict() File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in _get_data_dict 43. data.extend(func()) File "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py" in wrapped 109. value = cache[key] = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py" in get_shares_data 57. share_nets = manila.share_network_list(self.request) File "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py" in share_network_list 280. return manilaclient(request).share_networks.list(detailed=detailed, Exception Type: AttributeError at /project/shares/ Exception Value: 'NoneType' object has no attribute 'share_networks' Anyone can help, please ? Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongbin034 at gmail.com Fri Nov 29 08:20:52 2019 From: hongbin034 at gmail.com (Hongbin Lu) Date: Fri, 29 Nov 2019 03:20:52 -0500 Subject: [zun][zun-ui][horizon] Request code review for a breaking fix In-Reply-To: References: Message-ID: On Tue, Nov 26, 2019 at 8:22 AM Akihiro Motoki wrote: > Sorry for late. While the patch has been approved by other horizon > cores, I have a question which blocks me for long. > > The proposed code is related to the serial console support and AFAIK > it was introduced to support the serial console of nova servers. > Perhaps Hongbin confirms it works with zun instance, but how can we > test it with nova servers with serial consoles (ironic instances?)? > I tested it in Zun and Nova and confirms that it works with zun containers and nova VM instances (ironic instances unconfirmed). BTW, we need to backport this fix to stable/train. I can let this patch *burn* on master for a while. If nobody complains, I will go ahead to propose a backport. > > Many developers add features to horizon but they don't leave enough > information on how to test them, so the current horizon team is > struggling to know how to test :-( > That's one reason that reviews for non-popular areas tend to take time > for long.... I wonder how we can improve this situation..... > > Thanks, > Akihiro > > > On Tue, Nov 26, 2019 at 1:13 PM Hongbin Lu wrote: > > > > Hi Horizon folks, > > > > We have an issue that needs to be fixed at horizon side. Please check > this bug: > > > > https://bugs.launchpad.net/zun-ui/+bug/1847889 > > > > We propose a fix on Horizon but the patch hasn't been moved forward for > a while. Would I ask for a code review for the patch > https://review.opendev.org/#/c/688290/ ? Without the fix, our horizon > plugin couldn't work correctly. > > > > Best regards, > > Hongbin > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ignaziocassano at gmail.com Fri Nov 29 08:38:26 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 09:38:26 +0100 Subject: [stein][manila-ui] error Message-ID: Hello, I just installed openstack stein on centos. Manila works fine my command line but when I click "share" in the dashboard the following error appears: Environment: Request Method: GET Request URL: http://10.102.184.190/dashboard/project/shares/ Django Version: 1.11.20 Python Version: 2.7.5 Installed Applications: ['openstack_dashboard.dashboards.project', 'neutron_lbaas_dashboard', 'heat_dashboard', 'openstack_dashboard.dashboards.admin', 'openstack_dashboard.dashboards.identity', 'openstack_dashboard.dashboards.settings', 'dashboards', 'openstack_dashboard', 'django.contrib.contenttypes', 'django.contrib.auth', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'django.contrib.humanize', 'django_pyscss', 'debreach', 'openstack_dashboard.django_pyscss_fix', 'compressor', 'horizon', 'openstack_auth'] Installed Middleware: ('openstack_auth.middleware.OpenstackAuthMonkeyPatchMiddleware', 'debreach.middleware.RandomCommentMiddleware', 'django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'horizon.middleware.OperationLogMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'horizon.middleware.HorizonMiddleware', 'horizon.themes.ThemeMiddleware', 'django.middleware.locale.LocaleMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware', 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerClientMiddleware', 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerMiddleware') Traceback: File "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py" in inner 41. response = get_response(request) File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in _get_response 187. response = self.process_exception_by_middleware(e, request) File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in _get_response 185. response = wrapped_callback(request, *callback_args, **callback_kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 36. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 52. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 36. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 113. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec 84. return view_func(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in view 68. return self.dispatch(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in dispatch 88. return handler(request, *args, **kwargs) File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in get 223. handled = self.construct_tables() File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in construct_tables 214. handled = self.handle_table(table) File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in handle_table 123. data = self._get_data_dict() File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in _get_data_dict 43. 
data.extend(func()) File "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py" in wrapped 109. value = cache[key] = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py" in get_shares_data 57. share_nets = manila.share_network_list(self.request) File "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py" in share_network_list 280. return manilaclient(request).share_networks.list(detailed=detailed, Exception Type: AttributeError at /project/shares/ Exception Value: 'NoneType' object has no attribute 'share_networks' Anyone can help, please ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Nov 29 11:06:46 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 12:06:46 +0100 Subject: [horizon][stein] issue Message-ID: Hello, I just installed stein on centos 7 and when I have a log of instances clicking "next" on horizon the following messages appears in httpd error log (please help me): [Fri Nov 29 10:52:47.306231 2019] [:error] [pid 27395] INFO openstack_auth.forms Login successful for user "admin" using domain "default", remote address 10.102.184.193. [Fri Nov 29 10:53:04.594482 2019] [:error] [pid 27395] ERROR django.request Internal Server Error: /dashboard/project/instances/ [Fri Nov 29 10:53:04.594503 2019] [:error] [pid 27395] Traceback (most recent call last): [Fri Nov 29 10:53:04.594507 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line 41, in inner [Fri Nov 29 10:53:04.594510 2019] [:error] [pid 27395] response = get_response(request) [Fri Nov 29 10:53:04.594513 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response [Fri Nov 29 10:53:04.594515 2019] [:error] [pid 27395] response = self.process_exception_by_middleware(e, request) [Fri Nov 29 10:53:04.594518 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response [Fri Nov 29 10:53:04.594521 2019] [:error] [pid 27395] response = wrapped_callback(request, *callback_args, **callback_kwargs) [Fri Nov 29 10:53:04.594523 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec [Fri Nov 29 10:53:04.594526 2019] [:error] [pid 27395] return view_func(request, *args, **kwargs) [Fri Nov 29 10:53:04.594538 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec [Fri Nov 29 10:53:04.594561 2019] [:error] [pid 27395] return view_func(request, *args, **kwargs) [Fri Nov 29 10:53:04.594563 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec [Fri Nov 29 10:53:04.594566 2019] [:error] [pid 27395] return view_func(request, *args, **kwargs) [Fri Nov 29 10:53:04.594568 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec [Fri Nov 29 10:53:04.594586 2019] [:error] [pid 27395] return view_func(request, *args, **kwargs) [Fri Nov 29 10:53:04.594589 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec [Fri Nov 29 10:53:04.594591 2019] [:error] [pid 27395] return view_func(request, *args, **kwargs) [Fri Nov 29 10:53:04.594594 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, in view [Fri Nov 29 
10:53:04.594597 2019] [:error] [pid 27395] return self.dispatch(request, *args, **kwargs) [Fri Nov 29 10:53:04.594600 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, in dispatch [Fri Nov 29 10:53:04.594603 2019] [:error] [pid 27395] return handler(request, *args, **kwargs) [Fri Nov 29 10:53:04.594605 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get [Fri Nov 29 10:53:04.594608 2019] [:error] [pid 27395] handled = self.construct_tables() [Fri Nov 29 10:53:04.594611 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in construct_tables [Fri Nov 29 10:53:04.594613 2019] [:error] [pid 27395] handled = self.handle_table(table) [Fri Nov 29 10:53:04.594616 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in handle_table [Fri Nov 29 10:53:04.594638 2019] [:error] [pid 27395] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 252, in _get_data_dict [Fri Nov 29 10:53:04.594641 2019] [:error] [pid 27395] self._data = {self.table_class._meta.name: self.get_data()} [Fri Nov 29 10:53:04.594644 2019] [:error] [pid 27395] File "/usr/share/openstack-dashboard/openstack_dashboard/dashboards/project/instances/views.py", line 186, in get_data [Fri Nov 29 10:53:04.594661 2019] [:error] [pid 27395] boot_volume.volume_image_metadata['image_id'] in [Fri Nov 29 10:53:04.594664 2019] [:error] [pid 27395] KeyError: 'image_id -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdopiera at redhat.com Fri Nov 29 11:43:10 2019 From: rdopiera at redhat.com (Radek Dopieralski) Date: Fri, 29 Nov 2019 12:43:10 +0100 Subject: [horizon][stein] issue In-Reply-To: References: Message-ID: Looks like you hit https://bugs.launchpad.net/horizon/+bug/1834747 On Fri, Nov 29, 2019 at 12:20 PM Ignazio Cassano wrote: > Hello, > I just installed stein on centos 7 and when I have a log of instances > clicking "next" on horizon the following messages appears in httpd error > log (please help me): > > [Fri Nov 29 10:52:47.306231 2019] [:error] [pid 27395] INFO > openstack_auth.forms Login successful for user "admin" using domain > "default", remote address 10.102.184.193. 
> [Fri Nov 29 10:53:04.594482 2019] [:error] [pid 27395] ERROR > django.request Internal Server Error: /dashboard/project/instances/ > [Fri Nov 29 10:53:04.594503 2019] [:error] [pid 27395] Traceback (most > recent call last): > [Fri Nov 29 10:53:04.594507 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line > 41, in inner > [Fri Nov 29 10:53:04.594510 2019] [:error] [pid 27395] response = > get_response(request) > [Fri Nov 29 10:53:04.594513 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, > in _get_response > [Fri Nov 29 10:53:04.594515 2019] [:error] [pid 27395] response = > self.process_exception_by_middleware(e, request) > [Fri Nov 29 10:53:04.594518 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, > in _get_response > [Fri Nov 29 10:53:04.594521 2019] [:error] [pid 27395] response = > wrapped_callback(request, *callback_args, **callback_kwargs) > [Fri Nov 29 10:53:04.594523 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec > [Fri Nov 29 10:53:04.594526 2019] [:error] [pid 27395] return > view_func(request, *args, **kwargs) > [Fri Nov 29 10:53:04.594538 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec > [Fri Nov 29 10:53:04.594561 2019] [:error] [pid 27395] return > view_func(request, *args, **kwargs) > [Fri Nov 29 10:53:04.594563 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec > [Fri Nov 29 10:53:04.594566 2019] [:error] [pid 27395] return > view_func(request, *args, **kwargs) > [Fri Nov 29 10:53:04.594568 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec > [Fri Nov 29 10:53:04.594586 2019] [:error] [pid 27395] return > view_func(request, *args, **kwargs) > [Fri Nov 29 10:53:04.594589 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec > [Fri Nov 29 10:53:04.594591 2019] [:error] [pid 27395] return > view_func(request, *args, **kwargs) > [Fri Nov 29 10:53:04.594594 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, > in view > [Fri Nov 29 10:53:04.594597 2019] [:error] [pid 27395] return > self.dispatch(request, *args, **kwargs) > [Fri Nov 29 10:53:04.594600 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, > in dispatch > [Fri Nov 29 10:53:04.594603 2019] [:error] [pid 27395] return > handler(request, *args, **kwargs) > [Fri Nov 29 10:53:04.594605 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get > [Fri Nov 29 10:53:04.594608 2019] [:error] [pid 27395] handled = > self.construct_tables() > [Fri Nov 29 10:53:04.594611 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in > construct_tables > [Fri Nov 29 10:53:04.594613 2019] [:error] [pid 27395] handled = > self.handle_table(table) > [Fri Nov 29 10:53:04.594616 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in > handle_table > [Fri Nov 29 10:53:04.594638 2019] [:error] [pid 27395] File > "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 252, in > _get_data_dict > [Fri Nov 29 
10:53:04.594641 2019] [:error] [pid 27395] self._data = > {self.table_class._meta.name: self.get_data()} > [Fri Nov 29 10:53:04.594644 2019] [:error] [pid 27395] File > "/usr/share/openstack-dashboard/openstack_dashboard/dashboards/project/instances/views.py", > line 186, in get_data > [Fri Nov 29 10:53:04.594661 2019] [:error] [pid 27395] > boot_volume.volume_image_metadata['image_id'] in > [Fri Nov 29 10:53:04.594664 2019] [:error] [pid 27395] KeyError: 'image_id > -- Radomir Dopieralski -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Nov 29 12:45:37 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 13:45:37 +0100 Subject: [horizon][stein] issue In-Reply-To: References: Message-ID: Thanks Radek, it seems my case Il giorno ven 29 nov 2019 alle ore 12:43 Radek Dopieralski < rdopiera at redhat.com> ha scritto: > Looks like you hit https://bugs.launchpad.net/horizon/+bug/1834747 > > On Fri, Nov 29, 2019 at 12:20 PM Ignazio Cassano > wrote: > >> Hello, >> I just installed stein on centos 7 and when I have a log of instances >> clicking "next" on horizon the following messages appears in httpd error >> log (please help me): >> >> [Fri Nov 29 10:52:47.306231 2019] [:error] [pid 27395] INFO >> openstack_auth.forms Login successful for user "admin" using domain >> "default", remote address 10.102.184.193.
"/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 252, in >> _get_data_dict >> [Fri Nov 29 10:53:04.594641 2019] [:error] [pid 27395] self._data = >> {self.table_class._meta.name: self.get_data()} >> [Fri Nov 29 10:53:04.594644 2019] [:error] [pid 27395] File >> "/usr/share/openstack-dashboard/openstack_dashboard/dashboards/project/instances/views.py", >> line 186, in get_data >> [Fri Nov 29 10:53:04.594661 2019] [:error] [pid 27395] >> boot_volume.volume_image_metadata['image_id'] in >> [Fri Nov 29 10:53:04.594664 2019] [:error] [pid 27395] KeyError: 'image_id >> > > > -- > Radomir Dopieralski > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Nov 29 13:08:42 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 14:08:42 +0100 Subject: [horizon][stein] issue In-Reply-To: References: Message-ID: Hello, in the dashboard admin section it does not happen and next and prev functions work fine. I have a lot of virtual machine and they booted from image but when I show the virtual machine, the image used for creating volume does not appear. Regards Ignazio Il giorno ven 29 nov 2019 alle ore 13:45 Ignazio Cassano < ignaziocassano at gmail.com> ha scritto: > Thanks Radek, it seems my case > > Il giorno ven 29 nov 2019 alle ore 12:43 Radek Dopieralski < > rdopiera at redhat.com> ha scritto: > >> Looks like you hit https://bugs.launchpad.net/horizon/+bug/1834747 >> >> On Fri, Nov 29, 2019 at 12:20 PM Ignazio Cassano < >> ignaziocassano at gmail.com> wrote: >> >>> Hello, >>> I just installed stein on centos 7 and when I have a log of instances >>> clicking "next" on horizon the following messages appears in httpd error >>> log (please help me): >>> >>> [Fri Nov 29 10:52:47.306231 2019] [:error] [pid 27395] INFO >>> openstack_auth.forms Login successful for user "admin" using domain >>> "default", remote address 10.102.184.193. 
>>> [Fri Nov 29 10:53:04.594482 2019] [:error] [pid 27395] ERROR >>> django.request Internal Server Error: /dashboard/project/instances/ >>> [Fri Nov 29 10:53:04.594503 2019] [:error] [pid 27395] Traceback (most >>> recent call last): >>> [Fri Nov 29 10:53:04.594507 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>> 41, in inner >>> [Fri Nov 29 10:53:04.594510 2019] [:error] [pid 27395] response = >>> get_response(request) >>> [Fri Nov 29 10:53:04.594513 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>> in _get_response >>> [Fri Nov 29 10:53:04.594515 2019] [:error] [pid 27395] response = >>> self.process_exception_by_middleware(e, request) >>> [Fri Nov 29 10:53:04.594518 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>> in _get_response >>> [Fri Nov 29 10:53:04.594521 2019] [:error] [pid 27395] response = >>> wrapped_callback(request, *callback_args, **callback_kwargs) >>> [Fri Nov 29 10:53:04.594523 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>> [Fri Nov 29 10:53:04.594526 2019] [:error] [pid 27395] return >>> view_func(request, *args, **kwargs) >>> [Fri Nov 29 10:53:04.594538 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>> [Fri Nov 29 10:53:04.594561 2019] [:error] [pid 27395] return >>> view_func(request, *args, **kwargs) >>> [Fri Nov 29 10:53:04.594563 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>> [Fri Nov 29 10:53:04.594566 2019] [:error] [pid 27395] return >>> view_func(request, *args, **kwargs) >>> [Fri Nov 29 10:53:04.594568 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>> [Fri Nov 29 10:53:04.594586 2019] [:error] [pid 27395] return >>> view_func(request, *args, **kwargs) >>> [Fri Nov 29 10:53:04.594589 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>> [Fri Nov 29 10:53:04.594591 2019] [:error] [pid 27395] return >>> view_func(request, *args, **kwargs) >>> [Fri Nov 29 10:53:04.594594 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>> in view >>> [Fri Nov 29 10:53:04.594597 2019] [:error] [pid 27395] return >>> self.dispatch(request, *args, **kwargs) >>> [Fri Nov 29 10:53:04.594600 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>> in dispatch >>> [Fri Nov 29 10:53:04.594603 2019] [:error] [pid 27395] return >>> handler(request, *args, **kwargs) >>> [Fri Nov 29 10:53:04.594605 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>> [Fri Nov 29 10:53:04.594608 2019] [:error] [pid 27395] handled = >>> self.construct_tables() >>> [Fri Nov 29 10:53:04.594611 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>> construct_tables >>> [Fri Nov 29 10:53:04.594613 2019] [:error] [pid 27395] handled = >>> self.handle_table(table) >>> [Fri Nov 29 10:53:04.594616 2019] [:error] [pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>> handle_table >>> [Fri Nov 29 10:53:04.594638 2019] [:error] 
[pid 27395] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 252, in >>> _get_data_dict >>> [Fri Nov 29 10:53:04.594641 2019] [:error] [pid 27395] self._data = >>> {self.table_class._meta.name: self.get_data()} >>> [Fri Nov 29 10:53:04.594644 2019] [:error] [pid 27395] File >>> "/usr/share/openstack-dashboard/openstack_dashboard/dashboards/project/instances/views.py", >>> line 186, in get_data >>> [Fri Nov 29 10:53:04.594661 2019] [:error] [pid 27395] >>> boot_volume.volume_image_metadata['image_id'] in >>> [Fri Nov 29 10:53:04.594664 2019] [:error] [pid 27395] KeyError: >>> 'image_id >>> >> >> >> -- >> Radomir Dopieralski >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Fri Nov 29 13:46:16 2019 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Fri, 29 Nov 2019 10:46:16 -0300 Subject: [stein][manila-ui] error In-Reply-To: References: Message-ID: Hi, Ignazio! Could you please provide more information about how have you installed it? It was via package, git, devstack? I'm trying to reproduce the issue in my environment but I'm not able to. Regards, Carlos Silva. Em sex., 29 de nov. de 2019 às 05:42, Ignazio Cassano < ignaziocassano at gmail.com> escreveu: > Hello, > I just installed openstack stein on centos. > Manila works fine my command line but when I click "share" in the > dashboard the following error appears: > > Environment: > > > Request Method: GET > Request URL: http://10.102.184.190/dashboard/project/shares/ > > Django Version: 1.11.20 > Python Version: 2.7.5 > Installed Applications: > ['openstack_dashboard.dashboards.project', > 'neutron_lbaas_dashboard', > 'heat_dashboard', > 'openstack_dashboard.dashboards.admin', > 'openstack_dashboard.dashboards.identity', > 'openstack_dashboard.dashboards.settings', > 'dashboards', > 'openstack_dashboard', > 'django.contrib.contenttypes', > 'django.contrib.auth', > 'django.contrib.sessions', > 'django.contrib.messages', > 'django.contrib.staticfiles', > 'django.contrib.humanize', > 'django_pyscss', > 'debreach', > 'openstack_dashboard.django_pyscss_fix', > 'compressor', > 'horizon', > 'openstack_auth'] > Installed Middleware: > ('openstack_auth.middleware.OpenstackAuthMonkeyPatchMiddleware', > 'debreach.middleware.RandomCommentMiddleware', > 'django.middleware.common.CommonMiddleware', > 'django.middleware.csrf.CsrfViewMiddleware', > 'django.contrib.sessions.middleware.SessionMiddleware', > 'django.contrib.auth.middleware.AuthenticationMiddleware', > 'horizon.middleware.OperationLogMiddleware', > 'django.contrib.messages.middleware.MessageMiddleware', > 'horizon.middleware.HorizonMiddleware', > 'horizon.themes.ThemeMiddleware', > 'django.middleware.locale.LocaleMiddleware', > 'django.middleware.clickjacking.XFrameOptionsMiddleware', > > 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerClientMiddleware', > > 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerMiddleware') > > > > Traceback: > > File "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py" > in inner > 41. response = get_response(request) > > File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in > _get_response > 187. response = self.process_exception_by_middleware(e, > request) > > File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in > _get_response > 185. 
response = wrapped_callback(request, > *callback_args, **callback_kwargs) > > File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec > 36. return view_func(request, *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec > 52. return view_func(request, *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec > 36. return view_func(request, *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec > 113. return view_func(request, *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec > 84. return view_func(request, *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in > view > 68. return self.dispatch(request, *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in > dispatch > 88. return handler(request, *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in get > 223. handled = self.construct_tables() > > File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in > construct_tables > 214. handled = self.handle_table(table) > > File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in > handle_table > 123. data = self._get_data_dict() > > File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in > _get_data_dict > 43. data.extend(func()) > > File "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py" in > wrapped > 109. value = cache[key] = func(*args, **kwargs) > > File > "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py" > in get_shares_data > 57. share_nets = manila.share_network_list(self.request) > > File "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py" in > share_network_list > 280. return > manilaclient(request).share_networks.list(detailed=detailed, > > Exception Type: AttributeError at /project/shares/ > Exception Value: 'NoneType' object has no attribute 'share_networks' > > > Anyone can help, please ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Nov 29 14:03:49 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 15:03:49 +0100 Subject: [stein][manila-ui] error In-Reply-To: References: Message-ID: Hi Carlos, I am installing via yum command Thanks Ignazio Il giorno ven 29 nov 2019 alle ore 14:46 Carlos Silva < ces.eduardo98 at gmail.com> ha scritto: > Hi, Ignazio! > > Could you please provide more information about how have you installed it? > It was via package, git, devstack? > I'm trying to reproduce the issue in my environment but I'm not able to. > > Regards, > Carlos Silva. > > Em sex., 29 de nov. de 2019 às 05:42, Ignazio Cassano < > ignaziocassano at gmail.com> escreveu: > >> Hello, >> I just installed openstack stein on centos. 
>> Manila works fine my command line but when I click "share" in the >> dashboard the following error appears: >> >> Environment: >> >> >> Request Method: GET >> Request URL: http://10.102.184.190/dashboard/project/shares/ >> >> Django Version: 1.11.20 >> Python Version: 2.7.5 >> Installed Applications: >> ['openstack_dashboard.dashboards.project', >> 'neutron_lbaas_dashboard', >> 'heat_dashboard', >> 'openstack_dashboard.dashboards.admin', >> 'openstack_dashboard.dashboards.identity', >> 'openstack_dashboard.dashboards.settings', >> 'dashboards', >> 'openstack_dashboard', >> 'django.contrib.contenttypes', >> 'django.contrib.auth', >> 'django.contrib.sessions', >> 'django.contrib.messages', >> 'django.contrib.staticfiles', >> 'django.contrib.humanize', >> 'django_pyscss', >> 'debreach', >> 'openstack_dashboard.django_pyscss_fix', >> 'compressor', >> 'horizon', >> 'openstack_auth'] >> Installed Middleware: >> ('openstack_auth.middleware.OpenstackAuthMonkeyPatchMiddleware', >> 'debreach.middleware.RandomCommentMiddleware', >> 'django.middleware.common.CommonMiddleware', >> 'django.middleware.csrf.CsrfViewMiddleware', >> 'django.contrib.sessions.middleware.SessionMiddleware', >> 'django.contrib.auth.middleware.AuthenticationMiddleware', >> 'horizon.middleware.OperationLogMiddleware', >> 'django.contrib.messages.middleware.MessageMiddleware', >> 'horizon.middleware.HorizonMiddleware', >> 'horizon.themes.ThemeMiddleware', >> 'django.middleware.locale.LocaleMiddleware', >> 'django.middleware.clickjacking.XFrameOptionsMiddleware', >> >> 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerClientMiddleware', >> >> 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerMiddleware') >> >> >> >> Traceback: >> >> File "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py" >> in inner >> 41. response = get_response(request) >> >> File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in >> _get_response >> 187. response = self.process_exception_by_middleware(e, >> request) >> >> File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in >> _get_response >> 185. response = wrapped_callback(request, >> *callback_args, **callback_kwargs) >> >> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >> 36. return view_func(request, *args, **kwargs) >> >> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >> 52. return view_func(request, *args, **kwargs) >> >> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >> 36. return view_func(request, *args, **kwargs) >> >> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >> 113. return view_func(request, *args, **kwargs) >> >> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >> 84. return view_func(request, *args, **kwargs) >> >> File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in >> view >> 68. return self.dispatch(request, *args, **kwargs) >> >> File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in >> dispatch >> 88. return handler(request, *args, **kwargs) >> >> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in get >> 223. handled = self.construct_tables() >> >> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in >> construct_tables >> 214. handled = self.handle_table(table) >> >> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in >> handle_table >> 123. 
data = self._get_data_dict() >> >> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in >> _get_data_dict >> 43. data.extend(func()) >> >> File "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py" in >> wrapped >> 109. value = cache[key] = func(*args, **kwargs) >> >> File >> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py" >> in get_shares_data >> 57. share_nets = manila.share_network_list(self.request) >> >> File "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py" in >> share_network_list >> 280. return >> manilaclient(request).share_networks.list(detailed=detailed, >> >> Exception Type: AttributeError at /project/shares/ >> Exception Value: 'NoneType' object has no attribute 'share_networks' >> >> >> Anyone can help, please ? >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yumeng_bao at yahoo.com Thu Nov 28 06:20:18 2019 From: yumeng_bao at yahoo.com (yumeng bao) Date: Thu, 28 Nov 2019 14:20:18 +0800 Subject: =?utf-8?Q?Re:_[keystone][nova][barbican][neutron][cinder][tc][po?= =?utf-8?Q?licy]_Proposal_for_policy_popup_team=E2=81=A9?= References: <53324F16-6BB5-4738-AEA5-D4ED304027C0.ref@yahoo.com> Message-ID: <53324F16-6BB5-4738-AEA5-D4ED304027C0@yahoo.com> >> Please nominate Yumeng Bao (yumeng_bao at yahoo.com) as the liaison with >> Cyborg team. She will contribute the spec. We have informed Howard of >> this. > Thanks Sundar, I have replaced Howard with Yumeng as the liaison for cyborg. > (Side note - the wiki can be edited by anyone, and I am tracking changes in it, so anyone may feel free to change their project liaison or add or remove team members and I will be notified of the change.) > Colleen Thanks Colleen for the update and the note reminder ! I will update the cyborg team progress on wiki paget later. Regards, Yumeng From bansalnehal26 at gmail.com Thu Nov 28 14:47:49 2019 From: bansalnehal26 at gmail.com (Nehal Bansal) Date: Thu, 28 Nov 2019 20:17:49 +0530 Subject: [Tacker] [Mistral] [Network Service] Regarding parameters to Network Service Message-ID: Hi, Is there a way to pass parameters from Network Service Descriptor to the VNF Descriptor using a parameter file? Kindly advise as it would be of great help. Thank you. Regards, Nehal -------------- next part -------------- An HTML attachment was scrubbed... URL: From brijesh.1961 at hsc.com Fri Nov 29 03:07:47 2019 From: brijesh.1961 at hsc.com (Brijesh) Date: Fri, 29 Nov 2019 03:07:47 +0000 Subject: Neutron sfc not working in openstack stein Message-ID: Hi I have installed openstack stein using Kolla Ansible along with neutron sfc. Everything is working fine apart from SFC. I have compared all the configuration related to sfc and it seems fine with me. I was trying to create a SFC with following flow: VM1 --> VM2 --> VM3 When i am pinging from vm1 to vm3 without SFC creation everything is working fine. But when i try to ping after SFC i am not able to ping. I cann't see packet on vm2 (using tcpdump) on any interface after sfc. I tried to see packet drop on br-int and br-tun using openflow rule but i cannot see any packet drop on that also. I have one controller and one compute, so all instances are on same host. All vm have 2 port of same network one is ingress and one egress. 
vm1 has port as p11 and p12 vm2 has port as p13 and p14 vm3 has port as p15 and p16 10.0.0.174 (vm3 ip) openstack server list +--------------------------------------+------+--------+---------------------------------+--------+----------+ | ID | Name | Status | Networks | Image | Flavor | +--------------------------------------+------+--------+---------------------------------+--------+----------+ | ea663fee-c0c4-479d-8d33-42014bd633af | x3 | ACTIVE | demo-net=10.0.0.174, 10.0.0.76 | xenial | m1.small | | 11203289-e008-4a71-adee-51c69546a18c | x2 | ACTIVE | demo-net=10.0.0.187, 10.0.0.126 | xenial | m1.small | | 1699e74c-706b-42b8-b340-7f7077b062d6 | x1 | ACTIVE | demo-net=10.0.0.122, 10.0.0.42 | xenial | m1.small | +--------------------------------------+------+--------+---------------------------------+--------+----------+ openstack port list |grep p1 | 1391dea1-32bc-42d8-aafa-d66f9003a816 | p13 | fa:16:3e:0b:08:7d | ip_address='10.0.0.187', subnet_id='949b8df3-9832-42fb-86c1-0483f57d31d8' | ACTIVE | | 3f6e9791-940a-4466-b085-521a65511bbb | p16 | fa:16:3e:85:54:ef | ip_address='10.0.0.76', subnet_id='949b8df3-9832-42fb-86c1-0483f57d31d8' | ACTIVE | | 6955e1b1-4fcf-4998-8791-4a49ed298dc5 | p14 | fa:16:3e:4d:4e:31 | ip_address='10.0.0.126', subnet_id='949b8df3-9832-42fb-86c1-0483f57d31d8' | ACTIVE | | a51175db-6272-4748-a6e6-3af719a33b86 | p12 | fa:16:3e:6a:da:60 | ip_address='10.0.0.42', subnet_id='949b8df3-9832-42fb-86c1-0483f57d31d8' | ACTIVE | | dad20742-67e4-47c7-a759-8025f042df45 | p15 | fa:16:3e:96:eb:f8 | ip_address='10.0.0.174', subnet_id='949b8df3-9832-42fb-86c1-0483f57d31d8' | ACTIVE | | db8b775f-363a-4a61-88f4-5db1b42a7681 | p11 | fa:16:3e:b6:a9:0e | ip_address='10.0.0.122', subnet_id='949b8df3-9832-42fb-86c1-0483f57d31d8' | ACTIVE | openstack sfc flow classifier list +--------------------------------------+------+-----------------------------------------------------------------+ | ID | Name | Summary | +--------------------------------------+------+-----------------------------------------------------------------+ | 20a58a64-8339-493f-8418-922c756c2b91 | FC11 | protocol: any, | | | | source[port]: any[any:any], | | | | destination[port]: 10.0.0.174/32[any:any], | | | | neutron_source_port: db8b775f-363a-4a61-88f4-5db1b42a7681, | | | | neutron_destination_port: dad20742-67e4-47c7-a759-8025f042df45, | | | | l7_parameters: {} | +--------------------------------------+------+-----------------------------------------------------------------+ openstack sfc port chain list +--------------------------------------+------+-----------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+------------------------------------------------+----------+ | ID | Name | Port Pair Groups | Flow Classifiers | Chain Parameters | Chain ID | +--------------------------------------+------+-----------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+------------------------------------------------+----------+ | f342d2cc-dbfe-4832-9e7c-b22cc9629891 | PC11 | [u'080395fa-21fe-4d5b-b9c2-9f196e93ac98', u'89110fed-59cd-4ac8-ac45-8ce954fa99c0', u'd16b65a4-4a23-4b38-a937-a57a7058eee0'] | [u'20a58a64-8339-493f-8418-922c756c2b91'] | {u'symmetric': False, u'correlation': u'mpls'} | 1 | 
+--------------------------------------+------+-----------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+------------------------------------- openstack sfc port pair group list +--------------------------------------+-------+-------------------------------------------+---------------------------------------------------------------------------------------------+-------------+ | ID | Name | Port Pair | Port Pair Group Parameters | Tap Enabled | +--------------------------------------+-------+-------------------------------------------+---------------------------------------------------------------------------------------------+-------------+ | 080395fa-21fe-4d5b-b9c2-9f196e93ac98 | PPG11 | [u'10f9e355-3680-4dca-809f-7628bbbebd7f'] | {u'lb_fields': [], u'ppg_n_tuple_mapping': {u'ingress_n_tuple': {}, u'egress_n_tuple': {}}} | False | | 89110fed-59cd-4ac8-ac45-8ce954fa99c0 | PPG12 | [u'22af2ffa-72d1-477c-9b68-2582cca22f79'] | {u'lb_fields': [], u'ppg_n_tuple_mapping': {u'ingress_n_tuple': {}, u'egress_n_tuple': {}}} | False | | d16b65a4-4a23-4b38-a937-a57a7058eee0 | PPG13 | [u'cf8d263e-abab-499a-b82b-b290e85b2831'] | {u'lb_fields': [], u'ppg_n_tuple_mapping': {u'ingress_n_tuple': {}, u'egress_n_tuple': {}}} | False | +--------------------------------------+-------+-------------------------------------------+----------------------------------- Please help. If anyone can. Launchpad Bug #1854327. Thanks and Regards Brijesh DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Nov 29 10:57:59 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 11:57:59 +0100 Subject: [horizon][stein] Message-ID: Hello, I just installed stein on centos 7 and when I have a log of instances clicking "next" on horizon the following images appears. Please help me [image: image.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 209122 bytes Desc: not available URL: From michael.stang at dhbw-mannheim.de Fri Nov 29 14:19:32 2019 From: michael.stang at dhbw-mannheim.de (Michael Stang) Date: Fri, 29 Nov 2019 15:19:32 +0100 (CET) Subject: [mitaka][keystone] Authentication over keycloak server possible? Message-ID: <1713979160.118784.1575037172218@ox.dhbw-mannheim.de> Hi, we have an OpenStack Mitaka installation running (yes I know it's pretty old ;-) ) at our lab and would like to use the keycloak-server from the central IT for authentication. 
So I would like to know if it is already possible in mitaka to use this external keycloak server or if this only possible in a later OpenStack version? Maybe anyone know and if yes is there any documentation how to do it? Was searching for it but found not much about it by now... Thanks :) Kind regards Michael Michael Stang Laboringenieur, Dipl. Inf. (FH) Duale Hochschule Baden-Württemberg Mannheim Baden-Wuerttemberg Cooperative State University Mannheim ZeMath Zentrum für mathematisch-naturwissenschaftliches Basiswissen Fachbereich Informatik, Fakultät Technik Coblitzallee 1-9 68163 Mannheim michael.stang at dhbw-mannheim.de mailto:michael.stang at dhbw-mannheim.de http://www.mannheim.dhbw.de http://www.dhbw-mannheim.de/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 28323 bytes Desc: not available URL: From ignaziocassano at gmail.com Fri Nov 29 14:28:49 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 29 Nov 2019 15:28:49 +0100 Subject: [stein][manila-ui] error In-Reply-To: References: Message-ID: Hello, furthter information: in the dashboard admin secion if I push share it does not list shares . I the project section it gives the error I sent previously Il giorno ven 29 nov 2019 alle ore 15:03 Ignazio Cassano < ignaziocassano at gmail.com> ha scritto: > Hi Carlos, I am installing via yum command > Thanks > Ignazio > > Il giorno ven 29 nov 2019 alle ore 14:46 Carlos Silva < > ces.eduardo98 at gmail.com> ha scritto: > >> Hi, Ignazio! >> >> Could you please provide more information about how have you installed >> it? It was via package, git, devstack? >> I'm trying to reproduce the issue in my environment but I'm not able to. >> >> Regards, >> Carlos Silva. >> >> Em sex., 29 de nov. de 2019 às 05:42, Ignazio Cassano < >> ignaziocassano at gmail.com> escreveu: >> >>> Hello, >>> I just installed openstack stein on centos. 
>>> Manila works fine my command line but when I click "share" in the >>> dashboard the following error appears: >>> >>> Environment: >>> >>> >>> Request Method: GET >>> Request URL: http://10.102.184.190/dashboard/project/shares/ >>> >>> Django Version: 1.11.20 >>> Python Version: 2.7.5 >>> Installed Applications: >>> ['openstack_dashboard.dashboards.project', >>> 'neutron_lbaas_dashboard', >>> 'heat_dashboard', >>> 'openstack_dashboard.dashboards.admin', >>> 'openstack_dashboard.dashboards.identity', >>> 'openstack_dashboard.dashboards.settings', >>> 'dashboards', >>> 'openstack_dashboard', >>> 'django.contrib.contenttypes', >>> 'django.contrib.auth', >>> 'django.contrib.sessions', >>> 'django.contrib.messages', >>> 'django.contrib.staticfiles', >>> 'django.contrib.humanize', >>> 'django_pyscss', >>> 'debreach', >>> 'openstack_dashboard.django_pyscss_fix', >>> 'compressor', >>> 'horizon', >>> 'openstack_auth'] >>> Installed Middleware: >>> ('openstack_auth.middleware.OpenstackAuthMonkeyPatchMiddleware', >>> 'debreach.middleware.RandomCommentMiddleware', >>> 'django.middleware.common.CommonMiddleware', >>> 'django.middleware.csrf.CsrfViewMiddleware', >>> 'django.contrib.sessions.middleware.SessionMiddleware', >>> 'django.contrib.auth.middleware.AuthenticationMiddleware', >>> 'horizon.middleware.OperationLogMiddleware', >>> 'django.contrib.messages.middleware.MessageMiddleware', >>> 'horizon.middleware.HorizonMiddleware', >>> 'horizon.themes.ThemeMiddleware', >>> 'django.middleware.locale.LocaleMiddleware', >>> 'django.middleware.clickjacking.XFrameOptionsMiddleware', >>> >>> 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerClientMiddleware', >>> >>> 'openstack_dashboard.contrib.developer.profiler.middleware.ProfilerMiddleware') >>> >>> >>> >>> Traceback: >>> >>> File >>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py" in >>> inner >>> 41. response = get_response(request) >>> >>> File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in >>> _get_response >>> 187. response = >>> self.process_exception_by_middleware(e, request) >>> >>> File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py" in >>> _get_response >>> 185. response = wrapped_callback(request, >>> *callback_args, **callback_kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >>> 36. return view_func(request, *args, **kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >>> 52. return view_func(request, *args, **kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >>> 36. return view_func(request, *args, **kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >>> 113. return view_func(request, *args, **kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/decorators.py" in dec >>> 84. return view_func(request, *args, **kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in >>> view >>> 68. return self.dispatch(request, *args, **kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/django/views/generic/base.py" in >>> dispatch >>> 88. return handler(request, *args, **kwargs) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in get >>> 223. handled = self.construct_tables() >>> >>> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in >>> construct_tables >>> 214. 
handled = self.handle_table(table) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in >>> handle_table >>> 123. data = self._get_data_dict() >>> >>> File "/usr/lib/python2.7/site-packages/horizon/tables/views.py" in >>> _get_data_dict >>> 43. data.extend(func()) >>> >>> File "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py" in >>> wrapped >>> 109. value = cache[key] = func(*args, **kwargs) >>> >>> File >>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py" >>> in get_shares_data >>> 57. share_nets = manila.share_network_list(self.request) >>> >>> File "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py" in >>> share_network_list >>> 280. return >>> manilaclient(request).share_networks.list(detailed=detailed, >>> >>> Exception Type: AttributeError at /project/shares/ >>> Exception Value: 'NoneType' object has no attribute 'share_networks' >>> >>> >>> Anyone can help, please ? >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sshnaidm at redhat.com Fri Nov 29 15:02:57 2019 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Fri, 29 Nov 2019 17:02:57 +0200 Subject: [tripleo][ironic][ansible][openstack-ansible][ansible-sig] Ironic/Baremetal Ansible modules In-Reply-To: <126374281574955397@iva5-2a2172cb7cff.qloud-c.yandex.net> References: <826531574942720@sas8-ed615920eca2.qloud-c.yandex.net> <126374281574955397@iva5-2a2172cb7cff.qloud-c.yandex.net> Message-ID: Let's choose an appropriate time in the etherpad [1], starting from Tue next week to avoid a short notice. I'll send an update with exact time in Sunday 01 Dec. Thanks [1] https://etherpad.openstack.org/p/ironic-ansible-modules On Thu, Nov 28, 2019 at 5:48 PM Dmitriy Rabotyagov wrote: > We have SIG meetings pretty ocasionally nowadays and it was a while since > the last one. But we used to held them on Firdays at 2pm UTC in > #openstack-ansible-sig IRC channel. > > > 28.11.2019, 14:47, "Dmitry Tantsur" : > > Hi, > > On Thu, Nov 28, 2019 at 1:07 PM Dmitriy Rabotyagov > wrote: > > Hi, > > I feel that this might be a good topic to discuss in terms of ansible SIG > [1] and it can become the new home. > So maybe we can plan a meeting for closer and prodictive discussion? > > > I think it's a great idea. Do you have any formal meetings yet or should > we maybe schedule a separate one? > > Dmitry > > > > [1] https://etherpad.openstack.org/p/ansible-sig > > > 28.11.2019, 11:45, "Mark Goddard" : > > On Wed, 27 Nov 2019 at 17:58, Sagi Shnaidman > wrote: > >> Hi, all > >> > >> in the light of finding the new home place for openstack related > ansible modules [1] I'd like to discuss the best strategy to create Ironic > ansible modules. Existing Ironic modules in Ansible repo don't cover even > half of Ironic functionality, don't fit current needs and definitely > require an additional work. There are a few topics that require attention > and better be solved before modules are written to save additional work. We > prepared an etherpad [2] with all these questions and if you have ideas or > suggestions on how it should look you're welcome to update it. > >> We'd like to decide the final place for them, name conventions (the > most complex one!), what they should look like and how better to implement. > >> Anybody interested in Ansible and baremetal management in Openstack, > you're more than welcome to contribute. > > > > Thanks for raising this, we're definitely missing some key things for > > ironic. 
I added a couple of roles and modules that we developed for > > kayobe to the etherpad. Would be happy to contribute them to the > > collection. > > > >> Thanks > >> > >> [1] https://review.opendev.org/#/c/684740/ > >> [2] https://etherpad.openstack.org/p/ironic-ansible-modules > >> > >> -- > >> Best regards > >> Sagi Shnaidman > > -- > Kind Regards, > Dmitriy Rabotyagov > > > > > > -- > Kind Regards, > Dmitriy Rabotyagov > > -- Best regards Sagi Shnaidman
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From arnaud.morin at gmail.com Fri Nov 29 15:53:35 2019
From: arnaud.morin at gmail.com (Arnaud Morin)
Date: Fri, 29 Nov 2019 15:53:35 +0000
Subject: [neutron][nova] Rootwrap daemon and privsep
Message-ID: <20191129155335.GC27522@sync>
Hey,
If I believe both the nova and neutron documentation:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.use_rootwrap_daemon
https://docs.openstack.org/neutron/latest/configuration/neutron.html#agent.root_helper_daemon
At scale, we are supposed to enable the rootwrap-daemon option. I have issues enabling that, but before going further on my issue, I'd like to understand the difference with privsep daemon.
Is privsep a new daemon which is supposed to replace the rootwrap one? Is privsep being launched after rootwrap? Is privsep enabled by default, so I should not care about rootwrap at all?
I'd like to understand more about that. Everything I try to find on popular search engines seems outdated, so if someone could give me a hand finding the right page to understand that, I'll love it :p
Cheers,
--
Arnaud Morin
From arnaud.morin at gmail.com Fri Nov 29 15:56:33 2019
From: arnaud.morin at gmail.com (Arnaud Morin)
Date: Fri, 29 Nov 2019 15:56:33 +0000
Subject: [neutron][nova][large scale SIG] Rootwrap daemon and privsep
In-Reply-To: <20191129155335.GC27522@sync>
References: <20191129155335.GC27522@sync>
Message-ID: <20191129155633.GD27522@sync>
Adding [large scale SIG], sorry!
--
Arnaud Morin
On 29.11.19 - 15:53, Arnaud Morin wrote:
> Hey, > > If I believe both the nova and neutron documentation: > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.use_rootwrap_daemon > https://docs.openstack.org/neutron/latest/configuration/neutron.html#agent.root_helper_daemon > > At scale, we are supposed to enable the rootwrap-daemon option. > I have issues enabling that, but before going further on my issue, I'd > like to understand the difference with privsep daemon. > > Is privsep a new daemon which is supposed to replace the rootwrap one? > Is privsep being launched after rootwrap? > Is privsep enabled by default, so I should not care about rootwrap at > all? > > I'd like to understand more about that.
> Everything I try to find on popular search engines seems outdated, so if > someone could give me a hand finding the right page to understand that, > I'll love it :p > > Cheers, > > -- > Arnaud Morin >
From mriedemos at gmail.com Fri Nov 29 17:01:14 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Fri, 29 Nov 2019 11:01:14 -0600
Subject: [requirements][stable] Capping requirements in stable branches
In-Reply-To: <20191128062714.ehtglq7jeo2dukyp@mthode.org>
References: <20191127223525.4ldrrjoliuzbt6o3@mthode.org> <20191128062714.ehtglq7jeo2dukyp@mthode.org>
Message-ID: <36d8f149-eeb4-6459-8d4b-a89075169340@gmail.com>
On 11/28/2019 12:27 AM, Matthew Thode wrote:
> For stable branch issues with projects not 'requirements' I'd refer you > to the stable policy / team (already tagged in the subject line). I > suspect that we'd need to know what the cap for each project / version > would be for each release (is it a major version bump?, minor?, etc).
Capping is bad in general and we don't want to do it on stable branches. Capping one thing can lead to breaking something else, potentially transitively, which gets to be a huge mess to untangle and is what we (stable and QA teams) used to deal with all the time in OpenStack. This is why we have upper-constraints and downstream packagers/deployers should be following it as the blessed "this is what works and is tested upstream" version of packages.
So right now upper-constraints on stable/train has:
python-novaclient===15.1.0
So anyone packaging downstream should be aware of this and not try to use python-novaclient > 15.1.0 with train versions of the services (horizon, heat, etc).
--
Thanks,
Matt
From thierry at openstack.org Fri Nov 29 17:17:46 2019
From: thierry at openstack.org (Thierry Carrez)
Date: Fri, 29 Nov 2019 18:17:46 +0100
Subject: [neutron][nova][large scale SIG] Rootwrap daemon and privsep
In-Reply-To: <20191129155633.GD27522@sync>
References: <20191129155335.GC27522@sync> <20191129155633.GD27522@sync>
Message-ID: <43282856-ccd3-fb30-01a2-47e6a1814a06@openstack.org>
Arnaud Morin wrote:
> [...] I'd like to understand the difference with privsep daemon. > > Is privsep a new daemon which is supposed to replace the rootwrap one? > Is privsep being launched after rootwrap? > Is privsep enabled by default, so I should not care about rootwrap at > all?
I can help with that, since I originally created rootwrap.
Rootwrap is a privilege escalation control mechanism. It serves as a way to filter what the service user on the machine can execute as root via sudo. The idea is that sudoers files do not provide enough granularity, so instead of trying to describe what is allowed and what is not in the sudoers file, we basically allow calling "sudo rootwrap command" and let rootwrap figure it out. Rootwrap reads a number of (root-owned) configuration files and decides to allow calling the command or not.
There are two issues with this mechanism. The first is that the performance is not great, as you first run a Python executable (rootwrap), which in turn spawns another process if the command is allowed. If you do that for hundreds of "ip" calls as you set up networking in neutron, this can add up pretty fast.
The second issue is that rootwrap is only as secure as its configuration. If for example you configure rootwrap to allow the 'nova' user to run the "chmod" command, well that's basically the same as allowing it to run anything as root.
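(As an illustration only, not a recommendation: such a filter is a single line in one of the root-owned .filters files that rootwrap reads, along the lines of

  [Filters]
  chmod: CommandFilter, chmod, root

and a plain CommandFilter like this only matches on the command name, so the service user can pass it whatever arguments it likes.)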
You have to use advanced filters to further refine what it can actually do based on command parameter analysis, and there is only so much you can control that way.
Rootwrap-daemon is a way to partially help with the first issue. Rather than calling a new rootwrap Python process every time a command needs to be called, you maintain a long-running rootwrap process that will process all requests. It significantly improves performance, but it adds inter-process communication complexity (never great in a security system). And it does nothing to address the second issue.
Privsep is the "right" way of addressing both issues. Rather than having the code try to call shell commands as root, privsep allows the code to call Python functions as root. This solves the performance issue, as you don't have the overhead of a separate Python process + shell process every time you want to change the ownership of a file; you can just call a Python function that will call os.chown() and get near-syscall efficiency. It also solves the granularity issue, by letting the code call a function that will only do what you want to do, rather than having to find a way to filter parameters so that the command you call cannot be abused to do other things.
The main issue with privsep is that it requires changing the code. You have to set it up in every project (it's now done for most), but then every place the service calls utils.execute(command, run_as_root=True) needs to be changed to call a privileged Python function instead.
The second issue with privsep is that it still needs root to start. The way this is usually done is by using rootwrap itself to bootstrap privsep... which can be confusing. There are obviously other ways to start the process as root, but since most of those services still make use of rootwrap anyway, that is what continues to be used for the initial bootstrapping. Ideally services would be completely transitioned to privsep, and we would discontinue rootwrap.
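To make that concrete, here is a minimal sketch of what a privileged function looks like with oslo.privsep. This is illustrative and simplified, not code taken from any particular project; real services declare more capabilities and keep this in a dedicated privsep module:

  import os

  from oslo_privsep import capabilities
  from oslo_privsep import priv_context

  # The context describes which Linux capabilities the privileged
  # helper process keeps; everything else is dropped.
  default = priv_context.PrivContext(
      __name__,
      cfg_section='privsep',
      pypath=__name__ + '.default',
      capabilities=[capabilities.CAP_CHOWN],
  )

  @default.entrypoint
  def change_owner(path, uid, gid):
      # This body runs inside the privileged privsep daemon,
      # not in the unprivileged service process.
      os.chown(path, uid, gid)

The service simply calls change_owner(...), and privsep forwards the arguments and return value to its daemon over a private socket, so there is no shell command involved and no parameter filtering to get wrong.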
> > So right now upper-constraints on stable/train has: > > python-novaclient===15.1.0 > > So anyone packaging downstream should be aware of this and not try to > use python-novaclient > 15.1.0 with train versions of the services > (horizon, heat, etc). unfortunetly that advice is not always followed but i agree that in general distros should try to follow upper-constraints where possible. for security reasons sometime distros have to move to a newer version but that is rare. in such cacses idealy the issue would be adress by another stable release of the depency upstream with a backport of the scurity fix. > From nate.johnston at redhat.com Fri Nov 29 18:31:48 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Fri, 29 Nov 2019 13:31:48 -0500 Subject: [neutron] Proposing Jakub Libosvar as Neutron core reviewer In-Reply-To: <207982A0-CBE7-47D9-A19C-CCFCACCB34EF@redhat.com> References: <207982A0-CBE7-47D9-A19C-CCFCACCB34EF@redhat.com> Message-ID: <16244BF0-3D00-4621-93CF-B32D3E3F5234@redhat.com> Wholehearted +1. Jakub is a great reference on many topics and he will be an essential bridge helping OVN’s merge into Neutron be sustainable now and in the future. Nate > On Nov 28, 2019, at 4:03 AM, Slawek Kaplonski wrote: > > Hi neutrinos, > > We already started process of migrating networking-ovn driver to be one of in-tree neutron drivers. Blueprint for that is [1]. > As part of this process I today proposed to include networking-ovn-core group into neutron-core group. Mail about it can be found at [2]. > One of persons in networking-ovn-group is Jakub Libosvar who was Neutron core for very long time in the past. He knows very well not only ovn related code but also have great knowledge about all Neutron code base. > So I would like to propose to Jakub as Neutron core reviewer again as he will be back working on neutron again now, after ovn will be in-tree driver. > What do You think about it? > I will wait for Your opinions for 1 week from now. Thx for all Your comments about it. > > [1] https://blueprints.launchpad.net/neutron/+spec/neutron-ovn-merge > [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011240.html > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > From eandersson at blizzard.com Fri Nov 29 21:42:43 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Fri, 29 Nov 2019 21:42:43 +0000 Subject: [neutron] security group api slow at scale Message-ID: I was wondering if there has been any work since Rocky to improve the performance of CRUD operations for security groups. When you have many thousands of rules the performance of simple operations gets exponentially slower. It got a little better after we applied the patches from this bug. https://bugs.launchpad.net/neutron/+bug/1810563 But in one of our environments getting security group rule lists, even when there are only 6-8 rules in that group takes almost 60 seconds. Best Regards, Erik Olof Gunnar Andersson -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From mriedemos at gmail.com Sat Nov 30 14:45:29 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Sat, 30 Nov 2019 08:45:29 -0600
Subject: [requirements][stable] Capping requirements in stable branches
In-Reply-To: <663552760798e16b50ca1d0c8155925d1503db64.camel@redhat.com>
References: <20191127223525.4ldrrjoliuzbt6o3@mthode.org> <20191128062714.ehtglq7jeo2dukyp@mthode.org> <36d8f149-eeb4-6459-8d4b-a89075169340@gmail.com> <663552760798e16b50ca1d0c8155925d1503db64.camel@redhat.com>
Message-ID: <5f62a71b-f0d5-8c6e-50cb-b3b364317649@gmail.com>
On 11/29/2019 12:06 PM, Sean Mooney wrote:
> Unfortunately that advice is not always followed, but I agree that in general > distros should try to follow upper-constraints where possible. For security > reasons distros sometimes have to move to a newer version, but that is rare. > In such cases, ideally the issue would be addressed by another stable release of > the dependency upstream with a backport of the security fix.
Yup, distros are going to distro. Upper constraints is the OpenStack way of saying, "this is the known good combination of packages that are known to work with the current version of the code" and if you diverge from that then you're on your own.
--
Thanks,
Matt
From douglas.j.schmidt at gmail.com Sat Nov 30 14:53:06 2019
From: douglas.j.schmidt at gmail.com (doug schmidt)
Date: Sat, 30 Nov 2019 09:53:06 -0500
Subject: Openstack - Rocky Install
Message-ID:
Hi, I'm having an issue with httpd not starting on my install. I have installed Rocky on a Windows 10 laptop with VirtualBox and CentOS 7 guests. So far I have 2 controllers and 3 compute nodes. I have followed the install docs for Rocky. https://docs.openstack.org/install-guide/openstack-services.html#minimal-deployment-for-rocky
Minimal services have been installed and configured. Keystone, glance, nova, and neutron are configured and running. I followed the install guide for the horizon dashboard, and that is where httpd.service starts failing. I have gone over the configuration a few times, but I cannot find what the issue is. Any ideas of where to look or what to fix? Thanks for any help with this
------------------------
[root at openstack-cntr1 ~]# systemctl | egrep -i 'openstack|neutron|httpd'
● httpd.service loaded failed failed The Apache HTTP Server
neutron-dhcp-agent.service loaded active running OpenStack Neutron DHCP Agent
neutron-linuxbridge-agent.service loaded active running OpenStack Neutron Linux Bridge Agent
neutron-metadata-agent.service loaded active running OpenStack Neutron Metadata Agent
neutron-server.service loaded active running OpenStack Neutron Server
openstack-glance-api.service loaded active running OpenStack Image Service (code-named Glance) API server
openstack-glance-registry.service loaded active running OpenStack Image Service (code-named Glance) Registry server
openstack-nova-api.service loaded active running OpenStack Nova API Server
openstack-nova-conductor.service loaded active running OpenStack Nova Conductor Server
openstack-nova-consoleauth.service loaded active running OpenStack Nova VNC console auth Server
openstack-nova-novncproxy.service loaded active running OpenStack Nova NoVNC Proxy Server
openstack-nova-scheduler.service loaded active running OpenStack Nova Scheduler Server
[root at openstack-cntr1 ~]# systemctl start httpd.service
Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
[root at openstack-cntr1 ~]# systemctl status httpd.service ● httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Drop-In: /usr/lib/systemd/system/httpd.service.d └─openstack-dashboard.conf Active: failed (Result: exit-code) since Sat 2019-11-30 09:45:46 EST; 15s ago Docs: man:httpd(8) man:apachectl(8) Process: 12855 ExecStartPre=/usr/bin/python /usr/share/openstack-dashboard/manage.py collectstatic --noinput --clear -v0 (code=exited, status=1/FAILURE) Nov 30 09:45:46 openstack-cntr1 python[12855]: File "/usr/share/openstack-dashboard/openstack_dashboard/settings.py", line 376, in Nov 30 09:45:46 openstack-cntr1 python[12855]: from local.local_settings import * # noqa: F403,H303 Nov 30 09:45:46 openstack-cntr1 python[12855]: File "/usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.py", line 399 Nov 30 09:45:46 openstack-cntr1 python[12855]: 'supported_vnic_types': ['*'], Nov 30 09:45:46 openstack-cntr1 python[12855]: ^ Nov 30 09:45:46 openstack-cntr1 python[12855]: IndentationError: unexpected indent Nov 30 09:45:46 openstack-cntr1 systemd[1]: httpd.service: control process exited, code=exited status=1 Nov 30 09:45:46 openstack-cntr1 systemd[1]: Failed to start The Apache HTTP Server. Nov 30 09:45:46 openstack-cntr1 systemd[1]: Unit httpd.service entered failed state. Nov 30 09:45:46 openstack-cntr1 systemd[1]: httpd.service failed. [root at openstack-cntr1 ~]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From berndbausch at gmail.com Sat Nov 30 23:02:12 2019 From: berndbausch at gmail.com (Bernd Bausch) Date: Sun, 1 Dec 2019 00:02:12 +0100 Subject: Openstack - Rocky Install In-Reply-To: References: Message-ID: <9A11D218-CCE8-44B5-9C6A-BACD28AA90EB@gmail.com> The problem seems to be a Python syntax error in line 399 in local_settings.py: Nov 30 09:45:46 openstack-cntr1 python[12855]: File "/usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.py", line 399 Nov 30 09:45:46 openstack-cntr1 python[12855]: 'supported_vnic_types': ['*'], Nov 30 09:45:46 openstack-cntr1 python[12855]: ^ Nov 30 09:45:46 openstack-cntr1 python[12855]: IndentationError: unexpected indent Bernd > On Nov 30, 2019, at 15:53, doug schmidt wrote: > > Nov 30 09:45:46 openstack-cntr1 python[12855]: File "/usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.py", line 399 > Nov 30 09:45:46 openstack-cntr1 python[12855]: 'supported_vnic_types': ['*'], > Nov 30 09:45:46 openstack-cntr1 python[12855]: ^ > Nov 30 09:45:46 openstack-cntr1 python[12855]: IndentationError: unexpected indent
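Worth noting: Python only raises "unexpected indent" on a dictionary entry like this when the line is no longer inside the braces of the dictionary it belongs to, so a likely cause is that the OPENSTACK_NEUTRON_NETWORK block was accidentally closed (or partly deleted) a few lines earlier while editing local_settings.py per the install guide, leaving the remaining keys orphaned and indented. As an illustration only, since your local_settings.py may list different keys, the intact block normally looks something like this, with every key inside one pair of braces:

  OPENSTACK_NEUTRON_NETWORK = {
      'enable_router': True,
      'enable_quotas': True,
      'enable_distributed_router': False,
      # ... more 'enable_*' flags from the sample file ...
      'supported_vnic_types': ['*'],
  }

Re-checking that block around line 399 (or re-copying it from the sample local_settings.py) and then restarting httpd should let the collectstatic step in ExecStartPre get past this error.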