From flux.adam at gmail.com Fri Nov 1 02:29:44 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Thu, 31 Oct 2019 19:29:44 -0700 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> Message-ID: Yeah, I've halted work for now until the summit when I can talk to other folks (at the related meeting that is scheduled) -- and I believe I agree that merging this functionality into the SDK is the best path forward. I will probably be able to assist somewhat with that as well. I wish I knew about the discussions that happened last year, I could already have been working on that... T_T --Adam On Wed, Oct 30, 2019 at 7:43 AM Artem Goncharov wrote: > Hi Adam, > > Since I need this now as well I will start working on implementation how > it was agreed (in SDK and in OSC) during last summit by mid of November. > There is no need for discussing this further, it just need to be > implemented. Sad that we got no progress in half a year. > > Regards, > Artem (gtema). > > On 30. Oct 2019, at 14:26, Adam Harwell wrote: > > That's too bad that you won't be at the summit, but I think there may > still be some discussion planned about this topic. > > Yeah, I understand completely about priorities and such internally. Same > for me... It just happens that this IS priority work for us right now. :) > > > On Tue, Oct 29, 2019, 07:48 Adrian Turjak wrote: > >> My apologies I missed this email. >> >> Sadly I won't be at the summit this time around. There may be some public >> cloud focused discussions, and some of those often have this topic come up. >> Also if Monty from the SDK team is around, I'd suggest finding him and >> having a chat. >> >> I'll help if I can but we are swamped with internal work and I can't >> dedicate much time to do upstream work that isn't urgent. :( >> On 17/10/19 8:48 am, Adam Harwell wrote: >> >> That's interesting -- we have already started working to add features and >> improve ospurge, and it seems like a plenty useful tool for our needs, but >> I think I agree that it would be nice to have that functionality built into >> the sdk. I might be able to help with both, since one is immediately useful >> and we (like everyone) have deadlines to meet, and the other makes sense to >> me as a possible future direction that could be more widely supported. >> >> Will you or someone else be hosting and discussion about this at the >> Shanghai summit? I'll be there and would be happy to join and discuss. >> >> --Adam >> >> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >> wrote: >> >>> I tried to get a community goal to do project deletion per project, but >>> we ended up deciding that a community goal wasn't ideal unless we did >>> build a bulk delete API in each service: >>> https://review.opendev.org/#/c/639010/ >>> https://etherpad.openstack.org/p/community-goal-project-deletion >>> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >>> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >>> >>> What we decided on, but didn't get a chance to work on, was building >>> into the OpenstackSDK OS-purge like functionality, as well as reporting >>> functionality (of all project resources to be deleted). That way we >>> could have per project per resource deletion logic, and all of that >>> defined in the SDK. 
>>> >>> I was up for doing some of the work, but ended up swamped with internal >>> work and just didn't drive or push for the deletion work upstream. >>> >>> If you want to do something useful, don't pursue OS-Purge, help us add >>> that official functionality to the SDK, and then we can push for bulk >>> deletion APIs in each project to make resource deletion more pleasant. >>> >>> I'd be happy to help with the work, and Monty on the SDK team will most >>> likely be happy to as well. :) >>> >>> Cheers, >>> Adrian >>> >>> On 1/10/19 11:48 am, Adam Harwell wrote: >>> > I haven't seen much activity on this project in a while, and it's been >>> > moved to opendev/x since the opendev migration... Who is the current >>> > owner of this project? Is there anyone who actually is maintaining it, >>> > or would mind if others wanted to adopt the project to move it forward? >>> > >>> > Thanks, >>> > --Adam Harwell >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Fri Nov 1 02:49:37 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Thu, 31 Oct 2019 19:49:37 -0700 Subject: [dev][ops][ptg][keystone] Join the keystone onboarding session! Message-ID: <7e0350e7-d249-4f5c-8a54-50c883bfb350@www.fastmail.com> Hello Stackers, If you're a developer, technical writer, operator, or user and interested in getting involved in the keystone project, stop by the keystone onboarding session in Shanghai next week! We will be at the Kilo table in the Blue Room on Wednesday from 9 to 10:30. The format will be open ended, so come with all your questions about how you can participate on the keystone team. Can't make it to the session? Take a look at our contributing guide[1] and feel free to get in touch with me directly. Colleen Murphy / cmurphy (keystone PTL) [1] https://docs.openstack.org/keystone/latest/contributor/how-can-i-help.html From zhang.lei.fly+os-discuss at gmail.com Fri Nov 1 03:42:16 2019 From: zhang.lei.fly+os-discuss at gmail.com (Jeffrey Zhang) Date: Fri, 1 Nov 2019 11:42:16 +0800 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> References: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> Message-ID: Zoom is usable in China right now. On Thu, Oct 31, 2019 at 4:58 PM Marcin Juszkiewicz < marcin.juszkiewicz at linaro.org> wrote: > W dniu 30.10.2019 o 23:23, Kendall Nelson pisze: > > > If people were going to be in Shanghai for the Summit (or live in > > China) they wouldn't be able to participate because of the firewall. > > Can you (or someone else present in Poland) provide an alternative > > solution to Google meet so that everyone interested could join? > > Tell us which of them work for you: > > - Bluejeans > - Zoom > > > As I have access to both platforms at work. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yamamoto at midokura.com Fri Nov 1 05:24:51 2019 From: yamamoto at midokura.com (Takashi Yamamoto) Date: Fri, 1 Nov 2019 14:24:51 +0900 Subject: [neutron][ptg] Team dinner In-Reply-To: <20191030211537.trgnve7df27g3jh4@skaplons-mac> References: <20191030211537.trgnve7df27g3jh4@skaplons-mac> Message-ID: hi, On Thu, Oct 31, 2019 at 6:19 AM Slawek Kaplonski wrote: > > Hi neutrinos, > > Thanks to LIU Yulong who helped me a lot to choose and book some restaurant, we > have now booked restaurant: > > Expo source B2, No.168, Shangnan Road, Pudong New Area, Shanghai, TEL: +86 21 > 58882117 > 书院人家(世博源店) 上海市浦东新区上南路168号世博源B2 > The Dianping page: http://www.dianping.com/shop/20877292 > > Dinner is scheduled to Tuesday, 5th Nov at 6pm. > > Restaurant is close to the Expo center. It's about 15 minutes walk according to > the Google maps: https://tinyurl.com/y2rc83ej google maps is quite inaccurate in China. https://j.map.baidu.com/e8/6h https://router.map.qq.com/short?l=8af77e82c3b0deff1adeffb171fedf19 > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From cdent+os at anticdent.org Fri Nov 1 10:34:50 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 1 Nov 2019 10:34:50 +0000 (GMT) Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: On Thu, 31 Oct 2019, Matt Riedemann wrote: > After that we call the scheduler to find a host and that only takes about 1 > second. That POST /allocations call to placement shouldn't be taking around 3 > minutes so something crazy is going on there. Yeah, this is indeed weird and wrong. There are two obvious ways, in the placement code, that _set_allocations (which underlies all the allocation changing requests) could be slowed down: * If the resource provider's generation has changed during the request, because something else is also writing allocations at the same time, there will be up to 10 retries, server-side. * For each of those tries, there's (apparently) a chance of a db deadlock, because the method is guarded by a retry on deadlock. We added that in a long long time ago. However, if the generation retry was happening, it would show up in the logs as "Retrying allocations write on resource provider". And if we were getting deadlock we ought to see "Performing DB retry for function". There could be other non-obvious things, but more data required... (more) > Oct 31 16:52:24.721346 ubuntu-bionic-inap-mtl01-0012620879 > devstack at placement-api.service[8591]: DEBUG placement.requestlog > [req-275af2df-bd4e-4e64-b46e-6582e8de5148 > req-295f7350-2f0b-4f85-8cdf-d76801637221 None None] Starting request: > 198.72.124.104 "POST /placement/allocations" {{(pid=8593) __call__ > /opt/stack/placement/placement/requestlog.py:61}} We start the requestlog at the very beginning of the request so there are a few steps between here and the actual data interaction, so another possible angle of investigation here is that keystonemiddleware was being slow to validate a token for some reason. > I know Chris Dent has done a lot of profiling on placement recently but I'm > not sure if much has been done around profiling the POST /allocations call to > move allocations from one consumer to another. You're right that little was done to profile allocations, mostly because initial experiments showed that it was fast and get /a_c was slow. In the fullness of time I probably would have moved on to allocations but reality intervened. 
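For anyone not familiar with that code, the generation-guarded retry mentioned above amounts to a loop like the following. This is only a rough sketch with made-up names, not the actual placement implementation:

    class ConcurrentUpdate(Exception):
        """Another writer bumped the provider generation first."""

    def set_allocations_with_retry(read_generation, write_allocations,
                                   max_retries=10):
        # read_generation() returns the provider's current generation and
        # write_allocations(generation) attempts the write, raising
        # ConcurrentUpdate if the generation moved underneath it. Both
        # callables are hypothetical stand-ins for this sketch.
        for _ in range(max_retries):
            generation = read_generation()
            try:
                return write_allocations(generation)
            except ConcurrentUpdate:
                # Someone else wrote allocations against the same provider;
                # re-read the generation and try again.
                continue
        raise ConcurrentUpdate('gave up after %d attempts' % max_retries)

Each extra pass through a loop like that is another round trip to the database, so a heavily contended provider could add real latency, but that case would also leave the "Retrying allocations write on resource provider" message in the logs mentioned above.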
What's probably most important is evaluating allocation writes to the same set of several resource providers under high concurrency. However it doesn't seem like that is what is happening in this case (otherwise I'd expect to see more evidence in the logs). Instead it's "just being slow" which could be any of: * something in the auth process (which if other services are also being slow could be a unifying thing) * the database server having a sad (do we keep a slow query log?) * the vm/vms has/have a noisy neighbor Do we know anything about cpu and io stats during the slow period? We've (sorry, let me rephrase, I've) known for the entire 5.5 years I've been involved in OpenStack that the CI resources are way over-subscribed and controller nodes in the wild are typically way under-specified. Yes, our code should be robust in the face of that, but... Of course, I could be totally wrong, there could be something flat out wrong in the placement code, but if that were the case I'd expect (like Matt did) to see it more often. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From mriedemos at gmail.com Fri Nov 1 13:30:59 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 1 Nov 2019 08:30:59 -0500 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: On 11/1/2019 5:34 AM, Chris Dent wrote: > Instead it's "just being slow" which could be any of: > > * something in the auth process (which if other services are also >   being slow could be a unifying thing) > * the database server having a sad (do we keep a slow query log?) Not by default but it can be enabled in devstack, like [1] but the resulting log file is so big I can't open it in my browser. So hitting this kind of issue with that enabled is going to be like finding a needle in a haystack I think. > * the vm/vms has/have a noisy neighbor This is my guess as to the culprit. Like Slawek said elsewhere, he's seen this in other APIs that are otherwise really fast. > > Do we know anything about cpu and io stats during the slow period? We have peakmemtracker [2] which shows a jump during the slow period noticed: Oct 31 16:51:43.132081 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: [iscsid (pid:18546)]=34352KB; [dmeventd (pid:18039)]=119732KB; [ovs-vswitchd (pid:3626)]=691960KB Oct 31 16:51:43.133012 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: ]]] Oct 31 16:55:23.220099 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: [[[ Oct 31 16:55:23.221255 ubuntu-bionic-inap-mtl01-0012620879 memory_tracker.sh[26186]: Thu Oct 31 16:55:23 UTC 2019 I don't know what that means though. We also have dstat [3] (which I find hard as hell to read - do people throw that into a nice graphical tool to massage that data for inspection?) which shows a jump as well: Oct 31 16:53:31.515005 ubuntu-bionic-inap-mtl01-0012620879 dstat.sh[25748]: 31-10 16:53:31| 19 7 12 61 0|5806M 509M 350M 1194M| 612B 0 | 0 0 | 0 0 |1510 2087 |17.7 9.92 5.82|3.0 13 2.0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 25797 99k 451B 0%|mysqld 464M| 20M 8172M| 37 513 0 4 12 Oct 31 16:55:08.604634 ubuntu-bionic-inap-mtl01-0012620879 dstat.sh[25748]: 31-10 16:53:32| 20 7 12 61 0|5806M 509M 350M 1194M|1052B 0 | 0 0 | 0 0 |1495 2076 |17.7 9.92 5.82|4.0 13 0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 25797 99k 446B0.1%|mysqld 464M| 20M 8172M| 37 513 0 4 12 Looks like around the time of slowness mysqld is the top consuming process which is probably not surprising. 
> > We've (sorry, let me rephrase, I've) known for the entire 5.5 years > I've been involved in OpenStack that the CI resources are way > over-subscribed and controller nodes in the wild are typically way > under-specified.  Yes, our code should be robust in the face of > that, but... Looking at logstash this mostly hits on OVH and INAP nodes. Question to infra: do we know if those are more oversubscribed than some others like RAX or VEXXHOST nodes? [1] https://review.opendev.org/#/c/691995/ [2] https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-peakmem_tracker.txt.gz [3] https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-dstat.txt.gz -- Thanks, Matt From mriedemos at gmail.com Fri Nov 1 13:32:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 1 Nov 2019 08:32:52 -0500 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: On 11/1/2019 8:30 AM, Matt Riedemann wrote: > Looks like around the time of slowness mysqld is the top consuming > process which is probably not surprising. Oh I also see this in the mysql error log [1] around the time of the slow period, right after things pick back up: 2019-10-31T16:55:08.603773Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 161281ms. The settings might not be optimal. (flushed=201 and evicted=0, during the time.) So obviously something is going very wrong with mysql (or the node in general) during that time. [1] https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/mysql/error_log.txt.gz -- Thanks, Matt From cboylan at sapwetik.org Fri Nov 1 14:55:08 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 01 Nov 2019 07:55:08 -0700 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> Message-ID: <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> On Fri, Nov 1, 2019, at 6:30 AM, Matt Riedemann wrote: > On 11/1/2019 5:34 AM, Chris Dent wrote: > > Instead it's "just being slow" which could be any of: > > > > * something in the auth process (which if other services are also > >   being slow could be a unifying thing) > > * the database server having a sad (do we keep a slow query log?) > > Not by default but it can be enabled in devstack, like [1] but the > resulting log file is so big I can't open it in my browser. So hitting > this kind of issue with that enabled is going to be like finding a > needle in a haystack I think. > > > * the vm/vms has/have a noisy neighbor > > This is my guess as to the culprit. Like Slawek said elsewhere, he's > seen this in other APIs that are otherwise really fast. > > > > > Do we know anything about cpu and io stats during the slow period? 
> > We have peakmemtracker [2] which shows a jump during the slow period > noticed: > > Oct 31 16:51:43.132081 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: [iscsid (pid:18546)]=34352KB; [dmeventd > (pid:18039)]=119732KB; [ovs-vswitchd (pid:3626)]=691960KB > Oct 31 16:51:43.133012 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: ]]] > Oct 31 16:55:23.220099 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: [[[ > Oct 31 16:55:23.221255 ubuntu-bionic-inap-mtl01-0012620879 > memory_tracker.sh[26186]: Thu Oct 31 16:55:23 UTC 2019 > > I don't know what that means though. > > We also have dstat [3] (which I find hard as hell to read - do people > throw that into a nice graphical tool to massage that data for > inspection?) which shows a jump as well: I put the dstat csv files in https://lamada.eu/dstat-graph/ and that works reasonably well. > > Oct 31 16:53:31.515005 ubuntu-bionic-inap-mtl01-0012620879 > dstat.sh[25748]: 31-10 16:53:31| 19 7 12 61 0|5806M 509M 350M > 1194M| 612B 0 | 0 0 | 0 0 |1510 2087 |17.7 9.92 5.82|3.0 > 13 2.0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 > 25797 99k 451B 0%|mysqld 464M| 20M 8172M| 37 513 > 0 4 12 > Oct 31 16:55:08.604634 ubuntu-bionic-inap-mtl01-0012620879 > dstat.sh[25748]: 31-10 16:53:32| 20 7 12 61 0|5806M 509M 350M > 1194M|1052B 0 | 0 0 | 0 0 |1495 2076 |17.7 9.92 5.82|4.0 > 13 0| 0 0 |qemu-system-x86 24768 13% 0 0 |python2 > 25797 99k 446B0.1%|mysqld 464M| 20M 8172M| 37 513 > 0 4 12 > > Looks like around the time of slowness mysqld is the top consuming > process which is probably not surprising. > > > > > We've (sorry, let me rephrase, I've) known for the entire 5.5 years > > I've been involved in OpenStack that the CI resources are way > > over-subscribed and controller nodes in the wild are typically way > > under-specified.  Yes, our code should be robust in the face of > > that, but... > > Looking at logstash this mostly hits on OVH and INAP nodes. Question to > infra: do we know if those are more oversubscribed than some others like > RAX or VEXXHOST nodes? I believe both OVH and INAP give us dedicated hypervisors. This means that we will end up being our own noisy neighbors there. I don't know what level we oversubscribe at but amorin (OVH) and benj_ (INAP) can probably provide more info. INAP was also recently turned back on. It had been offline for redeployment and that was completed and added back to the pool. Possible that more than just the openstack version has changed? OVH controls the disk IOPs that we get pretty aggressively as well. Possible it is an IO thing? > > [1] https://review.opendev.org/#/c/691995/ > [2] > https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-peakmem_tracker.txt.gz > [3] > https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/controller/logs/screen-dstat.txt.gz From mriedemos at gmail.com Fri Nov 1 15:22:45 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 1 Nov 2019 10:22:45 -0500 Subject: State of the Gate (placement?) In-Reply-To: <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/1/2019 9:55 AM, Clark Boylan wrote: > OVH controls the disk IOPs that we get pretty aggressively as well. Possible it is an IO thing? 
Yeah, so looking at the dstat output in that graph (thanks for pointing out that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, that's probably not good. -- Thanks, Matt From rfolco at redhat.com Fri Nov 1 16:45:53 2019 From: rfolco at redhat.com (Rafael Folco) Date: Fri, 1 Nov 2019 13:45:53 -0300 Subject: [tripleo] TripleO CI Summary: Sprint 38 Message-ID: Greetings, The TripleO CI team has just completed Sprint 38 / Unified Sprint 17 (Oct 10 thru Oct 30). The following is a summary of completed work during this sprint cycle: - Tested the temporary manifest implementation in the new promoter for not breaking promotion workflow. Issues to be bootstrapped in the next sprint. - Implemented CI jobs in zuul to build and run tests against ceph-ansible and podman pull requests in github. A PoC has been created to validate the usage of zuul-distro-jobs project for dealing with RPM builds. - Ceph-ansible: initial standalone job added to test pull requests - Podman: integration with RDO software factory is done - Closed-out Train release branching in both upstream and periodic realms as part of the mid-cycle technical debt. - Addressed required changes for building a CentOS8 node for upcoming distro release support across TripleO CI jobs. - All RDO third party and upstream multinode (master/train) jobs are now moved to os_tempest ansible role provided by Openstack-ansible team. - (Pushed to next sprint): Improve tests for verifying a full promotion workflow running on the staging environment. The planned work for the next sprint [1] are: - Evaluate and implement CI jobs in Zuul that deal with RPM build artifacts for ceph-ansible and podman 3rd party testing. - Design and create a PoC for individual component testing in the promotion pipeline. This effort will add an additional verification layer to check OpenStack components (compute, networking, storage, etc) with stable builds, and ease root cause determination when it breaks the code. - Continue to improve and fix the new promotion code by deploying and bootstrapping an isolated promoter server. The Ruck and Rover for this sprint are Sagi Shnaidman (sshnaidm) and Ronelle Landy (rlandy). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes are being tracked in etherpad [2]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-18 [2] https://etherpad.openstack.org/p/ruckroversprint18 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Fri Nov 1 20:35:46 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Fri, 1 Nov 2019 16:35:46 -0400 Subject: [ops] preliminary ops meetup proposal, jan 2020, London Message-ID: Please see https://twitter.com/osopsmeetup/status/1190365967106412544?s=20 We find this method of getting news out about operators meetups gets good engagement, so please also follow if of interest to you. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Fri Nov 1 21:36:38 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 1 Nov 2019 17:36:38 -0400 Subject: [cinder] changing the weekly meeting time In-Reply-To: References: Message-ID: <72310d46-d878-de89-c04f-2926c8e0f016@gmail.com> On 10/30/19 10:45 PM, Fang, Liang A wrote: > Hi Brian, > > I agree with Rajat. 
Meeting happened in mid-night will prevent people to > attend, except they have topics that must to talk with team. > > Openstack is widely used in Asia, but there’s no much core from Asia (in > countries that the meeting in mid-night). Meeting time is one of the > reason😊 I agree, the current meeting time is not Asia-friendly. If we move the meeting 2 hours earlier (14:00 UTC), that's 6:00 a.m. on USA Pacific Coast, which is on the verge of not being Silicon Valley-friendly. So it would be good to hear from our Silicon Valley time zone contributors (we have at least 3 who attend fairly regularly). > > Nova have two meetings, two time zone friendly. But I don’t like to > divide the meeting to two. > I agree, I'd prefer not to do this unless there's no other way to work out a suitable time. Here's a chart that shows you what we're up against: https://www.timeanddate.com/worldclock/meetingtime.html?iso=20191120&p1=224&p2=179&p3=141&p4=44&p5=237&p6=248 We discussed this issue briefly at this week's Cinder meeting, and since we're hoping to pick up some new contributors at the Summit/PTG next week, we agreed to take a poll after the PTG so that new contributors can represent their respective time zones. In the meantime, it would be good to keep this ML discussion going -- maybe someone will have a creative idea for a solution that won't adversely impact any contributors!   > > Regards > > Liang > > *From:* Rajat Dhasmana > *Sent:* Thursday, October 31, 2019 12:54 AM > *To:* Brian Rosmaita > *Cc:* openstack-discuss at lists.openstack.org > *Subject:* Re: [cinder] changing the weekly meeting time > >   > > Hi Brian, > >   > > It's great that the change in weekly meeting time is considered, here > are my opinions on the same from the perspective of Asian countries > (having active upstream developers) > >   > > Current meeting time (16:00 - 17:00 UTC) > >   > > INDIA : is 9:30 - 10:30 PM IST (UTC+05:30) is a little late but manageable. > >   > > CHINA : is 12:00 - 01:00 AM CST (UTC+08:00) is almost impossible to attend. > >   > > JAPAN : is 01:00 - 02:00 AM JST (UTC+09:00) similar to the case as China. > >   > > IMO shifting the meeting time 2 hours earlier (UTC 14:00) might bring > more participation and would ease out timings for some (including me) > but these are just my thoughts. > >   > > Thanks and Regards > > Rajat Dhasmana > >   > > On Thu, Oct 24, 2019 at 3:05 AM Brian Rosmaita > > wrote: > > (Just to be completely clear -- we're only gathering information at this > point.  The Cinder weekly meeting is still Wednesdays at 16:00 UTC.) > > As we discussed at today's meeting [0], a request has been made to hold > the weekly meeting earlier so that it would be friendlier for people in > Asia time zones. > > Based on the people in attendance today, it seems that a move to 14:00 > UTC is not out of the question. > > Thus, the point of this email is to solicit comments on whether we > should change the meeting time to 14:00 UTC.  As you consider the impact > on yourself, if you are in a TZ that observes Daylight Savings Time, > keep in mind that most TZs go back to standard time over the next > few weeks. > > (I was going to insert an opinion here, but I will wait and respond in > this thread like everyone else.) 
> > cheers, > brian > > > [0] > http://eavesdrop.openstack.org/meetings/cinder/2019/cinder.2019-10-23-16.00.log.html#l-166 > From rosmaita.fossdev at gmail.com Fri Nov 1 21:58:23 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 1 Nov 2019 17:58:23 -0400 Subject: [cinder][ptg] team dinner in Shanghai Message-ID: <3ee6b488-3e54-64fc-039b-2ca0060f467b@gmail.com> This event is open to anyone who will be in Shanghai on Thursday 7 November and who has a constructive interest in Cinder. I know we have a few contributors in Shanghai who won't be at the PTG; hopefully, they'll be able to join us so we can meet face-to-face. We're planning for a 5:00 pm dinner; it's kind of early, but the restaurant is a close walk from the Expo Center, so I figured it would be easier to go directly there instead of dispersing to our hotels and regrouping later. Plus, an early dinner will enable people to attend the (unofficial) Game Night [0]. Details and a signup sheet (we need a head count) are at the top of the Cinder PTG etherpad: https://etherpad.openstack.org/p/shanghai-ptg-cinder I haven't been able to confirm the time and location yet, but I will keep the above etherpad updated with that info. cheers, brian [0] https://etherpad.openstack.org/p/pvg-game-night From rico.lin.guanyu at gmail.com Sat Nov 2 06:38:42 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Sat, 2 Nov 2019 14:38:42 +0800 Subject: [auto-scaling] SIG PTG schedule. please join us :) In-Reply-To: References: Message-ID: Actually the schedule is changed so 6th it is! Rico Lin 於 2019年10月30日 週三,下午3:26寫道: > Sorry about this mistake, it should be 11/5 instead of 6th > > On Wed, Oct 30, 2019 at 3:05 PM Rico Lin > wrote: > >> Hi all >> >> PTG is right next week, so if you're interested in Auto-scaling features, >> please join us. >> Our session will be hosted on 11/6 for Half-day from 9:00 am to 12:30 pm >> in room 431. >> You can suggest PTG sessions in our etherpad: >> https://etherpad.openstack.org/p/PVG-auto-scaling-sig >> Feel free to join our IRC channel as well: #openstack-auto-scaling >> >> Also, you can check out our new documents for autoscaling: >> https://docs.openstack.org/auto-scaling-sig >> And provide any feedback/ bug report for any Auto-scaling issue related >> to OpenStack in >> https://storyboard.openstack.org/#!/project/openstack/auto-scaling-sig >> >> Here's my Wechat ID: RicoLinCloudGeek >> See you all in Shanghai! >> >> -- >> May The Force of OpenStack Be With You, >> >> *Rico Lin*irc: ricolin >> >> > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From veeraready at yahoo.co.in Sat Nov 2 07:14:29 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Sat, 2 Nov 2019 07:14:29 +0000 (UTC) Subject: [openstack-dev][kuryr] Error in creating LB In-Reply-To: References: <1979762272.3801941.1572509785865.ref@mail.yahoo.com> <1979762272.3801941.1572509785865@mail.yahoo.com> <2ea50863deddb3bd158ccab869536c3c4f9693d5.camel@redhat.com> Message-ID: <619091810.118353.1572678869777@mail.yahoo.com> Hi Michal,Thanks for your help , deployment was successful. Problem is with libvirt .Libvirt is already installed in host machine with 1 vm up and running , i destroyed vm and uninstall libvirt , problem solved, this issue was identified from nova logs.  Thanks, Veera. 
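For anyone who hits the original timeout rather than a libvirt conflict, the fix suggested in the quoted thread below is a one-line devstack override. A sketch only, assuming KURYR_WAIT_TIMEOUT goes in the usual localrc section of local.conf:

    [[local|localrc]]
    # Allow up to 20 minutes for the load balancer to become active,
    # the value used in the kuryr-kubernetes gate.
    KURYR_WAIT_TIMEOUT=1200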
On Friday, 1 November, 2019, 03:19:38 am IST, Michael Johnson wrote: Yeah, this likely means something is wrong with your nova setup. Either it is too slow to boot a VM or there is some other error. Try looking for the "amphora" instances in nova (openstack server list) then do a show on them (openstack server show ). There is an error field from nova that may contain the error. Michael On Thu, Oct 31, 2019 at 1:46 AM Michał Dulko wrote: > > On Thu, 2019-10-31 at 08:16 +0000, VeeraReddy wrote: > > Hi, > > > > I am trying to install openstack & Kubernetes using devstack > > > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > >  Error Log            : http://paste.openstack.org/show/785670/ > >  Local.conf          : Paste #785671 | LodgeIt! > > This error happens when Octavia is unable to create a load balancer in > 5 minutes. Seems like your LB is still PENDING_CREATE, so this seems to > be just unusually slow Octavia. This might happen if e.g. your host has > no nested virtualization enabled. > > Try increasing KURYR_WAIT_TIMEOUT in local.conf. In the gate we use up > to 20 minutes (value of 1200). > > > Regards, > > Veera. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.rydberg at citynetwork.eu Sat Nov 2 08:19:14 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Sat, 2 Nov 2019 16:19:14 +0800 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> <576E74EB-ED80-497F-9706-482FE0433208@gmail.com> Message-ID: <2ca832bb-4b71-b775-160a-e1868dcb21d2@citynetwork.eu> Hi, A Forum session is planned for this topic, Monday 11:40. Suites perfect to continue the discussions there as well. https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24407/project-resource-cleanup-followup BR, Tobias Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED On 2019-10-30 15:43, Artem Goncharov wrote: > Hi Adam, > > Since I need this now as well I will start working on implementation > how it was agreed (in SDK and in OSC) during last summit by mid of > November. There is no need for discussing this further, it just need > to be implemented. Sad that we got no progress in half a year. > > Regards, > Artem (gtema). > >> On 30. Oct 2019, at 14:26, Adam Harwell > > wrote: >> >> That's too bad that you won't be at the summit, but I think there may >> still be some discussion planned about this topic. >> >> Yeah, I understand completely about priorities and such internally. >> Same for me... It just happens that this IS priority work for us >> right now. :) >> >> >> On Tue, Oct 29, 2019, 07:48 Adrian Turjak > > wrote: >> >> My apologies I missed this email. >> >> Sadly I won't be at the summit this time around. There may be >> some public cloud focused discussions, and some of those often >> have this topic come up. Also if Monty from the SDK team is >> around, I'd suggest finding him and having a chat. >> >> I'll help if I can but we are swamped with internal work and I >> can't dedicate much time to do upstream work that isn't urgent. 
:( >> >> On 17/10/19 8:48 am, Adam Harwell wrote: >>> That's interesting -- we have already started working to add >>> features and improve ospurge, and it seems like a plenty useful >>> tool for our needs, but I think I agree that it would be nice to >>> have that functionality built into the sdk. I might be able to >>> help with both, since one is immediately useful and we (like >>> everyone) have deadlines to meet, and the other makes sense to >>> me as a possible future direction that could be more widely >>> supported. >>> >>> Will you or someone else be hosting and discussion about this at >>> the Shanghai summit? I'll be there and would be happy to join >>> and discuss. >>> >>>     --Adam >>> >>> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >>> > wrote: >>> >>> I tried to get a community goal to do project deletion per >>> project, but >>> we ended up deciding that a community goal wasn't ideal >>> unless we did >>> build a bulk delete API in each service: >>> https://review.opendev.org/#/c/639010/ >>> https://etherpad.openstack.org/p/community-goal-project-deletion >>> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >>> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >>> >>> What we decided on, but didn't get a chance to work on, was >>> building >>> into the OpenstackSDK OS-purge like functionality, as well >>> as reporting >>> functionality (of all project resources to be deleted). That >>> way we >>> could have per project per resource deletion logic, and all >>> of that >>> defined in the SDK. >>> >>> I was up for doing some of the work, but ended up swamped >>> with internal >>> work and just didn't drive or push for the deletion work >>> upstream. >>> >>> If you want to do something useful, don't pursue OS-Purge, >>> help us add >>> that official functionality to the SDK, and then we can push >>> for bulk >>> deletion APIs in each project to make resource deletion more >>> pleasant. >>> >>> I'd be happy to help with the work, and Monty on the SDK >>> team will most >>> likely be happy to as well. :) >>> >>> Cheers, >>> Adrian >>> >>> On 1/10/19 11:48 am, Adam Harwell wrote: >>> > I haven't seen much activity on this project in a while, >>> and it's been >>> > moved to opendev/x since the opendev migration... Who is >>> the current >>> > owner of this project? Is there anyone who actually is >>> maintaining it, >>> > or would mind if others wanted to adopt the project to >>> move it forward? >>> > >>> > Thanks, >>> >    --Adam Harwell >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4017 bytes Desc: S/MIME Cryptographic Signature URL: From skaplons at redhat.com Sat Nov 2 08:59:58 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sat, 2 Nov 2019 09:59:58 +0100 Subject: [neutron][ptg] Team dinner In-Reply-To: References: <20191030211537.trgnve7df27g3jh4@skaplons-mac> Message-ID: <20191102085958.oyahgqi2yufml3vg@skaplons-mac> Hi, On Fri, Nov 01, 2019 at 02:24:51PM +0900, Takashi Yamamoto wrote: > hi, > > On Thu, Oct 31, 2019 at 6:19 AM Slawek Kaplonski wrote: > > > > Hi neutrinos, > > > > Thanks to LIU Yulong who helped me a lot to choose and book some restaurant, we > > have now booked restaurant: > > > > Expo source B2, No.168, Shangnan Road, Pudong New Area, Shanghai, TEL: +86 21 > > 58882117 > > 书院人家(世博源店) 上海市浦东新区上南路168号世博源B2 > > The Dianping page: http://www.dianping.com/shop/20877292 > > > > Dinner is scheduled to Tuesday, 5th Nov at 6pm. > > > > Restaurant is close to the Expo center. It's about 15 minutes walk according to > > the Google maps: https://tinyurl.com/y2rc83ej > > google maps is quite inaccurate in China. > https://j.map.baidu.com/e8/6h > https://router.map.qq.com/short?l=8af77e82c3b0deff1adeffb171fedf19 Thx for this links. I puted it in etherpad also. > > > > > -- > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Sat Nov 2 09:02:02 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sat, 2 Nov 2019 10:02:02 +0100 Subject: [ptg][neutron] Onboarding during the PTG In-Reply-To: <20191031223451.yljtzrp2zulwpzoc@skaplons-mac> References: <20191031223451.yljtzrp2zulwpzoc@skaplons-mac> Message-ID: <20191102090202.54z3a4wgvmyj53c6@skaplons-mac> Hi, I forgot to add that we will be in *Room 431* according to [1] [1] https://www.openstack.org/ptg/ On Thu, Oct 31, 2019 at 11:34:51PM +0100, Slawek Kaplonski wrote: > Hi all new (and existing) Neutrinos, > > During the PTG in Shanghai we are planning to organize onboarding session. > It will take place on Wednesday in morning sessions. > It is planned to be started at 9:00 am and finished just before the lunch > on 12:30. See on [1] for details. > All people who wants to learn about Neutron and contribution to it are welcome > on this session. > Also all existing team members are welcome to be there to show to new > contributors and help them with onboarding process :) > > See You all in Shanghai! > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > -- > Slawek Kaplonski > Senior software engineer > Red Hat -- Slawek Kaplonski Senior software engineer Red Hat From umesh.mishra at click2cloud.net Sat Nov 2 15:18:09 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Sat, 2 Nov 2019 15:18:09 +0000 Subject: Installation DOC for Three node Message-ID: Dear Sir, This is inform you that, We want to build the openstack in our data center so request you please help to provide me doc so that we can build asap . Please help. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emccormick at cirrusseven.com Sat Nov 2 16:27:10 2019 From: emccormick at cirrusseven.com (Erik McCormick) Date: Sat, 2 Nov 2019 12:27:10 -0400 Subject: [ops] Shanghai Sessions Message-ID: Hello all, For anyone attending the Shanghai Summit, I wanted to point out a couple of interesting Ops things. First, we are redoing the ever popular Ops War Stories at the Forum on Monday. These are a series of lightning talks by anyone with a good story to tell. No presentations needed. Sign up here: https://etherpad.openstack.org/p/shanghai-ptg-ops-war-stories Second, we have space in the PTG area Thursday afternoon (1:30pm - 4:30pm, Room 430) to do a mini Ops Meetup. Please come join us! Share ideas for topics here so we can hit the ground running. https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup Cheers, Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Sun Nov 3 00:03:44 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Sat, 2 Nov 2019 20:03:44 -0400 Subject: [cinder] core list update Message-ID: <039cc381-d757-57e4-9014-82327b31a3d8@gmail.com> In preparation for the PTG, it's time to revise the list of Cinder core contributors. The following people have made great contributions to Cinder, but unfortunately Cinder is no longer their current focus, and they've indicated that they no longer have sufficient time to act as core reviewers for the project: - John Griffith - TommyLike Hu - Xing Yang - Yikun Jiang On behalf of the entire Cinder project team, I thank John, TommyLike, Yikun, and Xing for their past service to Cinder, and hope that they may find more time to spend on Cinder in the future. While I'm thanking people, I should also thank the current members of the cinder-core group for all their work during the Train cycle (and thank them in advance for all the work I look forward to them doing during Ussuri!): - Eric Harney - Gorka Eguileor - Ivan Kolodyazhny - Jay Bryant - Rajat Dhasmana - Sean McGinnis - Walter A. Boring IV This will leave some openings for new core contributors during the Ussuri cycle. If you're interested in getting some advice about how to position yourself to become a Cinder core, please seek out me or any of the active cores listed above during the PTG. For people who won't be at the PTG, you can always look for us in #openstack-cinder. Let's have a productive PTG!
cheers, brian From Arkady.Kanevsky at dell.com Sun Nov 3 01:42:46 2019 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Sun, 3 Nov 2019 01:42:46 +0000 Subject: [cinder] core list update In-Reply-To: <039cc381-d757-57e4-9014-82327b31a3d8@gmail.com> References: <039cc381-d757-57e4-9014-82327b31a3d8@gmail.com> Message-ID: <2c5ff878cfbe40939cfe36cccf5f0a8e@AUSX13MPS304.AMER.DELL.COM> Thanks to all past and current core contributors. -----Original Message----- From: Brian Rosmaita Sent: Saturday, November 2, 2019 7:04 PM To: openstack-discuss at lists.openstack.org Subject: [cinder] core list update [EXTERNAL EMAIL] In preparation for the PTG, it's time to revise the list of Cinder core contributors. The following people have made great contributions to Cinder, but unfortunately Cinder is no longer their current focus, and they've indicated that they no longer have sufficient time to act as core reviewers for the project: - John Griffith - TommyLike Hu - Xing Yang - Yikun Jiang On behalf of the entire Cinder project team, I thank John, TommyLike, Yikun, and Xing for their past service to Cinder, and hope that they may find more time to spend on Cinder in the future. While I'm thanking people, I should also thank the current members of the cinder-core group for all their work during the Train cycle (and thank them in advance for all the work I look forward to them doing during Ussuri!): - Eric Harney - Gorka Eguileor - Ivan Kolodyazhny - Jay Bryant - Rajat Dhasmana - Sean McGinnis - Walter A. Boring IV There will leave some openings for new core contributors during the Ussuri cycle. If you're interested in getting some advice about how to position yourself to become a Cinder core, please seek out me or any of the active cores listed above during the PTG. For people who won't be at the PTG, you can always look for us in #openstack-cinder. Let's have a productive PTG! cheers, brian From katonalala at gmail.com Mon Nov 4 00:26:48 2019 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 4 Nov 2019 01:26:48 +0100 Subject: [neutron] Bug deputy report for week of November 01 Message-ID: Hi, On the week from October 28 to November 3 I was the bug deputy of neutron, here's a short summary of the bugs arrived. Critical bugs - https://bugs.launchpad.net/neutron/+bug/1850288: scenario test test_multicast_between_vms_on_same_network fails - workaround to skip unstable test is merged, more investigation is needed. 
- https://bugs.launchpad.net/neutron/+bug/1850626 neutron-dynamic-routing: TypeError: bind() takes 4 positional arguments but 5 were given - assigned, workaround to skip the failing test is there (https://review.opendev.org/692372); need to check how to fix the issue with https://review.opendev.org/288271

High bugs
- https://bugs.launchpad.net/neutron/+bug/1850639 FloatingIP list bad performance - Assigned, in progress
- https://bugs.launchpad.net/neutron/+bug/1850800 UDP port forwarding test is failing often - The workaround to mark the test as unstable is there, more investigation is needed

Medium bugs
- https://bugs.launchpad.net/neutron/+bug/1850558 "AttributeError: 'str' object has no attribute 'content_type'" in functional tests - assigned, in progress
- https://bugs.launchpad.net/neutron/+bug/1850557 DHCP connectivity after migration/resize not working - More investigation is necessary
- https://bugs.launchpad.net/neutron/+bug/1850779 [L3] snat-ns will be initialized twice for DVR+HA routers during agent restart - assigned, in progress
- https://bugs.launchpad.net/neutron/+bug/1850864 DHCP agent takes very long time to report when port is provisioned - Assigned, in progress

Low bugs
- https://bugs.launchpad.net/neutron/+bug/1849980: Do not inherit from built-in "dict" - assigned, in progress
- https://bugs.launchpad.net/neutron/+bug/1850602: remove firewall_v1 exceptions in neutron-lib - assigned, in progress

RFE
- https://bugs.launchpad.net/neutron/+bug/1850818 [RFE][floatingip port_forwarding] Add description field - assigned, more investigation is needed
- https://bugs.launchpad.net/neutron/+bug/1850630 firewall rule update validating func is not robust enough, missing considering the stock data
- https://bugs.launchpad.net/neutron/+bug/1850137: Hosts in a VPNaaS-VPNaaS VPN lose their interconnect.

Regards Lajos -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Mon Nov 4 01:46:58 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Mon, 4 Nov 2019 07:16:58 +0530 Subject: [glance] PTG Schedule Message-ID: Hi All, Welcome to China for the OpenInfra Summit and PTG. I have prepared a PTG schedule for Glance [1]. We will have a small meet with QA team on Wednesday 6th November, full day session on Thursday 7th November and half day on Friday 8th November. I have kept 1 hour for Open discussion on Thursday and Friday, so if someone wants to join us with their topics can do the same during this time. Friday 11:30 to 12:00, we have an interview scheduled to share updates on Glance. Have a productive Summit and PTG. [1] https://etherpad.openstack.org/p/Glance-Ussuri-PTG-planning Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmendiza at redhat.com Mon Nov 4 02:08:21 2019 From: dmendiza at redhat.com (Douglas Mendizábal) Date: Sun, 3 Nov 2019 20:08:21 -0600 Subject: [barbican] PTG Schedule Message-ID: <0451df6b-23cc-2604-b28a-e1e9f6aac6f8@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hello Barbicaneers! I hope everyone made it to Shanghai safely. Our PTG session will be on Wednesday Nov 6 from 10:30am - 4:30p at the Kilo table. We've set up an etherpad to collect topics to talk about. Please feel free to add any topics you're interested in: https://etherpad.openstack.org/p/barbican-ussuri-ptg Additionally we've reserved a spot for a Team Photo on Thursday at 11am. Hope to see y'all soon!
Cheers, - - Douglas Mendizabal -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEwcapj5oGTj2zd3XogB6WFOq/OrcFAl2/iBEACgkQgB6WFOq/ OrfFew/+IjhZe+qRCi/4EmaVEDf7QxJyZDIVUlLsPWHmF98wCdj+GsbzoWUuFfHM sJCpfpVAUjxrIFOEo5uF9WiZhU36G9pgoLd1Y8Kb0/QRIQEQQcKGnlYhCn+jQjbW J2tlDrkU0GBwEBzDt91gM5JCncviY8yT6nhlr/SSLqvZRQnPewerJNyJbYsVh6N2 moXQzfeRjg1SGqR0KVUcDVPe/pE+at8A5ARFCxDiJaOIUTP0qcfKtDXh714bevyi Sw2qgDZHbLHa1nEv3umuYGcrGpKz8Uuj5ju+7oGpPh4hX4pfPxbVDSzu8srfzTui ggvcxFrpZQvdff3Lec1eclxnB+c9Z1tBKYF7pPUVtN3NPfCATkVCSQACYORPZLdh GAnyxiiUXRwzIfOo0b6koa2pRi7ZWoz0DjVzpnl+D7qztUzyiguaj3KDnuTvlfQl iMQev1QHD6fAVvByHgDRj4dyUqUi2+V/DtNZ9w29AX7C+U/afSbNGvygc8yNCtHF vbkw68aPpj5zeB0OTjPQ6N5vsUc6bSXYGnECuGw24untnutvPKR+W9g9VQEUyN1h vhvn0IPHZ9QyBJ0ctpdfA6O9PNsjY/DQNyDeiNGljTIpBjepUmqMTXvycsn8VN/E yY0OL2QFGPhcsK7Q/yeUCzMm1sken2zMg8Bdxt10qbj4GsCMtyQ= =fdMR -----END PGP SIGNATURE----- From kota.tsuyuzaki.pc at hco.ntt.co.jp Fri Nov 1 10:18:52 2019 From: kota.tsuyuzaki.pc at hco.ntt.co.jp (Kota Tsuyuzaki) Date: Fri, 01 Nov 2019 19:18:52 +0900 Subject: [ptg][storlets] Shanghi PTG plan for Storlets Message-ID: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> Hello guys, The Shanghai PTG will happen soon! The Storlets core team prepared the etherpad for Shanghi PTG here, https://etherpad.openstack.org/p/storlets-ptg-shanghai Please feel free to add any topics you want to discuss there. Note that, Storlets reserved the room for 1.5 days but we'll use the latter half day actually because of the other schedules of core team members. If anyone want to catch me in other time, please let me know via E-mail, or something (I'm not sure IRC and any other tools would work in Shanghi PTG network, though) -------------------------------------------- 露崎 浩太 (Kota Tsuyuzaki) kota.tsuyuzaki.pc at hco.ntt.co.jp NTTソフトウェアイノベーションセンタ 分散処理基盤プロジェクト 0422-59-2837 --------------------------------------------- From zodiac.nv at gmail.com Sat Nov 2 22:53:55 2019 From: zodiac.nv at gmail.com (Nikita Belov) Date: Sun, 3 Nov 2019 01:53:55 +0300 Subject: [tempest] Train release and zero-disk error Message-ID: Hello! I try to use rally with Train release of Openstack. I have 842 success tests and 436 failures because of "Only volume-backed servers are allowed for flavors with zero disk". What can I do with this error? Report attached. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Mon Nov 4 02:18:54 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Mon, 4 Nov 2019 11:18:54 +0900 Subject: [OpenInfra Summit Shanghai] Some wrong information at Shanghai Summit Message-ID: Dear OSF, I don't attend the Shanghai summit but I notice some pictures of the presentation which Jonathan Bryce is delivering. The attached photo is about the OpenInfra Days around the world and the one on the left is *OpenInfra Days Vietnam*, not Korea. I know it's just a little thing but because it already upset some of our community members I would like to ask for correction. The Vietnam OpenInfra Community has been putting lots of effort to be recognize so please understand. Bests, Trinh -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: photo_2019-11-04_11-00-27.jpg Type: image/jpeg Size: 191246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: photo_2019-11-04_11-09-59.jpg Type: image/jpeg Size: 112337 bytes Desc: not available URL: From adriant at catalyst.net.nz Mon Nov 4 03:00:08 2019 From: adriant at catalyst.net.nz (Adrian Turjak) Date: Mon, 4 Nov 2019 16:00:08 +1300 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> Message-ID: <539f50a0-c67e-9b36-5608-beadb5b68d02@catalyst.net.nz> Also of potential interest is our own internal variant of project termination: https://gitlab.com/catalyst-cloud/python-opsclient/blob/master/opsclient/ops/v1/project.py Note, a recent thing we ran into was a lack of support for Swift Bulk deletion... which we are now turning on and fixing, because deleting a project with 2mil + objects one by one is... slow. On 31/10/19 2:26 am, Adam Harwell wrote: > That's too bad that you won't be at the summit, but I think there may > still be some discussion planned about this topic.  > > Yeah, I understand completely about priorities and such internally. > Same for me... It just happens that this IS priority work for us right > now. :) > > > On Tue, Oct 29, 2019, 07:48 Adrian Turjak > wrote: > > My apologies I missed this email. > > Sadly I won't be at the summit this time around. There may be some > public cloud focused discussions, and some of those often have > this topic come up. Also if Monty from the SDK team is around, I'd > suggest finding him and having a chat. > > I'll help if I can but we are swamped with internal work and I > can't dedicate much time to do upstream work that isn't urgent. :( > > On 17/10/19 8:48 am, Adam Harwell wrote: >> That's interesting -- we have already started working to add >> features and improve ospurge, and it seems like a plenty useful >> tool for our needs, but I think I agree that it would be nice to >> have that functionality built into the sdk. I might be able to >> help with both, since one is immediately useful and we (like >> everyone) have deadlines to meet, and the other makes sense to me >> as a possible future direction that could be more widely supported. >> >> Will you or someone else be hosting and discussion about this at >> the Shanghai summit? I'll be there and would be happy to join and >> discuss. >> >>     --Adam >> >> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >> > wrote: >> >> I tried to get a community goal to do project deletion per >> project, but >> we ended up deciding that a community goal wasn't ideal >> unless we did >> build a bulk delete API in each service: >> https://review.opendev.org/#/c/639010/ >> https://etherpad.openstack.org/p/community-goal-project-deletion >> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >> >> What we decided on, but didn't get a chance to work on, was >> building >> into the OpenstackSDK OS-purge like functionality, as well as >> reporting >> functionality (of all project resources to be deleted). That >> way we >> could have per project per resource deletion logic, and all >> of that >> defined in the SDK. >> >> I was up for doing some of the work, but ended up swamped >> with internal >> work and just didn't drive or push for the deletion work >> upstream. 
>> >> If you want to do something useful, don't pursue OS-Purge, >> help us add >> that official functionality to the SDK, and then we can push >> for bulk >> deletion APIs in each project to make resource deletion more >> pleasant. >> >> I'd be happy to help with the work, and Monty on the SDK team >> will most >> likely be happy to as well. :) >> >> Cheers, >> Adrian >> >> On 1/10/19 11:48 am, Adam Harwell wrote: >> > I haven't seen much activity on this project in a while, >> and it's been >> > moved to opendev/x since the opendev migration... Who is >> the current >> > owner of this project? Is there anyone who actually is >> maintaining it, >> > or would mind if others wanted to adopt the project to move >> it forward? >> > >> > Thanks, >> >    --Adam Harwell >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbryce at jbryce.com Mon Nov 4 03:00:18 2019 From: jbryce at jbryce.com (Jonathan Bryce) Date: Sun, 03 Nov 2019 21:00:18 -0600 Subject: [OpenInfra Summit Shanghai] Some wrong information at Shanghai Summit In-Reply-To: References: Message-ID: <16e345a4ee0.27a5.eb5fa01e01bf15c6e0d805bdb1ad935e@jbryce.com> Hi Trinh, I apologize for the error on the presentation. We have corrected it in version of the slides that will be distributed. We definitely appreciate the efforts of all of the community organizers, and I'm sorry for the mix up. Jonathan On November 3, 2019 20:55:34 Trinh Nguyen wrote: > Dear OSF, > > I don't attend the Shanghai summit but I notice some pictures of the > presentation which Jonathan Bryce is delivering. The attached photo is > about the OpenInfra Days around the world and the one on the left is > OpenInfra Days Vietnam, not Korea. I know it's just a little thing but > because it already upset some of our community members I would like to ask > for correction. The Vietnam OpenInfra Community has been putting lots of > effort to be recognize so please understand. > > Bests, > > Trinh > > -- > > Trinh Nguyen > www.edlab.xyz -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Mon Nov 4 03:02:49 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Mon, 4 Nov 2019 12:02:49 +0900 Subject: [OpenInfra Summit Shanghai] Some wrong information at Shanghai Summit In-Reply-To: <16e345a4ee0.27a5.eb5fa01e01bf15c6e0d805bdb1ad935e@jbryce.com> References: <16e345a4ee0.27a5.eb5fa01e01bf15c6e0d805bdb1ad935e@jbryce.com> Message-ID: Thank Jonathan for the quick response. I really appreciate that. Bests, On Mon, Nov 4, 2019 at 12:00 PM Jonathan Bryce wrote: > Hi Trinh, > > I apologize for the error on the presentation. We have corrected it in > version of the slides that will be distributed. > > We definitely appreciate the efforts of all of the community organizers, > and I'm sorry for the mix up. > > Jonathan > > > On November 3, 2019 20:55:34 Trinh Nguyen wrote: > >> Dear OSF, >> >> I don't attend the Shanghai summit but I notice some pictures of the >> presentation which Jonathan Bryce is delivering. The attached photo is >> about the OpenInfra Days around the world and the one on the left is *OpenInfra >> Days Vietnam*, not Korea. I know it's just a little thing but because it >> already upset some of our community members I would like to ask for >> correction. The Vietnam OpenInfra Community has been putting lots of effort >> to be recognize so please understand. 
>> >> Bests, >> >> Trinh >> >> -- >> *Trinh Nguyen* >> *www.edlab.xyz * >> >> > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From flux.adam at gmail.com Mon Nov 4 03:09:10 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Mon, 4 Nov 2019 11:09:10 +0800 Subject: [ospurge] looking for project owners / considering adoption In-Reply-To: <539f50a0-c67e-9b36-5608-beadb5b68d02@catalyst.net.nz> References: <342983ed-1d22-8f3a-3335-f153512ec2b2@catalyst.net.nz> <539f50a0-c67e-9b36-5608-beadb5b68d02@catalyst.net.nz> Message-ID: Interesting... Well, hopefully I will see some people in about 30 minutes about this. :) On Mon, Nov 4, 2019, 11:00 AM Adrian Turjak wrote: > Also of potential interest is our own internal variant of project > termination: > > https://gitlab.com/catalyst-cloud/python-opsclient/blob/master/opsclient/ops/v1/project.py > > Note, a recent thing we ran into was a lack of support for Swift Bulk > deletion... which we are now turning on and fixing, because deleting a > project with 2mil + objects one by one is... slow. > On 31/10/19 2:26 am, Adam Harwell wrote: > > That's too bad that you won't be at the summit, but I think there may > still be some discussion planned about this topic. > > Yeah, I understand completely about priorities and such internally. Same > for me... It just happens that this IS priority work for us right now. :) > > > On Tue, Oct 29, 2019, 07:48 Adrian Turjak wrote: > >> My apologies I missed this email. >> >> Sadly I won't be at the summit this time around. There may be some public >> cloud focused discussions, and some of those often have this topic come up. >> Also if Monty from the SDK team is around, I'd suggest finding him and >> having a chat. >> >> I'll help if I can but we are swamped with internal work and I can't >> dedicate much time to do upstream work that isn't urgent. :( >> On 17/10/19 8:48 am, Adam Harwell wrote: >> >> That's interesting -- we have already started working to add features and >> improve ospurge, and it seems like a plenty useful tool for our needs, but >> I think I agree that it would be nice to have that functionality built into >> the sdk. I might be able to help with both, since one is immediately useful >> and we (like everyone) have deadlines to meet, and the other makes sense to >> me as a possible future direction that could be more widely supported. >> >> Will you or someone else be hosting and discussion about this at the >> Shanghai summit? I'll be there and would be happy to join and discuss. >> >> --Adam >> >> On Tue, Oct 15, 2019, 22:04 Adrian Turjak >> wrote: >> >>> I tried to get a community goal to do project deletion per project, but >>> we ended up deciding that a community goal wasn't ideal unless we did >>> build a bulk delete API in each service: >>> https://review.opendev.org/#/c/639010/ >>> https://etherpad.openstack.org/p/community-goal-project-deletion >>> https://etherpad.openstack.org/p/DEN-Deletion-of-resources >>> https://etherpad.openstack.org/p/DEN-Train-PublicCloudWG-brainstorming >>> >>> What we decided on, but didn't get a chance to work on, was building >>> into the OpenstackSDK OS-purge like functionality, as well as reporting >>> functionality (of all project resources to be deleted). That way we >>> could have per project per resource deletion logic, and all of that >>> defined in the SDK. 
>>> >>> I was up for doing some of the work, but ended up swamped with internal >>> work and just didn't drive or push for the deletion work upstream. >>> >>> If you want to do something useful, don't pursue OS-Purge, help us add >>> that official functionality to the SDK, and then we can push for bulk >>> deletion APIs in each project to make resource deletion more pleasant. >>> >>> I'd be happy to help with the work, and Monty on the SDK team will most >>> likely be happy to as well. :) >>> >>> Cheers, >>> Adrian >>> >>> On 1/10/19 11:48 am, Adam Harwell wrote: >>> > I haven't seen much activity on this project in a while, and it's been >>> > moved to opendev/x since the opendev migration... Who is the current >>> > owner of this project? Is there anyone who actually is maintaining it, >>> > or would mind if others wanted to adopt the project to move it forward? >>> > >>> > Thanks, >>> > --Adam Harwell >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Mon Nov 4 03:29:40 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Mon, 4 Nov 2019 12:29:40 +0900 Subject: [ptg][storlets] Shanghi PTG plan for Storlets In-Reply-To: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> References: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> Message-ID: Hi Kota, I would suggest you to update the etherpad list of Shanghai PTG. http://ptg.openstack.org/etherpads.html The instruction is found at https://opendev.org/openstack/ptgbot/src/branch/master/README.rst#etherpad Akihiro On Mon, Nov 4, 2019 at 11:50 AM Kota Tsuyuzaki wrote: > > Hello guys, > > The Shanghai PTG will happen soon! The Storlets core team prepared the etherpad for Shanghi PTG here, > https://etherpad.openstack.org/p/storlets-ptg-shanghai > Please feel free to add any topics you want to discuss there. Note that, Storlets reserved the room for 1.5 days but we'll use the > latter half day actually because of the other schedules of core team members. > > If anyone want to catch me in other time, please let me know via E-mail, or something (I'm not sure IRC and any other tools would > work in Shanghi PTG network, though) > > -------------------------------------------- > 露崎 浩太 (Kota Tsuyuzaki) > kota.tsuyuzaki.pc at hco.ntt.co.jp > NTTソフトウェアイノベーションセンタ > 分散処理基盤プロジェクト > 0422-59-2837 > --------------------------------------------- > > > > > From umesh.mishra at click2cloud.net Mon Nov 4 04:27:19 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Mon, 4 Nov 2019 04:27:19 +0000 Subject: [ptg][storlets] Shanghi PTG plan for Storlets In-Reply-To: References: <001c01d5909d$c2c25ac0$48471040$@hco.ntt.co.jp_1> Message-ID: Dear Sir, This is inform you that, We want to build the open stack in our data center so request you please help to provide me doc so that we can build asap Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net |  Mobile: +91 7738599311 -----Original Message----- From: Akihiro Motoki Sent: Monday, November 4, 2019 9:00 AM To: Kota Tsuyuzaki Cc: openstack-discuss Subject: Re: [ptg][storlets] Shanghi PTG plan for Storlets Hi Kota, I would suggest you to update the etherpad list of Shanghai PTG. http://ptg.openstack.org/etherpads.html The instruction is found at https://opendev.org/openstack/ptgbot/src/branch/master/README.rst#etherpad Akihiro On Mon, Nov 4, 2019 at 11:50 AM Kota Tsuyuzaki wrote: > > Hello guys, > > The Shanghai PTG will happen soon! 
The Storlets core team prepared the > etherpad for Shanghi PTG here, > https://etherpad.openstack.org/p/storlets-ptg-shanghai > Please feel free to add any topics you want to discuss there. Note > that, Storlets reserved the room for 1.5 days but we'll use the latter half day actually because of the other schedules of core team members. > > If anyone want to catch me in other time, please let me know via > E-mail, or something (I'm not sure IRC and any other tools would work > in Shanghi PTG network, though) > > -------------------------------------------- > 露崎 浩太 (Kota Tsuyuzaki) > kota.tsuyuzaki.pc at hco.ntt.co.jp > NTTソフトウェアイノベーションセンタ > 分散処理基盤プロジェクト > 0422-59-2837 > --------------------------------------------- > > > > > From veeraready at yahoo.co.in Mon Nov 4 08:20:46 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 08:20:46 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> Message-ID: <1134736099.729706.1572855646143@mail.yahoo.com> HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/png Size: 17533 bytes Desc: not available URL: From umesh.mishra at click2cloud.net Mon Nov 4 08:31:10 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Mon, 4 Nov 2019 08:31:10 +0000 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <1134736099.729706.1572855646143@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: Dear Sir, I want to build the open stack (train or stack version) in our premises so request you to please help me the correct document so that we can install the same. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 From: VeeraReddy Sent: Monday, November 4, 2019 1:51 PM To: openstack-dev at lists.openstack.org Subject: [openstack-dev][kuryr] Unable to create pod HI, Pod is not creating . [Inline image] Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below link https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 17533 bytes Desc: image001.png URL: From ltomasbo at redhat.com Mon Nov 4 08:34:31 2019 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Mon, 4 Nov 2019 09:34:31 +0100 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <1134736099.729706.1572855646143@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > HI, > Pod is not creating . 
> [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL:
From veeraready at yahoo.co.in Mon Nov 4 08:50:38 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 08:50:38 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: <1392109807.755528.1572857438445@mail.yahoo.com>
Hi Umesh, Use below devstack link to install openstack: https://docs.openstack.org/devstack/latest/#download-devstack Instead of "git clone https://opendev.org/openstack/devstack" use "git clone https://opendev.org/openstack/devstack -b stable/train" Regards, Veera.
On Monday, 4 November, 2019, 02:06:16 pm IST, Umesh Mishra wrote: Dear Sir, I want to build the open stack (train or stack version) in our premises so request you to please help me the correct document so that we can install the same. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311
From: VeeraReddy Sent: Monday, November 4, 2019 1:51 PM To: openstack-dev at lists.openstack.org Subject: [openstack-dev][kuryr] Unable to create pod HI, Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below link https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log : http://paste.openstack.org/show/785758/ Regards, Veera.
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 17533 bytes Desc: not available URL: From ekcs.openstack at gmail.com Mon Nov 4 09:06:21 2019 From: ekcs.openstack at gmail.com (Eric K) Date: Mon, 4 Nov 2019 09:06:21 +0000 Subject: [self-healing][autohealing][ptg] self-healing session 11/5@1:40PM Message-ID: Looking forward to seeing you tomorrow at the PTG! The self-healing SIG will meet to continue the work on making self-healing easy and available. We’re especially eager to hear/discuss user feedback, feature requests, and ideas for making self-healing easier and better. The session will take place on Tuesday 11/5 at 1:40 - 3:10 PM in room 431 [1]. Planned topics include predictive analytics, cross-project testing, user feedback, and unified bug reporting. Additional topics and comments welcome in the etherpad: https://etherpad.openstack.org/p/SHA-self-healing-SIG [1] Map and schedule https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/Uploads/PTG-Shanghai2019-Map-Schedule.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishabh.sikka at hpe.com Mon Nov 4 09:15:15 2019 From: rishabh.sikka at hpe.com (Sikka, Rishabh) Date: Mon, 4 Nov 2019 09:15:15 +0000 Subject: Openstack third pary CI Implementation(Zuul V3 , Nodepool, Queens) Message-ID: Dear Team , I am installing Zuul V3 ,Nodepool for our openstack third party CI ,Please let me know if any documentation is available for the same as I am struggling with the steps shared on zuul official page. Also please do let me know if #openstack-third-party-ci is correct IRC channel related to third party ci implementation, if it is correct please refer some of the name who had already implemented it. I had already tried posting my questions on same IRC channel but did not get the desired reply. Note -: If above PDL is not correct , please refer it to the correct PDL. Regards Rishabh Sikka -------------- next part -------------- An HTML attachment was scrubbed... URL: From veeraready at yahoo.co.in Mon Nov 4 09:45:12 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 09:45:12 +0000 (UTC) Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> Message-ID: <705559766.789758.1572860712151@mail.yahoo.com> HI ,I am getting errors in "kuryr-daemon"http://paste.openstack.org/show/785762/ And kuryr-cni daemon is terminating Regards, Veera. On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar wrote: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com     -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From mark at stackhpc.com Mon Nov 4 10:09:41 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 4 Nov 2019 10:09:41 +0000 Subject: [neutron][docs] networking-onos EOL? 
Message-ID:
Hi, We (kolla) had a bug report [1] from someone trying to use the neutron onos_ml2 ML2 driver for the ONOS SDN controller. As far as I can tell [2], this project hasn't been released since 2015. However, the 'latest' documentation is still accessible [3], and does not mention that the project is dead. What can we do to help steer people away from projects like this? Cheers, Mark [1] https://bugs.launchpad.net/bugs/1850763 [2] https://pypi.org/project/networking-onos/#history [3] https://docs.openstack.org/networking-onos/latest/
From mdemaced at redhat.com Mon Nov 4 10:15:57 2019 From: mdemaced at redhat.com (Maysa De Macedo Souza) Date: Mon, 4 Nov 2019 11:15:57 +0100 Subject: Re: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <705559766.789758.1572860712151@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> Message-ID:
Hi VeeraReddy, Could you check if the API load balancer is ACTIVE? Best, Maysa. On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: > HI , > I am getting errors in "kuryr-daemon" > http://paste.openstack.org/show/785762/ > > And kuryr-cni daemon is terminating > > Regards, > Veera. > > > On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar < > ltomasbo at redhat.com> wrote: > > > Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) > > On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > > HI, > Pod is not creating . > [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > > > > -- > LUIS TOMÁS BOLÍVAR > Senior Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > >
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL:
From veeraready at yahoo.co.in Mon Nov 4 11:09:00 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Mon, 4 Nov 2019 11:09:00 +0000 (UTC) Subject: Re: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> Message-ID: <469441979.831114.1572865740439@mail.yahoo.com>
Hi Maysa,

stack at user-OptiPlex-7050:~$ openstack service show octavia
+-------------+----------------------------------+
| Field       | Value                            |
+-------------+----------------------------------+
| description | Octavia Load Balancing Service   |
| enabled     | True                             |
| id          | 7ddcb424fdad4281aad3652dbbb1ca42 |
| name        | octavia                          |
| type        | load-balancer                    |
+-------------+----------------------------------+

Regards, Veera.
On Monday, 4 November, 2019, 03:51:41 pm IST, Maysa De Macedo Souza wrote: Hi VeeraReddy, Could you check if the API load balancer is ACTIVE? Best, Maysa. On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: HI, I am getting errors in "kuryr-daemon" http://paste.openstack.org/show/785762/ And kuryr-cni daemon is terminating Regards, Veera.
On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar wrote: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com     -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From ltomasbo at redhat.com Mon Nov 4 11:18:19 2019 From: ltomasbo at redhat.com (Luis Tomas Bolivar) Date: Mon, 4 Nov 2019 12:18:19 +0100 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <469441979.831114.1572865740439@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> <469441979.831114.1572865740439@mail.yahoo.com> Message-ID: Hi Veera, She referred to the LoadBalancer VM, you will need to do 'openstack server list --all' and/or 'openstack loadbalancer amphora list' On Mon, Nov 4, 2019 at 12:09 PM VeeraReddy wrote: > Hi Maysa, > > stack at user-OptiPlex-7050:~$ openstack service show octavia > +-------------+----------------------------------+ > | Field | Value | > +-------------+----------------------------------+ > | description | Octavia Load Balancing Service | > | enabled | True | > | id | 7ddcb424fdad4281aad3652dbbb1ca42 | > | name | octavia | > | type | load-balancer | > +-------------+----------------------------------+ > > > > > Regards, > Veera. > > > On Monday, 4 November, 2019, 03:51:41 pm IST, Maysa De Macedo Souza < > mdemaced at redhat.com> wrote: > > > Hi VeeraReddy, > > Could you check if the API load balancer is ACTIVE? > > Best, > Maysa. > > On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: > > HI , > I am getting errors in "kuryr-daemon" > http://paste.openstack.org/show/785762/ > > And kuryr-cni daemon is terminating > > Regards, > Veera. > > > On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar < > ltomasbo at redhat.com> wrote: > > > Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) > > On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > > HI, > Pod is not creating . > [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > > > > -- > LUIS TOMÁS BOLÍVAR > Senior Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From mdemaced at redhat.com Mon Nov 4 11:21:51 2019 From: mdemaced at redhat.com (Maysa De Macedo Souza) Date: Mon, 4 Nov 2019 12:21:51 +0100 Subject: [openstack-dev][kuryr] Unable to create pod In-Reply-To: <469441979.831114.1572865740439@mail.yahoo.com> References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> <469441979.831114.1572865740439@mail.yahoo.com> Message-ID: Hi VeeraReddy, You checked if Octavia is enabled on your system. In order to check if the API load balancer is active, you should check the field provisioning_status on the output of the following command: "openstack loadbalancer list | grep 10.0.0.129" According to your last logs I would imagine the API is down. If that is the case, you would need to manually recreate the lbaas or retrigger the installation. Best, Maysa. On Mon, Nov 4, 2019 at 12:09 PM VeeraReddy wrote: > Hi Maysa, > > stack at user-OptiPlex-7050:~$ openstack service show octavia > +-------------+----------------------------------+ > | Field | Value | > +-------------+----------------------------------+ > | description | Octavia Load Balancing Service | > | enabled | True | > | id | 7ddcb424fdad4281aad3652dbbb1ca42 | > | name | octavia | > | type | load-balancer | > +-------------+----------------------------------+ > > > > > Regards, > Veera. > > > On Monday, 4 November, 2019, 03:51:41 pm IST, Maysa De Macedo Souza < > mdemaced at redhat.com> wrote: > > > Hi VeeraReddy, > > Could you check if the API load balancer is ACTIVE? > > Best, > Maysa. > > On Mon, Nov 4, 2019 at 10:49 AM VeeraReddy wrote: > > HI , > I am getting errors in "kuryr-daemon" > http://paste.openstack.org/show/785762/ > > And kuryr-cni daemon is terminating > > Regards, > Veera. > > > On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar < > ltomasbo at redhat.com> wrote: > > > Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) > > On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: > > HI, > Pod is not creating . > [image: Inline image] > Status of pod is always "ContainerCreating" > > > I installed openstack & kubernetes using below link > > https://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html > > > *devstack at kuryr-daemon.service log* : > http://paste.openstack.org/show/785758/ > > Regards, > Veera. > > > > -- > LUIS TOMÁS BOLÍVAR > Senior Software Engineer > Red Hat > Madrid, Spain > ltomasbo at redhat.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From cdent+os at anticdent.org Mon Nov 4 11:37:36 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 4 Nov 2019 11:37:36 +0000 (GMT) Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On Fri, 1 Nov 2019, Matt Riedemann wrote: > On 11/1/2019 9:55 AM, Clark Boylan wrote: >> OVH controls the disk IOPs that we get pretty aggressively as well. >> Possible it is an IO thing? 
> > Yeah, so looking at the dstat output in that graph (thanks for pointing out > that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, > that's probably not good. What happens in a case like this? Is there an official procedure for "hey, can you give is more IO?" or (if that's not an option) "can you give us less CPU?". Is that something that is automated, is is something that is monitored and alarming? "INAP ran out of IO X times in the last N hours, light the beacons!" -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From lennyb at mellanox.com Mon Nov 4 11:38:30 2019 From: lennyb at mellanox.com (Lenny Verkhovsky) Date: Mon, 4 Nov 2019 11:38:30 +0000 Subject: Openstack third pary CI Implementation(Zuul V3 , Nodepool, Queens) In-Reply-To: References: Message-ID: Hi, Yes, irc channel is correct, sorry I missed your question. There are few docs and examples of how to do it that are working[1]. We had few issues with this configuration, and since we are using physical servers with our Hardware We decided to migrate from zuul2 to Jenkins Gerrit Plugin[2] Best Regards Lenny. [1] https://docs.openstack.org/infra/manual/zuulv3.html https://zuul-ci.org/docs/zuul/admin/quick-start.html https://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3-3rd-party-ci.html https://docs.openstack.org/infra/system-config/third_party.html https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html [2] https://wiki.jenkins.io/display/JENKINS/Gerrit+Trigger From: Sikka, Rishabh Sent: Monday, November 4, 2019 11:15 AM To: openstack-dev at lists.openstack.org Cc: Rane, Vishal Subject: Openstack third pary CI Implementation(Zuul V3 , Nodepool, Queens) Dear Team , I am installing Zuul V3 ,Nodepool for our openstack third party CI ,Please let me know if any documentation is available for the same as I am struggling with the steps shared on zuul official page. Also please do let me know if #openstack-third-party-ci is correct IRC channel related to third party ci implementation, if it is correct please refer some of the name who had already implemented it. I had already tried posting my questions on same IRC channel but did not get the desired reply. Note -: If above PDL is not correct , please refer it to the correct PDL. Regards Rishabh Sikka -------------- next part -------------- An HTML attachment was scrubbed... URL: From doka.ua at gmx.com Mon Nov 4 11:47:55 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 4 Nov 2019 13:47:55 +0200 Subject: BGP dynamic routing Message-ID: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> Dear colleagues, "BGP dynamic routing" doc (https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html) says only about advertisement of routes: "BGP dynamic routing enables advertisement of self-service (private) network prefixes to physical network devices that support BGP such as routers, thus removing the conventional dependency on static routes." and nothing about receiving of routes from external peers. Whether it is ever possible using Neutron to have fully dynamic routing inside the project, both advertising/receiving (and updating VRs configuration) routes to/from remote peers? Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." 
-- Thomas Edison From mark at stackhpc.com Mon Nov 4 14:15:17 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 4 Nov 2019 14:15:17 +0000 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: References: Message-ID: On Wed, 30 Oct 2019 at 17:26, Radosław Piliszek wrote: > > Hello Everyone, > > As you may already know, Kolla core team is mostly not present on summit in Shanghai. > Instead we are organizing a PTG next week, 7-8th Nov (Thu-Fri), in Białystok, Poland. > Please let me know this week if you are interested in coming in person. > > We invite operators, contributors and contributors-to-be to join us for the virtual PTG online. > The time schedule will be advertised later. After polling participants, we have agreed to meet at 1400 - 1800 UTC on Thursday and Friday this week. Since not all participants can make the first hour, we will adjust the schedule accordingly. Marcin will follow with connection details for the Zoom video conference. Please continue to update the etherpad with potential topics for discussion. I will propose a rough agenda over the next few days. Mark > > Please fill yourself in on the whiteboard [1]. > New ideas are welcome. > > [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg > > Kind regards, > Radek aka yoctozepto > From swamireddy at gmail.com Mon Nov 4 14:34:13 2019 From: swamireddy at gmail.com (M Ranga Swami Reddy) Date: Mon, 4 Nov 2019 20:04:13 +0530 Subject: Cinder multi backend quota update In-Reply-To: References: Message-ID: Great. Its working. Thanks Swami On Wed, Oct 30, 2019 at 11:18 PM Mohammed Naser wrote: > I didn't try this but.. > > openstack quota set --volume-type ceph --volumes 20 project-id > > should do the trick. > > Bonne chance > > On Wed, Oct 30, 2019 at 1:38 PM M Ranga Swami Reddy > wrote: > > > > Hello, > > We use 2 types of volume, like volumes and volumes_ceph. > > I can update the quota for volumes quota using "cinder quota-update > --volumes=20 project-id" > > > > But for volumes_ceph, the above CLI failed with volumes_ceph un > recognised option.. > > Any suggestions here? > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. https://vexxhost.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Nov 4 14:57:44 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 4 Nov 2019 08:57:44 -0600 Subject: State of the Gate (placement?) In-Reply-To: <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/1/2019 9:55 AM, Clark Boylan wrote: > INAP was also recently turned back on. It had been offline for redeployment and that was completed and added back to the pool. Possible that more than just the openstack version has changed? > > OVH controls the disk IOPs that we get pretty aggressively as well. Possible it is an IO thing? Related to slow nodes, I noticed this failed recently, it's a synchronous RPC call from nova-api to nova-compute that timed out after 60 seconds [1]. Looking at MessagingTimeout errors in the nova-api logs shows it's mostly in INAP and OVH nodes [2] so there seems to be a pattern emerging with those being slow nodes causing issues. 
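(For anyone who wants to poke at this themselves, the logstash query behind that observation is roughly the following; the message text can be narrowed further, e.g. to the specific RPC method, if the broad query is too noisy:

  message:"MessagingTimeout" AND tags:"screen-n-api.txt"

Faceting the results on the node_provider field in the dashboard, if I'm reading the index right, is what shows the INAP/OVH skew.)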
There are ways we could workaround this a bit on the nova side [3] but I'm not sure how much we want to make parts of nova super resilient to very slow nodes when real life operations would probably need to know about this kind of thing to scale up/out their control plane. [1] https://zuul.opendev.org/t/openstack/build/ef0196fe84804b44ac106d011c8c29ea/log/controller/logs/screen-n-api.txt.gz?severity=4 [2] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22MessagingTimeout%5C%22%20AND%20tags%3A%5C%22screen-n-api.txt%5C%22&from=7d [3] https://review.opendev.org/#/c/692550/ -- Thanks, Matt From donny at fortnebula.com Mon Nov 4 15:08:52 2019 From: donny at fortnebula.com (Donny Davis) Date: Mon, 4 Nov 2019 10:08:52 -0500 Subject: BGP dynamic routing In-Reply-To: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> Message-ID: The way I use it is to dynamically advertise my tenant networks to the edge. The edge router still handles routes in the rest of my infra. Works pretty well for me. Donny Davis c: 805 814 6800 On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka wrote: > Dear colleagues, > > "BGP dynamic routing" doc > ( > https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html > ) > says only about advertisement of routes: "BGP dynamic routing enables > advertisement of self-service (private) network prefixes to physical > network devices that support BGP such as routers, thus removing the > conventional dependency on static routes." and nothing about receiving > of routes from external peers. > > Whether it is ever possible using Neutron to have fully dynamic routing > inside the project, both advertising/receiving (and updating VRs > configuration) routes to/from remote peers? > > Thank you. > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doka.ua at gmx.com Mon Nov 4 16:28:11 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 4 Nov 2019 18:28:11 +0200 Subject: BGP dynamic routing In-Reply-To: References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> Message-ID: <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Hi Donny, the question if I have few peers to few PoPs, everyone with own set of prefixes and need to import these external prefixes INTO the tenant. On 04.11.2019 17:08, Donny Davis wrote: > The way I use it is to dynamically advertise my tenant networks to the > edge. The edge router still handles routes in the rest of my infra. > > Works pretty well for me. > > Donny Davis > c: 805 814 6800 > > On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka > wrote: > > Dear colleagues, > > "BGP dynamic routing" doc > (https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html) > says only about advertisement of routes: "BGP dynamic routing enables > advertisement of self-service (private) network prefixes to physical > network devices that support BGP such as routers, thus removing the > conventional dependency on static routes." and nothing about receiving > of routes from external peers. > > Whether it is ever possible using Neutron to have fully dynamic > routing > inside the project, both advertising/receiving (and updating VRs > configuration) routes to/from remote peers? > > Thank you. > > -- > Volodymyr Litovka >    "Vision without Execution is Hallucination." 
-- Thomas Edison > > -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From donny at fortnebula.com Mon Nov 4 16:31:36 2019 From: donny at fortnebula.com (Donny Davis) Date: Mon, 4 Nov 2019 11:31:36 -0500 Subject: BGP dynamic routing In-Reply-To: <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Message-ID: To be honest I only use it for the use case I listed before, so beyond that I am not going to be much help. However.. they are both speaking bgp I would imagine that it works the same way as any bgp instance. Give it a whirl and let us know how it works out. :) On Mon, Nov 4, 2019 at 11:28 AM Volodymyr Litovka wrote: > Hi Donny, > > the question if I have few peers to few PoPs, everyone with own set of > prefixes and need to import these external prefixes INTO the tenant. > > > On 04.11.2019 17:08, Donny Davis wrote: > > The way I use it is to dynamically advertise my tenant networks to the > edge. The edge router still handles routes in the rest of my infra. > > Works pretty well for me. > > Donny Davis > c: 805 814 6800 > > On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka wrote: > >> Dear colleagues, >> >> "BGP dynamic routing" doc >> ( >> https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html >> ) >> says only about advertisement of routes: "BGP dynamic routing enables >> advertisement of self-service (private) network prefixes to physical >> network devices that support BGP such as routers, thus removing the >> conventional dependency on static routes." and nothing about receiving >> of routes from external peers. >> >> Whether it is ever possible using Neutron to have fully dynamic routing >> inside the project, both advertising/receiving (and updating VRs >> configuration) routes to/from remote peers? >> >> Thank you. >> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." -- Thomas Edison >> >> >> > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcin.juszkiewicz at linaro.org Mon Nov 4 17:04:50 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Mon, 4 Nov 2019 18:04:50 +0100 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: References: Message-ID: <8a6d214d-970e-c923-570e-e031aa364305@linaro.org> On 04.11.2019 15:15, Mark Goddard wrote: > After polling participants, we have agreed to meet at 1400 - 1800 UTC > on Thursday and Friday this week. Since not all participants can make > the first hour, we will adjust the schedule accordingly. > > Marcin will follow with connection details for the Zoom video conference. As we agreed on Zoom I did a setup of meeting. https://zoom.us/j/157063687 will be available for 1400-1800 UTC on both Thursday and Friday. Sessions will be recorded by platform. From pierre at stackhpc.com Mon Nov 4 22:55:03 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 4 Nov 2019 23:55:03 +0100 Subject: [blazar] Shanghai Summit and PTG activities for Blazar; no IRC meetings Message-ID: Hello, Several of the Blazar core reviewers and contributors will be in Shanghai this week. 
Don't hesitate to talk to them if you are interested in resource reservation as a service. On Tuesday there will be a project update: https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24373/blazar-project-update-november-2019 And on Friday at the PTG, there will be a project onboarding session, as well as technical discussions. Since most of the team is in Shanghai, IRC meetings are cancelled this week. Best wishes, Pierre From mriedemos at gmail.com Tue Nov 5 00:56:00 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 4 Nov 2019 18:56:00 -0600 Subject: [infra][qa] multiline tracebacks not being indexed anymore Message-ID: We used to be able to query for things like this: message:"in reserve_block_device_name" AND message:"MessagingTimeout" AND tags:"screen-n-api.txt" to fingerprint a traceback in logstash like this [1] but that no longer works. The multiline logstash filter is at [2] but doesn't seem to be getting applied anymore. I asked about this in -infra today and fungi said: "(4:44:00 PM) fungi: mriedem: i suspect that coincided with switching away from osla, we may need some means of parsing tracebacks out of logs in the indexer" I don't know what that means (what's osla? is [2] no longer used?) but if someone could point me at some things to look at I could see if I can generate a fix. [1] https://zuul.opendev.org/t/openstack/build/ef0196fe84804b44ac106d011c8c29ea/log/controller/logs/screen-n-api.txt.gz?severity=4#31403 [2] https://opendev.org/openstack/logstash-filters/src/branch/master/filters/openstack-filters.conf -- Thanks, Matt From cboylan at sapwetik.org Tue Nov 5 00:58:22 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 05 Nov 2019 08:58:22 +0800 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On Mon, Nov 4, 2019, at 7:37 PM, Chris Dent wrote: > On Fri, 1 Nov 2019, Matt Riedemann wrote: > > > On 11/1/2019 9:55 AM, Clark Boylan wrote: > >> OVH controls the disk IOPs that we get pretty aggressively as well. > >> Possible it is an IO thing? > > > > Yeah, so looking at the dstat output in that graph (thanks for pointing out > > that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, > > that's probably not good. > > What happens in a case like this? Is there an official procedure for > "hey, can you give is more IO?" or (if that's not an option) "can > you give us less CPU?". Is that something that is automated, is is > something that is monitored and alarming? "INAP ran out of IO X > times in the last N hours, light the beacons!" Typically we try to work with the clouds to properly root cause the issue. Then from there we can figure out what the best fix may be. They are running our software after all and there is a good chance the problems are in openstack. I'm in shanghai at the moment but if others want to reach out feel free. benj_ and mgagne are at inap and amorin has been helpful at ovh. The test node logs include a hostid in them somewhere which an be used to identify hypervisors if necessary. 
Clark From cboylan at sapwetik.org Tue Nov 5 01:03:39 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 05 Nov 2019 09:03:39 +0800 Subject: [infra][qa] multiline tracebacks not being indexed anymore In-Reply-To: References: Message-ID: <9c854572-e486-4515-8a5f-c9ba5b8d0fa7@www.fastmail.com> On Tue, Nov 5, 2019, at 8:56 AM, Matt Riedemann wrote: > We used to be able to query for things like this: > > message:"in reserve_block_device_name" AND message:"MessagingTimeout" > AND tags:"screen-n-api.txt" > > to fingerprint a traceback in logstash like this [1] but that no longer > works. The multiline logstash filter is at [2] but doesn't seem to be > getting applied anymore. > > I asked about this in -infra today and fungi said: > > "(4:44:00 PM) fungi: mriedem: i suspect that coincided with switching > away from osla, we may need some means of parsing tracebacks out of logs > in the indexer" > > I don't know what that means (what's osla? is [2] no longer used?) but > if someone could point me at some things to look at I could see if I can > generate a fix. os-loganalyze, https://opendev.org/openstack/os-loganalyze, was in use on the old log server to do filtering of severity and related manipulation. One thing it would do is collapse lines that didn't have a timestamps or severity prefix. However I think that may have only been for the html rendering which logstash didn't use. I'm not sure this is the issue. As for debugging this you can grab a log file and send it through logstash locally and fiddle with the rules until you get what you want. I'd help but currently at the summit and not in a good spot to do so. > > [1] > https://zuul.opendev.org/t/openstack/build/ef0196fe84804b44ac106d011c8c29ea/log/controller/logs/screen-n-api.txt.gz?severity=4#31403 > [2] > https://opendev.org/openstack/logstash-filters/src/branch/master/filters/openstack-filters.conf > > -- > > Thanks, > > Matt > > From eandersson at blizzard.com Tue Nov 5 01:11:03 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Tue, 5 Nov 2019 01:11:03 +0000 Subject: [Senlin] Splitting senlin-engine into three services Message-ID: We are looking into splitting the senlin-engine into three components (senlin-conductor, senlin-engine and senlin-health-manager) and wanted to get some feedback. The main goal here is to make the components more resilient and to reduce the number of threads per worker. Each one of the components already had it's own thread pool and in theory each worker could end up with thousands of thread. In the current version (Train) the engine process hosts these services. https://github.com/openstack/senlin/blob/stable/train/senlin/engine/dispatcher.py#L31 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/health_manager.py#L865 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/service.py#L79 In my patch we move two our of these out of the engine and into it's own service namespace. Split engine service into three services https://review.opendev.org/#/c/688784/ Please feel free to comment on the patch set, or let reply to this email with general feedback or concerns. Best Regards, Erik Olof Gunnar Andersson -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Tue Nov 5 01:52:11 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 5 Nov 2019 02:52:11 +0100 Subject: Installation DOC for Three node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL:
From rosmaita.fossdev at gmail.com Tue Nov 5 06:32:50 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 5 Nov 2019 14:32:50 +0800 Subject: [cinder] ussuri PTG schedule Message-ID:
The Ussuri PTG schedule is live: https://etherpad.openstack.org/p/shanghai-ptg-cinder Please check the schedule and let me know right away if your session causes a conflict for you. Except for the few fixed-time topics, we will follow the cinder tradition of dynamic scheduling, giving each topic exactly as much time as it needs and adjusting as we go. cheers, brian
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From akekane at redhat.com Tue Nov 5 04:07:34 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Tue, 5 Nov 2019 09:37:34 +0530 Subject: [glance] Shanghai Project Update Message-ID:
Hi All, We had a very good project update session at the Shanghai OpenInfra Summit, where we covered what we have done in the Train cycle and what our priorities are for the upcoming Ussuri cycle. Attaching the project update PDF file here for your reference. Thanks & Best Regards, Abhishek Kekane
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Glance Project Update-Train.pdf Type: application/pdf Size: 95919 bytes Desc: not available URL:
From veeraready at yahoo.co.in Tue Nov 5 07:01:43 2019 From: veeraready at yahoo.co.in (VeeraReddy) Date: Tue, 5 Nov 2019 07:01:43 +0000 (UTC) Subject: Re: [openstack-dev][kuryr] Unable to create pod In-Reply-To: References: <1134736099.729706.1572855646143.ref@mail.yahoo.com> <1134736099.729706.1572855646143@mail.yahoo.com> <705559766.789758.1572860712151@mail.yahoo.com> <469441979.831114.1572865740439@mail.yahoo.com> Message-ID: <1252984897.1189394.1572937303384@mail.yahoo.com>
Hi Maysa, My API load balancer is Active. The kubelet log with the error is here: http://paste.openstack.org/show/785793/ Regards, Veera.
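(For anyone following along, the checks used in this thread so far boil down to roughly the following, assuming the devstack setup from the guide linked earlier; names are examples:

  # is the Kubernetes API load balancer actually up?
  openstack loadbalancer list
  openstack loadbalancer amphora list

  # what is the kuryr CNI daemon complaining about?
  journalctl -u devstack@kuryr-daemon.service -f

  # pod events usually show the underlying CNI error
  kubectl describe pod <pod-name>)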
On Monday, 4 November, 2019, 02:10:45 pm IST, Luis Tomas Bolivar wrote: Seems like your kuryr-cni is not able to reach the k8s API (10.0.0.129) On Mon, Nov 4, 2019 at 9:26 AM VeeraReddy wrote: HI,Pod is not creating . Status of pod is always "ContainerCreating" I installed openstack & kubernetes using below linkhttps://docs.openstack.org/kuryr-kubernetes/latest/installation/devstack/basic.html devstack at kuryr-daemon.service log :http://paste.openstack.org/show/785758/ Regards, Veera. -- LUIS TOMÁS BOLÍVAR Senior Software Engineer Red Hat Madrid, Spain ltomasbo at redhat.com     -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572937258877blob.jpg Type: image/png Size: 14244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1572855537758blob.jpg Type: image/jpeg Size: 17533 bytes Desc: not available URL: From colleen at gazlene.net Tue Nov 5 08:37:39 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 05 Nov 2019 16:37:39 +0800 Subject: [dev][ops][ptg][keystone] Join the keystone onboarding session! In-Reply-To: <7e0350e7-d249-4f5c-8a54-50c883bfb350@www.fastmail.com> References: <7e0350e7-d249-4f5c-8a54-50c883bfb350@www.fastmail.com> Message-ID: Don't forget to join me at the keystone onboarding session tomorrow morning (Wednesday Nov. 6) at the PTG! Colleen On Fri, Nov 1, 2019, at 10:49, Colleen Murphy wrote: > Hello Stackers, > > If you're a developer, technical writer, operator, or user and > interested in getting involved in the keystone project, stop by the > keystone onboarding session in Shanghai next week! We will be at the > Kilo table in the Blue Room on Wednesday from 9 to 10:30. The format > will be open ended, so come with all your questions about how you can > participate on the keystone team. > > Can't make it to the session? Take a look at our contributing guide[1] > and feel free to get in touch with me directly. > > Colleen Murphy / cmurphy (keystone PTL) > > [1] https://docs.openstack.org/keystone/latest/contributor/how-can-i-help.html > > From gmann at ghanshyammann.com Tue Nov 5 08:57:56 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 05 Nov 2019 16:57:56 +0800 Subject: [qa] QA Office hour canceled for this week Message-ID: <16e3ac8eaa6.e357366e69902.1578074033606401731@ghanshyammann.com> Hello Everyone, As we are in PTG, I will cancel the QA office hour for this week and will continue the same on 14th Nov week. -gmann From ralonsoh at redhat.com Tue Nov 5 09:51:27 2019 From: ralonsoh at redhat.com (Rodolfo Alonso) Date: Tue, 05 Nov 2019 09:51:27 +0000 Subject: [neutron][QoS] QoS meeting cancelled November 5 Message-ID: <3fdfc0b8bcfe70487d5d21e8b69f44804810e339.camel@redhat.com> Hello Neutrinos: Due to the summit this week, the Neutron QoS meeting will be cancelled. Next meeting will be November 19. Regards. From dharmendra.kushwaha at india.nec.com Tue Nov 5 11:57:31 2019 From: dharmendra.kushwaha at india.nec.com (Dharmendra Kushwaha) Date: Tue, 5 Nov 2019 11:57:31 +0000 Subject: [tacker] Ussuri PTG Message-ID: Tacker Folks, As our scheduled sessions will be finished by 12:30 pm, so lets meet after lunch. I will be available in PTG area full day. 
https://etherpad.openstack.org/p/Tacker-PTG-Ussuri Thanks & Regards Dharmendra Kushwaha ________________________________ The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NECTI or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NECTI or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately. From umesh.mishra at click2cloud.net Tue Nov 5 12:41:53 2019 From: umesh.mishra at click2cloud.net (Umesh Mishra) Date: Tue, 5 Nov 2019 12:41:53 +0000 Subject: Installation DOC for Three node In-Reply-To: References: Message-ID: Dear Sir, We are trying to create the Machin but we unable to create could you please help or share me your skype id or contact number so that we can solve our issue. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 From: Sean McGinnis Sent: Tuesday, November 5, 2019 7:22 AM To: Umesh Mishra Cc: openstack-discuss at lists.openstack.org Subject: Re: Installation DOC for Three node Hi Umesh, Each project maintains installation guides for installing using system packages. That documentation can be found here: https://docs.openstack.org/train/install/ What you are more likely to want is a deployment tool that takes care of the installation for you. There are several options available, depending on your needs (container-based, perferred config management system, etc.). Information about those can be found here: https://www.openstack.org/software/project-navigator/deployment-tools Sent: Saturday, November 02, 2019 at 9:18 AM From: "Umesh Mishra" > To: "openstack-discuss at lists.openstack.org" > Subject: Installation DOC for Three node Dear Sir, This is inform you that, We want to build the openstack in our data center so request you please help to provide me doc so that we can build asap . Please help. Best Regards, Umesh Mishra Manager-IT&Devops | Click2Cloud Inc. | www.click2cloud.net Email: umesh.mishra at click2cloud.net | Mobile: +91 7738599311 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Nov 5 14:41:37 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 5 Nov 2019 22:41:37 +0800 Subject: [neutron] PTG - remote access and reminders Message-ID: <20191105144137.qlr35jei4xcefred@skaplons-mac> Hi, Tomorrow (Wednesday) we are starting Neutron PTG session. Agenda is available at [1]. If You are not in Shanghai but would maybe try to participate remotely in the sessions, please reach out to me directly through email or IRC. I will try to provide some access and stream from the session. Also, please remember that on Wednesday morning we have onboarding session for new contributors. 
So if You are interested in contributing to Neutron, feel free to reach out to us in *room 431* - we are starting at 9:00 am :)

[1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning

--
Slawek Kaplonski
Senior software engineer
Red Hat

From jbeuque at cisco.com Tue Nov 5 14:45:47 2019
From: jbeuque at cisco.com (Jean Bernard Beuque (jbeuque))
Date: Tue, 5 Nov 2019 14:45:47 +0000
Subject: [neutron][tap-as-a-service] ERSPAN support
Message-ID:

Hello,

I'd like to add ERSPAN support to the Tap-as-a-Service project.

I've currently implemented a prototype that can be used with networking-vpp:
https://opendev.org/x/networking-vpp

The modified version of tap as a service is available here (The API has been extended to support ERSPAN):
https://github.com/jbeuque/tap-as-a-service

I don't know who maintains the Taas project. But if you think adding this functionality could be useful, please contact me.
(Please take the modified version of Taas as a proposal to be discussed).

Regards,
Jean-Bernard Beuque
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From openstack at fried.cc Tue Nov 5 16:23:40 2019
From: openstack at fried.cc (Eric Fried)
Date: Tue, 5 Nov 2019 10:23:40 -0600
Subject: [cinder][osc][docs] openstackclient docs for cinder v2 vs v3
Message-ID: <656f60e9-b5d6-5390-13c6-34347a9bc2e1@fried.cc>

Howdy cinderinos.

I've been on a mission to get all of the python-openstackclient command [1] and plugin [2] docs autogenerated rather than hardcoded [3] so you don't have to remember to update two places when you add/change a subcommand option.

I'm almost done -- cinder is the last one -- but I want to confirm some odd observations before I dig in.

- All of the v3 subcommands are implemented by code in the openstackclient.volume.v2 package. Where there's overlap, the command classes are identical from v2 to v3. However, it appears as though the v2 commands are a *superset* of the v3 commands. Specifically, the following appear in v2 but not v3 [4]:

volume_backup_record_export
volume_backup_record_import
volume_backend_capability_show
volume_backend_pool_list
volume_host_failover

Observations:
* v3 has no other 'volume backup record' subcommands, but otherwise has the same 'volume backup' subcommands as v2.
* v3 has no 'volume backend' subcommands.
* v2 has both 'volume host failover' and 'volume host set', but v3 has only the latter.
* It seems suspicious that the "missing" v3 commands comprise a contiguous block under the v2 entry point.

So before I go creating a mess of v2-only and v2+v3 documents, I wanted to confirm that the above was actually intentional.

- The existing hardcoded documents mention v1 and/or v2, but don't mention v3 at all (e.g. [5]). I want to confirm that it's okay for me to add mention of v3 where appropriate.

Thanks,
efried

[1] https://docs.openstack.org/python-openstackclient/latest/cli/command-list.html
[2] https://docs.openstack.org/python-openstackclient/latest/cli/plugin-commands/index.html
[3] https://review.opendev.org/#/q/topic:generate-docs+(status:open+OR+status:merged)
[4] https://opendev.org/openstack/python-openstackclient/src/tag/4.0.0/setup.cfg#L610-L616
[5] https://docs.openstack.org/python-openstackclient/train/cli/command-objects/volume.html

From mriedemos at gmail.com Tue Nov 5 16:45:52 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Tue, 5 Nov 2019 10:45:52 -0600
Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations?
Message-ID: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com>

I was helping someone recover from a stuck live migration today where the migration record was stuck in pre-migrating status and somehow the request never hit the compute or was lost. The guest was stopped on the guest and basically the live migration either never started or never completed properly (maybe rabbit dropped the request or the compute service was restarted, I don't know).

I instructed them to update the database to set the migration record status to 'error' and hard reboot the instance to get it running again.

Then they pointed out they were seeing this in the compute logs:

"There are allocations remaining against the source host that might need to be removed"

That's because the source node allocations are still tracked in placement by the migration record and the dest node allocations are tracked by the instance. Cleaning that up is non-trivial. I have a troubleshooting doc started for manually cleaning up that kind of stuff here [1] but ultimately just told them to delete the allocations in placement for both the migration and the instance and then run the heal_allocations command to recreate the allocations for the instance.

Since this person's nova deployment was running Stein, they don't have the --dry-run [2] or --instance [3] options for the heal_allocations command. This isn't a huge problem but it does mean they could be healing allocations for instances they didn't expect.

They could work around this by installing nova from train or master in a VM/container/virtual environment and running it against the stein setup, but that's maybe more work than they want to do.

The question I'm posing is if people would like to see those options backported to stein and if so, would the stable team be OK with it? I'd say this falls into a gray area where these are things that are optional, not used by default, and are operational tooling so less risk to backport, but it's not zero risk. It's also worth noting that when I wrote those patches I did so with the intent that people could backport them at least internally.
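In concrete terms, the recovery described above boils down to something like the following. This is a rough sketch only: the UUIDs are placeholders, and the SQL assumes the migrations table has a uuid column (which releases in the Stein timeframe should have); adjust to your own deployment before running anything.

# mark the stuck migration record as errored, then get the guest running again
mysql nova -e "UPDATE migrations SET status='error' WHERE uuid='<migration-uuid>';"
openstack server reboot --hard <instance-uuid>
# drop the stale allocations held by both consumers (needs the osc-placement plugin)
openstack resource provider allocation delete <migration-uuid>
openstack resource provider allocation delete <instance-uuid>
# recreate the instance's allocations; --instance is the Train-era option being discussed
nova-manage placement heal_allocations --instance <instance-uuid>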
--Dan From mriedemos at gmail.com Tue Nov 5 17:23:32 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 5 Nov 2019 11:23:32 -0600 Subject: Installation DOC for Three node In-Reply-To: References: Message-ID: <70039a26-1b30-bb4a-0838-39082ae68085@gmail.com> On 11/5/2019 6:41 AM, Umesh Mishra wrote: > We are trying to create the Machin but we unable to create could you > please help or share me your skype id or contact number so that we can > solve our issue. This is not really an appropriate request for this mailing list. It's OK to ask for help and support for specific issues in this mailing list but there is a chance that if the issue you're reporting is too generic you might not get a reply, which is the case here. You're looking for docs on how to install openstack. Sean provided links to the project install guides and deployment tools to automate that if you don't want to do it manually. If you have specific issues going through the install guides manually or using one of the deployment tools, then post the specific issues and someone may be able to help. Requesting contact information from a community member to help you directly isn't appropriate. If setting everything up yourself is untenable, then I recommend getting in touch with a vendor: https://www.openstack.org/marketplace/ -- Thanks, Matt From Albert.Braden at synopsys.com Tue Nov 5 20:11:00 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 5 Nov 2019 20:11:00 +0000 Subject: CPU pinning blues In-Reply-To: References: Message-ID: I found the offending UUID in the nova_api and placement databases. Do I need to delete these entries from the DB or is there a safer way to get rid of the "phantom" VM? MariaDB [(none)]> select * from nova_api.instance_mappings where instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | instance_uuid | cell_id | project_id | queued_for_delete | | 2019-10-08 21:26:03 | NULL | 589 | 4856d505-c220-4873-b881-836b5b75f7bb | NULL | 474ae347d8ad426f8118e55eee47dcfd | 0 | MariaDB [(none)]> select * from nova_api.request_specs where instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | instance_uuid | spec | | 2019-10-08 21:26:03 | NULL | 589 | 4856d505-c220-4873-b881-836b5b75f7bb | {"nova_object.version": "1.11", "nova_object.changes": ["requested_destination", "instance_uuid", "retry", "num_instances", "pci_requests", "limits", "availability_zone", "force_nodes", "image", "instance_group", "force_hosts", "ignore_hosts", "numa_topology", "is_bfv", "user_id", "flavor", "project_id", "security_groups", "scheduler_hints"], "nova_object.name": "RequestSpec", "nova_object.data": {"requested_destination": null, "instance_uuid": "4856d505-c220-4873-b881-836b5b75f7bb", "retry": null, "num_instances": 1, "pci_requests": {"nova_object.version": "1.1", "nova_object.changes": ["requests"], "nova_object.name": "InstancePCIRequests", "nova_object.data": {"requests": []}, "nova_object.namespace": "nova"}, "limits": {"nova_object.version": "1.0", "nova_object.changes": ["vcpu", "memory_mb", "disk_gb", "numa_topology"], "nova_object.name": "SchedulerLimits", "nova_object.data": {"vcpu": null, "memory_mb": null, "disk_gb": null, "numa_topology": null}, "nova_object.namespace": "nova"}, "availability_zone": null, "force_nodes": null, "image": {"nova_object.version": "1.8", "nova_object.changes": ["status", "name", "container_format", "created_at", "disk_format", "updated_at", "id", "min_disk", "min_ram", "checksum", "owner", 
"properties", "size"], "nova_object.name": "ImageMeta", "nova_object.data": {"status": "active", "created_at": "2019-10-02T01:10:04Z", "name": "QSC-P-CentOS6.6-19P1-v4", "container_format": "bare", "min_ram": 0, "disk_format": "qcow2", "updated_at": "2019-10-02T01:10:44Z", "id": "200cb134-2716-4662-8183-33642078547f", "min_disk": 0, "checksum": "94d33caafd85b45519fca331ee7ea03e", "owner": "474ae347d8ad426f8118e55eee47dcfd", "properties": {"nova_object.version": "1.20", "nova_object.name": "ImageMetaProps", "nova_object.data": {}, "nova_object.namespace": "nova"}, "size": 4935843840}, "nova_object.namespace": "nova"}, "instance_group": null, "force_hosts": null, "ignore_hosts": null, "numa_topology": null, "is_bfv": false, "user_id": "2cb6757679d54a69803a5b6e317b3a93", "flavor": {"nova_object.version": "1.2", "nova_object.name": "Flavor", "nova_object.data": {"disabled": false, "root_gb": 35, "description": null, "flavorid": "e8b42da7-d352-441e-b494-77d6a6cd7366", "deleted": false, "created_at": "2019-09-23T21:19:50Z", "ephemeral_gb": 10, "updated_at": null, "memory_mb": 4096, "vcpus": 1, "extra_specs": {}, "swap": 3072, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": 0, "id": 2, "name": "s1.1cx4g"}, "nova_object.namespace": "nova"}, "project_id": "474ae347d8ad426f8118e55eee47dcfd", "security_groups": {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "SecurityGroupList", "nova_object.data": {"objects": [{"nova_object.version": "1.2", "nova_object.changes": ["name"], "nova_object.name": "SecurityGroup", "nova_object.data": {"name": "default"}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"}, "scheduler_hints": {}}, "nova_object.namespace": "nova"} | 1 row in set (0.001 sec) MariaDB [(none)]> SELECT * FROM placement.allocations WHERE consumer_id = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | resource_provider_id | consumer_id | resource_class_id | used | | 2019-10-08 22:03:33 | NULL | 3073 | 1024 | 4856d505-c220-4873-b881-836b5b75f7bb | 0 | 1 | | 2019-10-08 22:03:33 | NULL | 3074 | 1024 | 4856d505-c220-4873-b881-836b5b75f7bb | 1 | 4096 | | 2019-10-08 22:03:33 | NULL | 3075 | 1024 | 4856d505-c220-4873-b881-836b5b75f7bb | 2 | 48 | 3 rows in set (0.001 sec) MariaDB [(none)]> SELECT * FROM placement.consumers WHERE uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; | created_at | updated_at | id | uuid | project_id | user_id | generation | | 2019-10-08 22:03:33 | 2019-10-08 22:03:33 | 734 | 4856d505-c220-4873-b881-836b5b75f7bb | 1 | 1 | 1 | 1 row in set (0.000 sec) From: Albert Braden > Sent: Thursday, October 31, 2019 10:50 AM To: openstack-discuss at lists.openstack.org Subject: CPU pinning blues I'm following this document to setup CPU pinning on Rocky: https://www.redhat.com/en/blog/driving-fast-lane-cpu-pinning-and-numa-topology-awareness-openstack-compute I followed all of the steps except for modifying non-pinned flavors and I have one aggregate containing a single NUMA-capable host: root at us01odc-dev1-ctrl1:/var/log/nova# os aggregate list +----+-------+-------------------+ | ID | Name | Availability Zone | +----+-------+-------------------+ | 4 | perf3 | None | +----+-------+-------------------+ root at us01odc-dev1-ctrl1:/var/log/nova# os aggregate show 4 +-------------------+----------------------------+ | Field | Value | +-------------------+----------------------------+ | availability_zone | None | | created_at | 2019-10-30T23:05:41.000000 | | deleted | False | | deleted_at | 
None | | hosts | [u'us01odc-dev1-hv003'] | | id | 4 | | name | perf3 | | properties | pinned='true' | | updated_at | None | +-------------------+----------------------------+ I have a flavor with the NUMA properties: root at us01odc-dev1-ctrl1:/var/log/nova# os flavor show s1.perf3 +----------------------------+-------------------------------------------------------------------------+ | Field | Value | +----------------------------+-------------------------------------------------------------------------+ | OS-FLV-DISABLED:disabled | False | | OS-FLV-EXT-DATA:ephemeral | 0 | | access_project_ids | None | | disk | 35 | | id | be3d21c4-7e91-42a2-b832-47f42fdd3907 | | name | s1.perf3 | | os-flavor-access:is_public | True | | properties | aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated' | | ram | 30720 | | rxtx_factor | 1.0 | | swap | 7168 | | vcpus | 4 | +----------------------------+-------------------------------------------------------------------------+ I create a VM with that flavor: openstack server create --flavor s1.perf3 --image NOT-QSC-CentOS6.10-19P1-v4 --network it-network alberttest4 but it goes to error status, and I see this in the logs: *** *** Post with logs got moderated so they are here: https://paste.fedoraproject.org/paste/3bza6CJstXFPy8LatRJruA -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue Nov 5 22:44:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 5 Nov 2019 16:44:33 -0600 Subject: CPU pinning blues In-Reply-To: References: Message-ID: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> On 11/5/2019 2:11 PM, Albert Braden wrote: > I found the offending UUID in the nova_api and placement databases. Do I > need to delete these entries from the DB or is there a safer way to get > rid of the “phantom” VM? > > MariaDB [(none)]> select * from nova_api.instance_mappings where > instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; > > | created_at          | updated_at | id  | > instance_uuid                        | cell_id | > project_id                       | queued_for_delete | > > | 2019-10-08 21:26:03 | NULL       | 589 | > 4856d505-c220-4873-b881-836b5b75f7bb |    NULL | > 474ae347d8ad426f8118e55eee47dcfd |                 0 | > Interesting. So there is an instance mapping but it's not pointing at any cell. I'm assuming there is no entry for this instance in the nova_api.build_requests table either? A couple of related patches for that instance mapping thing: 1. I have a patch that adds a nova-manage command to cleanup busted instance mappings [1]. In this case you'd just --purge that broken instance mapping. 2. mnaser has reported similar weird issues where an instance mapping exists but doesn't point at a cell and the build request is gone and the instance isn't in cell0. For that we have a sanity check patch [2] which might be helpful to you if you hit this again. If either of those patches are helpful to you, please vote on the changes so we can draw some more eyes to the reviews. 
As for the allocations, you can remove those from placement using the osc-placement CLI plugin [3]: openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb [1] https://review.opendev.org/#/c/655908/ [2] https://review.opendev.org/#/c/683730/ [3] https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete -- Thanks, Matt From Albert.Braden at synopsys.com Tue Nov 5 22:51:25 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 5 Nov 2019 22:51:25 +0000 Subject: CPU pinning blues In-Reply-To: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> References: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> Message-ID: Thanks Matt! I saw your "any interest" email earlier and tried that procedure, and it fixed the problem. -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 5, 2019 2:45 PM To: openstack-discuss at lists.openstack.org Subject: Re: CPU pinning blues On 11/5/2019 2:11 PM, Albert Braden wrote: > I found the offending UUID in the nova_api and placement databases. Do I > need to delete these entries from the DB or is there a safer way to get > rid of the "phantom" VM? > > MariaDB [(none)]> select * from nova_api.instance_mappings where > instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; > > | created_at          | updated_at | id  | > instance_uuid                        | cell_id | > project_id                       | queued_for_delete | > > | 2019-10-08 21:26:03 | NULL       | 589 | > 4856d505-c220-4873-b881-836b5b75f7bb |    NULL | > 474ae347d8ad426f8118e55eee47dcfd |                 0 | > Interesting. So there is an instance mapping but it's not pointing at any cell. I'm assuming there is no entry for this instance in the nova_api.build_requests table either? A couple of related patches for that instance mapping thing: 1. I have a patch that adds a nova-manage command to cleanup busted instance mappings [1]. In this case you'd just --purge that broken instance mapping. 2. mnaser has reported similar weird issues where an instance mapping exists but doesn't point at a cell and the build request is gone and the instance isn't in cell0. For that we have a sanity check patch [2] which might be helpful to you if you hit this again. If either of those patches are helpful to you, please vote on the changes so we can draw some more eyes to the reviews. 
As for the allocations, you can remove those from placement using the osc-placement CLI plugin [3]: openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_655908_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=LP7-0mN2MJ5Qbv28Oodg41N8KpIOlKgcBy--M2vTgjw&e= [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_683730_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=tSCdhr2PxDvww4kksTXG6Z-vvX3WRhahzynEELjMwXw&e= [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_osc-2Dplacement_latest_cli_index.html-23resource-2Dprovider-2Dallocation-2Ddelete&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=0bzrScr45Jbu5_a1c6OHvfexVJXeasxzGoOllYGCwRQ&e= -- Thanks, Matt From dtroyer at gmail.com Tue Nov 5 22:58:29 2019 From: dtroyer at gmail.com (Dean Troyer) Date: Tue, 5 Nov 2019 16:58:29 -0600 Subject: [cinder][osc][docs] openstackclient docs for cinder v2 vs v3 In-Reply-To: <656f60e9-b5d6-5390-13c6-34347a9bc2e1@fried.cc> References: <656f60e9-b5d6-5390-13c6-34347a9bc2e1@fried.cc> Message-ID: On Tue, Nov 5, 2019 at 10:26 AM Eric Fried wrote: > - All of the v3 subcommands are implemented by code in the > openstackclient.volume.v2 package. Where there's overlap, the command > classes are identical from v2 to v3. However, it appears as though the > v2 commands are a *superset* of the v3 commands. Specifically, the > following appear in v2 but not v3 [4]: A number of commands were deprecated between v2 and v3, some were just renamed. However, that crux of this problem is that this pass-through was ever done in the first place. This is the only place in OSc that we did this rather than just copy the code between the API version modules. IMO that is what we need to finally do to fix this, complete the actual duplication of the v2 bits still being called by v3 in the v3 directories. > So before I go creating a mess of v2-only and v2+v3 documents, I wanted > to confirm that the above was actually intentional. > > - The existing hardcoded documents mention v1 and/or v2, but don't > mention v3 at all (e.g. [5]). I want to confirm that it's okay for me to > add mention of v3 where appropriate. Again, folks wanted to avoid doing the work to set up v3 properly, now the debt collector comes calling...I would hold off doing anything with the docs until the code and tests have been properly straightened out. dt -- Dean Troyer dtroyer at gmail.com From sfinucan at redhat.com Wed Nov 6 02:21:25 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Wed, 06 Nov 2019 10:21:25 +0800 Subject: [nova][ptg] Team dinner Message-ID: Hi all, Thanks to Alex Xu, we have organized a table for dinner this evening, Wed 6th November at 7pm. Xibo restaurant (Xinjiang food) 3/F 83 Changshu Rd, Jingan Qu, Shanghai, China (near "Changshu Road" or "Jing An Temple" subway stations) 中國上海市静安区常熟路83号 The address is on Google Maps - hopefully it's accurate :) Anyone working on nova is welcome, though I'd ask that you'd note your attendance on the PTG etherpad [1]. 
Looking forward to seeing everyone, Stephen (stephenfin) [1] https://etherpad.openstack.org/p/nova-shanghai-ptg From mdulko at redhat.com Wed Nov 6 02:23:00 2019 From: mdulko at redhat.com (Michal Dulko) Date: Wed, 6 Nov 2019 03:23:00 +0100 Subject: [kuryr] Kuryr team at the PTG Message-ID: Hi, I had not reserved Kuryr space on the PTG as we weren't expecting many Kuryr team members here, but turns out there's some representation. We'll meet on Thursday at 2 PM Shanghai time to discuss anything related to Kuryr. We want meet in the Blue Room (where the tables are) and will try to find some space to run the discussion. Today you can find me at the K8s SIG table. Feel free to join! Thanks, Michał From i at liuyulong.me Wed Nov 6 03:31:44 2019 From: i at liuyulong.me (=?utf-8?B?TElVIFl1bG9uZw==?=) Date: Wed, 6 Nov 2019 11:31:44 +0800 Subject: [Neutron] cancel the L3 meeting today In-Reply-To: References: Message-ID: Alright, we are all in Shanghai Today (30th Nov), so the L3 meeting will also be cancelled.     ------------------ Original ------------------ From:  "LIU Yulong" From flux.adam at gmail.com Wed Nov 6 04:00:34 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Wed, 6 Nov 2019 12:00:34 +0800 Subject: [barbican] PTG Schedule In-Reply-To: <0451df6b-23cc-2604-b28a-e1e9f6aac6f8@redhat.com> References: <0451df6b-23cc-2604-b28a-e1e9f6aac6f8@redhat.com> Message-ID: FYI, team photo is moved to Friday at 11:00am. :) On Mon, Nov 4, 2019, 10:12 AM Douglas Mendizábal wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA256 > > Hello Barbicaneers! > > I hope everyone made it to Shanghai safely. Our PTG session will be > on Wednesday Nov 6 from 10:30am - 4:30p at the Kilo table. > > We've set up an etherpad to collect topics to talk about. Please feel > free to add any topics you're interested in: > > https://etherpad.openstack.org/p/barbican-ussuri-ptg > > Additionally we've reserved a spot for a Team Photo on Thursday at > 11am. Hope to see y'all soon! > > Cheers, > - - Douglas Mendizabal > -----BEGIN PGP SIGNATURE----- > > iQIzBAEBCAAdFiEEwcapj5oGTj2zd3XogB6WFOq/OrcFAl2/iBEACgkQgB6WFOq/ > OrfFew/+IjhZe+qRCi/4EmaVEDf7QxJyZDIVUlLsPWHmF98wCdj+GsbzoWUuFfHM > sJCpfpVAUjxrIFOEo5uF9WiZhU36G9pgoLd1Y8Kb0/QRIQEQQcKGnlYhCn+jQjbW > J2tlDrkU0GBwEBzDt91gM5JCncviY8yT6nhlr/SSLqvZRQnPewerJNyJbYsVh6N2 > moXQzfeRjg1SGqR0KVUcDVPe/pE+at8A5ARFCxDiJaOIUTP0qcfKtDXh714bevyi > Sw2qgDZHbLHa1nEv3umuYGcrGpKz8Uuj5ju+7oGpPh4hX4pfPxbVDSzu8srfzTui > ggvcxFrpZQvdff3Lec1eclxnB+c9Z1tBKYF7pPUVtN3NPfCATkVCSQACYORPZLdh > GAnyxiiUXRwzIfOo0b6koa2pRi7ZWoz0DjVzpnl+D7qztUzyiguaj3KDnuTvlfQl > iMQev1QHD6fAVvByHgDRj4dyUqUi2+V/DtNZ9w29AX7C+U/afSbNGvygc8yNCtHF > vbkw68aPpj5zeB0OTjPQ6N5vsUc6bSXYGnECuGw24untnutvPKR+W9g9VQEUyN1h > vhvn0IPHZ9QyBJ0ctpdfA6O9PNsjY/DQNyDeiNGljTIpBjepUmqMTXvycsn8VN/E > yY0OL2QFGPhcsK7Q/yeUCzMm1sken2zMg8Bdxt10qbj4GsCMtyQ= > =fdMR > -----END PGP SIGNATURE----- > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Wed Nov 6 04:03:29 2019 From: soulxu at gmail.com (Alex Xu) Date: Wed, 6 Nov 2019 12:03:29 +0800 Subject: [nova][ptg] Team dinner In-Reply-To: References: Message-ID: You can tell waitress, it is order by Mr. Xu, and the last few phone number is 9564 Stephen Finucane 于2019年11月6日周三 上午10:25写道: > Hi all, > > Thanks to Alex Xu, we have organized a table for dinner this evening, > Wed 6th November at 7pm. 
> > Xibo restaurant (Xinjiang food) > 3/F 83 Changshu Rd, Jingan Qu, Shanghai, China (near "Changshu Road" > or "Jing An Temple" subway stations) > 中國上海市静安区常熟路83号 > > The address is on Google Maps - hopefully it's accurate :) > > Anyone working on nova is welcome, though I'd ask that you'd note your > attendance on the PTG etherpad [1]. > > Looking forward to seeing everyone, > Stephen (stephenfin) > > [1] https://etherpad.openstack.org/p/nova-shanghai-ptg > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lucioseki at gmail.com Wed Nov 6 05:15:43 2019 From: lucioseki at gmail.com (Lucio Seki) Date: Wed, 6 Nov 2019 13:15:43 +0800 Subject: [cinder] ussuri PTG schedule In-Reply-To: References: Message-ID: Hi rosmaita, I'm gonna leave for Manila project team photo 11h50-12h00, and it might conflict with the item I'm interested in (Mutable options). If it does conflict, is it possible to swap with some other item in the list? Lucio Seki (lseki) On Tue, Nov 5, 2019, 14:36 Brian Rosmaita wrote: > The Ussuri PTG schedule is live: > https://etherpad.openstack.org/p/shanghai-ptg-cinder > > Please check the schedule and let me know right away if your session > causes a conflict for you. Except for the few fixed-time topics, we will > follow the cinder tradition of dynamic scheduling, giving each topic > exactly as much time as it needs and adjusting as we go. > > cheers, > brian > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexey.perevalov at hotmail.com Wed Nov 6 07:09:27 2019 From: alexey.perevalov at hotmail.com (Perevalov Alexey) Date: Wed, 6 Nov 2019 07:09:27 +0000 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: References: Message-ID: Hi, we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? ________________________________ От: Michal Dulko Отправлено: 6 ноября 2019 г. 5:23 Кому: openstack-discuss Тема: [kuryr] Kuryr team at the PTG Hi, I had not reserved Kuryr space on the PTG as we weren't expecting many Kuryr team members here, but turns out there's some representation. We'll meet on Thursday at 2 PM Shanghai time to discuss anything related to Kuryr. We want meet in the Blue Room (where the tables are) and will try to find some space to run the discussion. Today you can find me at the K8s SIG table. Feel free to join! Thanks, Michał -------------- next part -------------- An HTML attachment was scrubbed... URL: From kendall at openstack.org Wed Nov 6 07:20:16 2019 From: kendall at openstack.org (Kendall Waters) Date: Wed, 6 Nov 2019 15:20:16 +0800 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: References: Message-ID: <37CF31D0-F40F-49BB-B8D7-CC7FA79C218A@openstack.org> Hi Michal, We do not have any extra space in the Blue Hall tomorrow, however, there are plenty of tables in the prefunction area that you are welcome to use for your meeting. Cheers, Kendall Kendall Waters OpenStack Marketing & Events kendall at openstack.org > On Nov 6, 2019, at 3:09 PM, Perevalov Alexey wrote: > > Hi, > we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? > > От: Michal Dulko > Отправлено: 6 ноября 2019 г. 5:23 > Кому: openstack-discuss > Тема: [kuryr] Kuryr team at the PTG > > Hi, > > I had not reserved Kuryr space on the PTG as we weren't expecting many > Kuryr team members here, but turns out there's some representation. > We'll meet on Thursday at 2 PM Shanghai time to discuss anything > related to Kuryr. 
We want meet in the Blue Room (where the tables are) > and will try to find some space to run the discussion. > > Today you can find me at the K8s SIG table. > > Feel free to join! > > Thanks, > Michał -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdulko at redhat.com Wed Nov 6 08:05:30 2019 From: mdulko at redhat.com (Michal Dulko) Date: Wed, 6 Nov 2019 09:05:30 +0100 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: <37CF31D0-F40F-49BB-B8D7-CC7FA79C218A@openstack.org> References: <37CF31D0-F40F-49BB-B8D7-CC7FA79C218A@openstack.org> Message-ID: Sure, thanks! On Wed, Nov 6, 2019 at 8:20 AM Kendall Waters wrote: > > Hi Michal, > > We do not have any extra space in the Blue Hall tomorrow, however, there are plenty of tables in the prefunction area that you are welcome to use for your meeting. > > Cheers, > Kendall > > Kendall Waters > OpenStack Marketing & Events > kendall at openstack.org > > > > On Nov 6, 2019, at 3:09 PM, Perevalov Alexey wrote: > > Hi, > we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? > > ________________________________ > От: Michal Dulko > Отправлено: 6 ноября 2019 г. 5:23 > Кому: openstack-discuss > Тема: [kuryr] Kuryr team at the PTG > > Hi, > > I had not reserved Kuryr space on the PTG as we weren't expecting many > Kuryr team members here, but turns out there's some representation. > We'll meet on Thursday at 2 PM Shanghai time to discuss anything > related to Kuryr. We want meet in the Blue Room (where the tables are) > and will try to find some space to run the discussion. > > Today you can find me at the K8s SIG table. > > Feel free to join! > > Thanks, > Michał > > From mdulko at redhat.com Wed Nov 6 08:08:14 2019 From: mdulko at redhat.com (Michal Dulko) Date: Wed, 6 Nov 2019 09:08:14 +0100 Subject: [kuryr] Kuryr team at the PTG In-Reply-To: References: Message-ID: Hey! It'll be 7 AM for Maysa and Luis, so I guess too early, but if there's someone else interested in participating that has a better timezone fit, we can do it. Thanks, Michał On Wed, Nov 6, 2019 at 8:09 AM Perevalov Alexey wrote: > > Hi, > we'll be there. See you tomorrow. Are you going to organize zoom meeting in parallel? > > ________________________________ > От: Michal Dulko > Отправлено: 6 ноября 2019 г. 5:23 > Кому: openstack-discuss > Тема: [kuryr] Kuryr team at the PTG > > Hi, > > I had not reserved Kuryr space on the PTG as we weren't expecting many > Kuryr team members here, but turns out there's some representation. > We'll meet on Thursday at 2 PM Shanghai time to discuss anything > related to Kuryr. We want meet in the Blue Room (where the tables are) > and will try to find some space to run the discussion. > > Today you can find me at the K8s SIG table. > > Feel free to join! > > Thanks, > Michał > > From missile0407 at gmail.com Wed Nov 6 12:28:50 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Wed, 6 Nov 2019 20:28:50 +0800 Subject: [kolla] Repository setup in non-internet environment. Message-ID: Hi, I'm thinking about the deployment in non-internet environment. As we know Kolla has already prepared docker registry and kolla-build to let user can create local registry for deployment. But there's still have two problem about non-internet deployment. 1. Docker-ce repository. 2. Pip repository. (Also having others perhaps.) Does Kolla planning to support non-internet deployment? I would like to do this if possible. Looking forward to hearing from you, Eddie. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From yamamoto at midokura.com Wed Nov 6 13:52:12 2019 From: yamamoto at midokura.com (Takashi Yamamoto) Date: Wed, 6 Nov 2019 21:52:12 +0800 Subject: [neutron][tap-as-a-service] ERSPAN support In-Reply-To: References: Message-ID: i guess i'm the maintainer of taas these days. thank you for the interest in the project. On Tue, Nov 5, 2019 at 10:51 PM Jean Bernard Beuque (jbeuque) wrote: > > Hello, > > > > I'd like to add ERSPAN support to the Tap-as-a-Service project. > > > > I've currently implemented a prototype that can be used with networking-vpp: > > https://opendev.org/x/networking-vpp > > The modified version of tap as a service is available here (The API has been extended to support ERSPAN): > > https://github.com/jbeuque/tap-as-a-service do i need to take a tree diff to see what was changed? it's easier for me to read the change if you submit the change on gerrit. > > > > I don't know who maintains the Taas project. But if you think adding this functionality could be useful, please contact me. > > (Please take the modified version of Taas as a proposal to be discussed). > > > > Regards, > > Jean-Bernard Beuque > > From melwittt at gmail.com Wed Nov 6 17:02:23 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 6 Nov 2019 09:02:23 -0800 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? In-Reply-To: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: On 11/5/19 08:45, Matt Riedemann wrote: > I was helping someone recover from a stuck live migration today where > the migration record was stuck in pre-migrating status and somehow the > request never hit the compute or was lost. The guest was stopped on the > guest and basically the live migration either never started or never > completed properly (maybe rabbit dropped the request or the compute > service was restarted, I don't know). > > I instructed them to update the database to set the migration record > status to 'error' and hard reboot the instance to get it running again. > > Then they pointed out they were seeing this in the compute logs: > > "There are allocations remaining against the source host that might need > to be removed" > > That's because the source node allocations are still tracked in > placement by the migration record and the dest node allocations are > tracked by the instance. Cleaning that up is non-trivial. I have a > troubleshooting doc started for manually cleaning up that kind of stuff > here [1] but ultimately just told them to delete the allocations in > placement for both the migration and the instance and then run the > heal_allocations command to recreate the allocations for the instance. > Since this person's nova deployment was running Stein, they don't have > the --dry-run [2] or --instance [3] options for the heal_allocations > command. This isn't a huge problem but it does mean they could be > healing allocations for instances they didn't expect. > > They could work around this by installing nova from train or master in a > VM/container/virtual environment and running it against the stein setup, > but that's maybe more work than they want to do. > > The question I'm posing is if people would like to see those options > backported to stein and if so, would the stable team be OK with it? 
I'd > say this falls into a gray area where these are things that are > optional, not used by default, and are operational tooling so less risk > to backport, but it's not zero risk. It's also worth noting that when I > wrote those patches I did so with the intent that people could backport > them at least internally. I think tools like this that provide significant operability benefit are worthwhile to backport and that the value is much greater than the risk. Related but not nearly as simple, I've backported nova-manage db purge and nova-manage db archive_deleted_rows --purge, --before, and --all-cells downstream because of the amount of bugs support/operators have opened around database cleanup pain. These were all pretty difficult to backport with the number of differences and conflicts, but my point is that I understand the motivation well and support the idea. The fact that the patches in question were written with backportability in mind is A Good Thing. -melanie > [1] https://review.opendev.org/#/c/691427/ > [2] https://review.opendev.org/#/c/651932/ > [3] https://review.opendev.org/#/c/651945/ > From mriedemos at gmail.com Wed Nov 6 17:14:04 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 6 Nov 2019 11:14:04 -0600 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/4/2019 6:58 PM, Clark Boylan wrote: > Typically we try to work with the clouds to properly root cause the issue. Then from there we can figure out what the best fix may be. They are running our software after all and there is a good chance the problems are in openstack. > > I'm in shanghai at the moment but if others want to reach out feel free. benj_ and mgagne are at inap and amorin has been helpful at ovh. The test node logs include a hostid in them somewhere which an be used to identify hypervisors if necessary. I noticed this today [1]. That doesn't always result in failed jobs but I correlated it to a failure in a timeout in a nova functional job [2] and those normally don't have these types of problems. Note the correlation to when it spikes, midnight and noon it looks like. The dip on 11/2 and 11/3 was the weekend. And it's mostly OVH nodes. So they must have some kind of cron or something that hits at those times? Anecdotally, I'll also note that it seems like the gate is much more stable this week while the summit is happening. We're actually able to merge some changes in nova which is kind of amazing given the last month or so of rechecks we've had to do. [1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Function%20'nova.servicegroup.drivers.db.DbDriver._report_state'%20run%20outlasted%20interval%20by%5C%22&from=7d [2] https://zuul.opendev.org/t/openstack/build/63001bbd58c244cea70c995f1ebf61fb/log/job-output.txt#3092 -- Thanks, Matt From jbeuque at cisco.com Wed Nov 6 18:10:39 2019 From: jbeuque at cisco.com (Jean Bernard Beuque (jbeuque)) Date: Wed, 6 Nov 2019 18:10:39 +0000 Subject: [neutron][tap-as-a-service] ERSPAN support In-Reply-To: References: Message-ID: Hello Takashi, Thanks for your answer. OK, I'll submit the changes on gerrit. 
Regards, Jean-Bernard -----Original Message----- From: Takashi Yamamoto Sent: mercredi 6 novembre 2019 14:52 To: Jean Bernard Beuque (jbeuque) Cc: openstack-discuss at lists.openstack.org; Ian Wells (iawells) ; Jerome Tollet (jtollet) Subject: Re: [neutron][tap-as-a-service] ERSPAN support i guess i'm the maintainer of taas these days. thank you for the interest in the project. On Tue, Nov 5, 2019 at 10:51 PM Jean Bernard Beuque (jbeuque) wrote: > > Hello, > > > > I'd like to add ERSPAN support to the Tap-as-a-Service project. > > > > I've currently implemented a prototype that can be used with networking-vpp: > > https://opendev.org/x/networking-vpp > > The modified version of tap as a service is available here (The API has been extended to support ERSPAN): > > https://github.com/jbeuque/tap-as-a-service do i need to take a tree diff to see what was changed? it's easier for me to read the change if you submit the change on gerrit. > > > > I don't know who maintains the Taas project. But if you think adding this functionality could be useful, please contact me. > > (Please take the modified version of Taas as a proposal to be discussed). > > > > Regards, > > Jean-Bernard Beuque > > From Albert.Braden at synopsys.com Wed Nov 6 18:16:56 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 6 Nov 2019 18:16:56 +0000 Subject: CPU pinning blues In-Reply-To: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> References: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> Message-ID: Will these patches work on Rocky? -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 5, 2019 2:45 PM To: openstack-discuss at lists.openstack.org Subject: Re: CPU pinning blues On 11/5/2019 2:11 PM, Albert Braden wrote: > I found the offending UUID in the nova_api and placement databases. Do I > need to delete these entries from the DB or is there a safer way to get > rid of the "phantom" VM? > > MariaDB [(none)]> select * from nova_api.instance_mappings where > instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb'; > > | created_at          | updated_at | id  | > instance_uuid                        | cell_id | > project_id                       | queued_for_delete | > > | 2019-10-08 21:26:03 | NULL       | 589 | > 4856d505-c220-4873-b881-836b5b75f7bb |    NULL | > 474ae347d8ad426f8118e55eee47dcfd |                 0 | > Interesting. So there is an instance mapping but it's not pointing at any cell. I'm assuming there is no entry for this instance in the nova_api.build_requests table either? A couple of related patches for that instance mapping thing: 1. I have a patch that adds a nova-manage command to cleanup busted instance mappings [1]. In this case you'd just --purge that broken instance mapping. 2. mnaser has reported similar weird issues where an instance mapping exists but doesn't point at a cell and the build request is gone and the instance isn't in cell0. For that we have a sanity check patch [2] which might be helpful to you if you hit this again. If either of those patches are helpful to you, please vote on the changes so we can draw some more eyes to the reviews. 
As for the allocations, you can remove those from placement using the osc-placement CLI plugin [3]: openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_655908_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=LP7-0mN2MJ5Qbv28Oodg41N8KpIOlKgcBy--M2vTgjw&e= [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_683730_&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=tSCdhr2PxDvww4kksTXG6Z-vvX3WRhahzynEELjMwXw&e= [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_osc-2Dplacement_latest_cli_index.html-23resource-2Dprovider-2Dallocation-2Ddelete&d=DwID-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=lVMaLB96zC2yeBzxfcfyNlaIItnmylkLNso5971gZCQ&s=0bzrScr45Jbu5_a1c6OHvfexVJXeasxzGoOllYGCwRQ&e= -- Thanks, Matt From dms at danplanet.com Wed Nov 6 19:03:19 2019 From: dms at danplanet.com (Dan Smith) Date: Wed, 06 Nov 2019 11:03:19 -0800 Subject: [nova] Operator input on automatic heal behaviors Message-ID: Hi all, If you're a nova operator, you probably know (and love) our increasing number of "heal $thing" commands in nova-manage. Despite appearances, we do not try to make these inconsistent, duplicative, and confusing. However, in reality, they pretty much are. Further, they require something being broken, an operator noticing, and then manual execution to (hopefully) fix things. While reviewing another such proposed nova-manage command today, I decided to propose a potential solution to make this better in the future. I've got a spec proposed to create new standalone command/service for nova that will consolidate all of these into a very consistent interface, with a daemon mode that can be run in the background to constantly (and slowly) periodically audit these things and heal them when issues are found. If you're an operator and have strong feelings on this topic, please review and opine here: https://review.opendev.org/#/c/693226 Thanks! --Dan From mriedemos at gmail.com Wed Nov 6 19:03:35 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 6 Nov 2019 13:03:35 -0600 Subject: CPU pinning blues In-Reply-To: References: <3345e570-f328-5f43-1e65-8c95a5e20d46@gmail.com> Message-ID: <61c70900-7b68-0d93-95c4-fd6ba09d33ed@gmail.com> On 11/6/2019 12:16 PM, Albert Braden wrote: > Will these patches work on Rocky? I don't know, I haven't tried backporting them. -- Thanks, Matt From melwittt at gmail.com Wed Nov 6 19:12:42 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 6 Nov 2019 11:12:42 -0800 Subject: State of the Gate (placement?) In-Reply-To: References: <20191031211535.vk7rtiq3pvsb6j2t@skaplons-mac> <8a6f54cb-76a7-4522-8a16-93822c4cdcb5@www.fastmail.com> Message-ID: On 11/4/19 16:58, Clark Boylan wrote: > On Mon, Nov 4, 2019, at 7:37 PM, Chris Dent wrote: >> On Fri, 1 Nov 2019, Matt Riedemann wrote: >> >>> On 11/1/2019 9:55 AM, Clark Boylan wrote: >>>> OVH controls the disk IOPs that we get pretty aggressively as well. >>>> Possible it is an IO thing? >>> >>> Yeah, so looking at the dstat output in that graph (thanks for pointing out >>> that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh, >>> that's probably not good. >> >> What happens in a case like this? 
Is there an official procedure for >> "hey, can you give is more IO?" or (if that's not an option) "can >> you give us less CPU?". Is that something that is automated, is is >> something that is monitored and alarming? "INAP ran out of IO X >> times in the last N hours, light the beacons!" > > Typically we try to work with the clouds to properly root cause the issue. Then from there we can figure out what the best fix may be. They are running our software after all and there is a good chance the problems are in openstack. > > I'm in shanghai at the moment but if others want to reach out feel free. benj_ and mgagne are at inap and amorin has been helpful at ovh. The test node logs include a hostid in them somewhere which an be used to identify hypervisors if necessary. Just wanted to throw this out there to the ML in case anyone has any thoughts: Since we know that I/O is overloaded in these cases, would it make any sense to have infra/tempest use a flavor which sets disk I/O quotas [1] to help prevent any one process from getting starved out? I agree that properly troubleshooting the root cause is necessary and maybe adding limits would not be desired for concern of it potentially hiding issues. -melanie [1] https://docs.openstack.org/nova/latest/user/flavors.html#extra-specs-disk-tuning From immo.wetzel at adtran.com Wed Nov 6 22:37:28 2019 From: immo.wetzel at adtran.com (Immo Wetzel) Date: Wed, 6 Nov 2019 22:37:28 +0000 Subject: ephemeral dics storage Message-ID: Gday mates, We do run a pike installation and do run into a problem with the ephemeral discs. As written these are on one side the common way for normal VMs to create a disk as long as the vm exists. On the other side these are stored on local discs. But these are usually not the fastest way and therefore the documentation said that in production environments, which we are going to be, these should be stored on shared spaces too. Like a SAN. I found some descriptions how to use ceph for locale storage via rbd backend but we don't use ceph. Each compute node has an FC connection which is used via cinder for the volumes. So what would be the recommendation to use ephemeral disc with FC SAN ? THX a lot Immo -------------- next part -------------- An HTML attachment was scrubbed... URL: From eandersson at blizzard.com Wed Nov 6 23:07:53 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Wed, 6 Nov 2019 23:07:53 +0000 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? In-Reply-To: References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: Yea - this is our number one pain point with Nova and Rocky, and having this backported would be invaluable. Since we are on the topic some additional issues we are having. - Sometimes heal_allocations just fails without a good error (e.g. Compute host could not be found.) - Errors are always sequential and always halt execution, so if you have a lot of errors, you'll end up fixing them all one-by-one. - Better logging when unexpected errors do happen (maybe something more verbose like --debug would be good?). Best Regards, Erik Olof Gunnar Andersson -----Original Message----- From: melanie witt Sent: Wednesday, November 6, 2019 9:02 AM To: openstack-discuss at lists.openstack.org Subject: Re: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? 
On 11/5/19 08:45, Matt Riedemann wrote: > I was helping someone recover from a stuck live migration today where > the migration record was stuck in pre-migrating status and somehow the > request never hit the compute or was lost. The guest was stopped on > the guest and basically the live migration either never started or > never completed properly (maybe rabbit dropped the request or the > compute service was restarted, I don't know). > > I instructed them to update the database to set the migration record > status to 'error' and hard reboot the instance to get it running again. > > Then they pointed out they were seeing this in the compute logs: > > "There are allocations remaining against the source host that might > need to be removed" > > That's because the source node allocations are still tracked in > placement by the migration record and the dest node allocations are > tracked by the instance. Cleaning that up is non-trivial. I have a > troubleshooting doc started for manually cleaning up that kind of > stuff here [1] but ultimately just told them to delete the allocations > in placement for both the migration and the instance and then run the > heal_allocations command to recreate the allocations for the instance. > Since this person's nova deployment was running Stein, they don't have > the --dry-run [2] or --instance [3] options for the heal_allocations > command. This isn't a huge problem but it does mean they could be > healing allocations for instances they didn't expect. > > They could work around this by installing nova from train or master in > a VM/container/virtual environment and running it against the stein > setup, but that's maybe more work than they want to do. > > The question I'm posing is if people would like to see those options > backported to stein and if so, would the stable team be OK with it? > I'd say this falls into a gray area where these are things that are > optional, not used by default, and are operational tooling so less > risk to backport, but it's not zero risk. It's also worth noting that > when I wrote those patches I did so with the intent that people could > backport them at least internally. I think tools like this that provide significant operability benefit are worthwhile to backport and that the value is much greater than the risk. Related but not nearly as simple, I've backported nova-manage db purge and nova-manage db archive_deleted_rows --purge, --before, and --all-cells downstream because of the amount of bugs support/operators have opened around database cleanup pain. These were all pretty difficult to backport with the number of differences and conflicts, but my point is that I understand the motivation well and support the idea. The fact that the patches in question were written with backportability in mind is A Good Thing. 
-melanie > [1] > https://urldefense.com/v3/__https://review.opendev.org/*/c/691427/__;I > w!2E0gRdhhnqPNNL0!37tRTxqquwil9Vw_imfj9qg3SczjE--jSBbK3qUS_UO_wOddekP_ > GkxCspm5LX4aBQ$ [2] > https://urldefense.com/v3/__https://review.opendev.org/*/c/651932/__;I > w!2E0gRdhhnqPNNL0!37tRTxqquwil9Vw_imfj9qg3SczjE--jSBbK3qUS_UO_wOddekP_ > GkxCspniVit4uQ$ [3] > https://urldefense.com/v3/__https://review.opendev.org/*/c/651945/__;I > w!2E0gRdhhnqPNNL0!37tRTxqquwil9Vw_imfj9qg3SczjE--jSBbK3qUS_UO_wOddekP_ > GkxCsplE-E_TJw$ > From mriedemos at gmail.com Wed Nov 6 23:29:19 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 6 Nov 2019 17:29:19 -0600 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? In-Reply-To: References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: <5b9128b6-ca60-64aa-8e95-222412d072c1@gmail.com> On 11/6/2019 5:07 PM, Erik Olof Gunnar Andersson wrote: > Yea - this is our number one pain point with Nova and Rocky, and having this backported would be invaluable. I posted [1] today. If that's accepted I can work on Rocky afterward. > > Since we are on the topic some additional issues we are having. > > - Sometimes heal_allocations just fails without a good error (e.g. Compute host could not be found.) > - Errors are always sequential and always halt execution, so if you have a lot of errors, you'll end up fixing them all one-by-one. > - Better logging when unexpected errors do happen (maybe something more verbose like --debug would be good?). Could you open a bug with more details about the issues you're hitting. Like in what case do you hit ComputeHostNotFound? The sequential errors thing is pretty obvious but I'm not sure what to do about it off the top of my head besides some option to say "process as much as possible storing up all of the errors to dump at the end" kind of thing. As for better logging about unexpected errors, it's hard to know what to log that's better when it's unexpected, you know? If you have examples can you throw those into the bug report? [1] https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:stable/stein+topic:heal_allocations_dry_run -- Thanks, Matt From rui.zang at yandex.com Thu Nov 7 03:01:51 2019 From: rui.zang at yandex.com (rui zang) Date: Thu, 07 Nov 2019 11:01:51 +0800 Subject: ephemeral dics storage In-Reply-To: References: Message-ID: <71054891573095711@myt3-a8f6b0e91bb2.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: From sinan at turka.nl Thu Nov 7 08:23:08 2019 From: sinan at turka.nl (Sinan Polat) Date: Thu, 7 Nov 2019 09:23:08 +0100 (CET) Subject: Change Volume Type, but in use Message-ID: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Hi, I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD pools (ssdvolumes, sasvolumes). In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property "volume_backend_name='tripleo_ceph_'". 
In the Cinder configuration I have the following backends configured:

[tripleo_ceph_ssd]
backend_host=hostgroup
volume_backend_name=tripleo_ceph_ssd
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=openstack
rbd_pool=ssdvolumes

[tripleo_ceph_sas]
backend_host=hostgroup
volume_backend_name=tripleo_ceph_sas
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=openstack
rbd_pool=sasvolumes

As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool name (ssdvolumes, not ssd) do not match. So far, we do not have any problems. But I want to correct the names and I do not want to have the mismatch anymore. So I want to change the value of key volume_backend_name for both Volume Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes).

I tried the following:

$ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce
+--------------------+----------------------------------------+
| Field              | Value                                  |
+--------------------+----------------------------------------+
| access_project_ids | None                                   |
| description        |                                        |
| id                 | 80cb25ff-376a-4483-b4f7-d8c75839e0ce   |
| is_public          | True                                   |
| name               | ssd                                    |
| properties         | volume_backend_name='tripleo_ceph_ssd' |
| qos_specs_id       | None                                   |
+--------------------+----------------------------------------+
$
$ openstack volume type set --property volume_backend_name='tripleo_ceph_ssdvolumes' 80cb25ff-376a-4483-b4f7-d8c75839e0ce
Failed to set volume type property: Volume Type is currently in use. (HTTP 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93)
Command Failed: One or more of the operations failed
$

How to solve my problem?

Thanks!
Sinan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From radoslaw.piliszek at gmail.com Thu Nov 7 08:31:16 2019
From: radoslaw.piliszek at gmail.com (Radosław Piliszek)
Date: Thu, 7 Nov 2019 09:31:16 +0100
Subject: [kolla] Repository setup in non-internet environment.
In-Reply-To:
References:
Message-ID:

Hello Eddie,

We would welcome such a feature of course!

-yoctozepto

Wed, 6 Nov 2019 at 13:31 Eddie Yen wrote:
This proxy must have online access. The use of Nexus OSS is also a possibility. Then you only have one central mirror service. If you want to build completely offline you can't avoid single mirrors for the single packages (Docker, APT, Pypi). We provide a role under https://github.com/osism/ansible-mirror to deploy individual mirror services with Docker Compose. > Does Kolla planning to support non-internet deployment? I would like to do this if possible. This is already possible and we do this very often. HTH, Christian. -- Christian Berendt Chief Executive Officer (CEO) Mail: berendt at betacloud-solutions.de Web: https://www.betacloud-solutions.de Betacloud Solutions GmbH Teckstrasse 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139 From missile0407 at gmail.com Thu Nov 7 10:31:00 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Thu, 7 Nov 2019 18:31:00 +0800 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> References: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> Message-ID: Hi Christian, thanks for your reply and suggestion. In some cases we met, all kinds of internet access method (proxy server, mobile internet, etc.) are restricted. And in some previous release (like Rocky), pip packages still necessary. So we will prepare the whole local repository and registry in this kind of environment. When kolla-ansible going to bootstrapping servers, it will insert docker-ce repository. But this is already hard-coded (pointed to download.docker.com). Also no pip local repository setup during bootstrapping. So I gonna do is let them become functional. User can configure local docker-ce and pip repository in globals.yml directly if needed. BTW, glad to know about ansible-mirror. I'd like to try it if I have a time. Many thanks, Eddie. Christian Berendt 於 2019年11月7日 週四 下午5:25寫道: > Hello Eddie. > > > On 6. Nov 2019, at 13:28, Eddie Yen wrote: > > > > 1. Docker-ce repository. > > Use an APT mirror. For example Aptly. > > > > 2. Pip repository. > > (Also having others perhaps.) > > Packages from Pypi should no longer be necessary for the use of > Kolle-Ansible. For some time now. > > If that's still the case, use a Pypi Mirror. For example Devpi. > > > The Docker images can also be mirrored. Use a local Docker registry to do > this. > > > The use of an HTTP proxy like Squid is also possible. This proxy must have > online access. > > The use of Nexus OSS is also a possibility. Then you only have one central > mirror service. > > > If you want to build completely offline you can't avoid single mirrors for > the single packages (Docker, APT, Pypi). > > We provide a role under https://github.com/osism/ansible-mirror to deploy > individual mirror services with Docker Compose. > > > > Does Kolla planning to support non-internet deployment? I would like to > do this if possible. > > This is already possible and we do this very often. > > HTH, Christian. > > -- > Christian Berendt > Chief Executive Officer (CEO) > > Mail: berendt at betacloud-solutions.de > Web: https://www.betacloud-solutions.de > > Betacloud Solutions GmbH > Teckstrasse 62 / 70190 Stuttgart / Deutschland > > Geschäftsführer: Christian Berendt > Unternehmenssitz: Stuttgart > Amtsgericht: Stuttgart, HRB 756139 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Thu Nov 7 10:48:35 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 7 Nov 2019 10:48:35 +0000 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> References: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> Message-ID: On Thu, 7 Nov 2019 at 09:26, Christian Berendt wrote: > > Hello Eddie. > > > On 6. Nov 2019, at 13:28, Eddie Yen wrote: > > > > 1. Docker-ce repository. > > Use an APT mirror. For example Aptly. > Or yum if using CentOS. Pulp or artifactory or $other should work. Presumably there is already some mirror solution for your OS packages? > > > 2. Pip repository. > > (Also having others perhaps.) > > Packages from Pypi should no longer be necessary for the use of Kolle-Ansible. For some time now. You at least need to install Kolla Ansible's Python dependencies. You could consider building a docker image containing Kolla Ansible and using this for your deployments, if that helps. In kolla-ansible bootstrap-servers we also use the easy_install and pip Ansible modules to install pip and the Docker python package. > > If that's still the case, use a Pypi Mirror. For example Devpi. > > > The Docker images can also be mirrored. Use a local Docker registry to do this. > > > The use of an HTTP proxy like Squid is also possible. This proxy must have online access. > > The use of Nexus OSS is also a possibility. Then you only have one central mirror service. > > > If you want to build completely offline you can't avoid single mirrors for the single packages (Docker, APT, Pypi). > > We provide a role under https://github.com/osism/ansible-mirror to deploy individual mirror services with Docker Compose. > > > > Does Kolla planning to support non-internet deployment? I would like to do this if possible. > > This is already possible and we do this very often. > > HTH, Christian. > > -- > Christian Berendt > Chief Executive Officer (CEO) > > Mail: berendt at betacloud-solutions.de > Web: https://www.betacloud-solutions.de > > Betacloud Solutions GmbH > Teckstrasse 62 / 70190 Stuttgart / Deutschland > > Geschäftsführer: Christian Berendt > Unternehmenssitz: Stuttgart > Amtsgericht: Stuttgart, HRB 756139 > > From berendt at betacloud-solutions.de Thu Nov 7 11:06:39 2019 From: berendt at betacloud-solutions.de (Christian Berendt) Date: Thu, 7 Nov 2019 12:06:39 +0100 Subject: [kolla] Repository setup in non-internet environment. In-Reply-To: References: <4850D8F8-C425-40EA-A9A6-B7DF3ECC0C3D@betacloud-solutions.de> Message-ID: <4FA700EE-C9BC-4C14-9074-179CE1F94017@betacloud-solutions.de> Hello Mark. > On 7. Nov 2019, at 11:48, Mark Goddard wrote: > > You at least need to install Kolla Ansible's Python dependencies. You > could consider building a docker image containing Kolla Ansible and > using this for your deployments, if that helps. That's right. That's why we put it in images to avoid this problem with Pypi. Works very well in everyday life. Christian. 
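As a rough, untested sketch of that approach (base image, version pins and mount paths are placeholders, not an official Kolla image; the image also needs an SSH client so Ansible can reach the hosts):

    cat > Dockerfile <<'EOF'
    FROM python:3.7-slim
    RUN apt-get update && apt-get install -y openssh-client sshpass
    RUN pip install 'ansible<2.9' kolla-ansible   # pin to the release you deploy
    ENTRYPOINT ["kolla-ansible"]
    EOF
    docker build -t registry.local:4000/kolla-ansible:train .
    # run it with the configuration and inventory mounted in
    docker run --rm --network host \
      -v /etc/kolla:/etc/kolla -v /root/inventory:/inventory \
      registry.local:4000/kolla-ansible:train -i /inventory deploy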
-- Christian Berendt Chief Executive Officer (CEO) Mail: berendt at betacloud-solutions.de Web: https://www.betacloud-solutions.de Betacloud Solutions GmbH Teckstrasse 62 / 70190 Stuttgart / Deutschland Geschäftsführer: Christian Berendt Unternehmenssitz: Stuttgart Amtsgericht: Stuttgart, HRB 756139 From mark at stackhpc.com Thu Nov 7 14:11:13 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 7 Nov 2019 14:11:13 +0000 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: <8a6d214d-970e-c923-570e-e031aa364305@linaro.org> References: <8a6d214d-970e-c923-570e-e031aa364305@linaro.org> Message-ID: Zoom didn't work well, now trying meet: https://meet.google.com/nyh-gzvy-nnw On Mon, 4 Nov 2019 at 17:05, Marcin Juszkiewicz wrote: > > On 04.11.2019 15:15, Mark Goddard wrote: > > > After polling participants, we have agreed to meet at 1400 - 1800 UTC > > on Thursday and Friday this week. Since not all participants can make > > the first hour, we will adjust the schedule accordingly. > > > > Marcin will follow with connection details for the Zoom video conference. > > As we agreed on Zoom I did a setup of meeting. > > https://zoom.us/j/157063687 will be available for 1400-1800 UTC on both > Thursday and Friday. Sessions will be recorded by platform. > From corey.bryant at canonical.com Thu Nov 7 19:11:41 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Thu, 7 Nov 2019 14:11:41 -0500 Subject: [tc] Add non-voting py38 for ussuri Message-ID: Hello TC members, Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. I have a review here for the zuul project template enablement for ussuri: https://review.opendev.org/#/c/693401 Also should this be updated considering py38 would be non-voting? https://governance.openstack.org/tc/reference/runtimes/ussuri.html Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Thu Nov 7 19:47:53 2019 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Thu, 7 Nov 2019 11:47:53 -0800 Subject: Change Volume Type, but in use In-Reply-To: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: Hey Sinat, The error message suggests that you have volumes that use the volume type you're modifying. If yes, with cinder API, since the Rocky release [1], you cannot modify volume types that are currently in use. The original design for cinder volume types was that they were always mutable - and changes to existing volume types didn't affect pre-existing volumes. However, this behavior was modified in the Ocata release [2], and finally removed in the Rocky release. One of your options is to also rename the existing volume type and create a new one if you'd like, with the original name. 
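A rough sketch of that approach, reusing the type ID from earlier in the thread (untested; whether an in-use type can be renamed may still depend on your release and policy settings):

    # keep the old type for existing volumes, just under a different name
    openstack volume type set --name ssd-legacy 80cb25ff-376a-4483-b4f7-d8c75839e0ce
    # recreate "ssd" pointing at the corrected backend name
    openstack volume type create ssd \
      --property volume_backend_name='tripleo_ceph_ssdvolumes'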
[1] https://docs.openstack.org/releasenotes/cinder/rocky.html#relnotes-13-0-0-stable-rocky-upgrade-notes [2] https://review.opendev.org/#/c/440680/ On Thu, Nov 7, 2019 at 12:33 AM Sinan Polat wrote: > Hi, > > I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 > RBD pools (ssdvolumes, sasvolumes). > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has > property "volume_backend_name='tripleo_ceph_'". > > In the Cinder configuration I have the following backends configured: > > [tripleo_ceph_ssd] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_ssd > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=ssdvolumes > > [tripleo_ceph_sas] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_sas > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=sasvolumes > > As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD > pool name (ssdvolumes, not ssd) does not match. So far, we do not have any > problems. But I want to correct the names and I do not want to have the > mismatch anymore. > > So I want to change the value of key volume_backend_name for both Volume > Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). > > I tried the following: > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > +--------------------+----------------------------------------+ > | Field | Value | > +--------------------+----------------------------------------+ > | access_project_ids | None | > | description | | > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > | is_public | True | > | name | ssd | > | properties | volume_backend_name='tripleo_ceph_ssd' | > | qos_specs_id | None | > +--------------------+----------------------------------------+ > $ > > > $ openstack volume type set --property > volume_backend_name='tripleo_ceph_ssdvolumes' > 80cb25ff-376a-4483-b4f7-d8c75839e0ce > Failed to set volume type property: Volume Type is currently in use. (HTTP > 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > Command Failed: One or more of the operations failed > $ > > How to solve my problem? > > Thanks! > > Sinan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sinan at turka.nl Thu Nov 7 20:58:26 2019 From: sinan at turka.nl (Sinan Polat) Date: Thu, 7 Nov 2019 21:58:26 +0100 (CET) Subject: Change Volume Type, but in use In-Reply-To: References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Hi Goutham, Thanks for your response. It is correct that I have volumes that are using the volume type I want to modify. If I understand correctly, modifying volumes types is not possible as long as there are volumes using the volume type. Renaming the volume type and creating a new volume type won't work I guess. Since the backend name in the cinder configuration will be changed, the volumes that are using the renamed volume type won't able to find the backend and will fail, not? I would have the backend named "ssdvolumes" in the cinder configuration. The new volume type would have the correct backend name (ssdvolumes) but the renamed volume type would still have "ssd" as its backend name. 
Kind regards, Sinan > Op 7 november 2019 om 20:47 schreef Goutham Pacha Ravi > : > > Hey Sinat, > > The error message suggests that you have volumes that use the volume type > you're modifying. If yes, with cinder API, since the Rocky release [1], you > cannot modify volume types that are currently in use. The original design for > cinder volume types was that they were always mutable - and changes to > existing volume types didn't affect pre-existing volumes. However, this > behavior was modified in the Ocata release [2], and finally removed in the > Rocky release. > > One of your options is to also rename the existing volume type and create > a new one if you'd like, with the original name. > > [1] > https://docs.openstack.org/releasenotes/cinder/rocky.html#relnotes-13-0-0-stable-rocky-upgrade-notes > [2] https://review.opendev.org/#/c/440680/ > > > On Thu, Nov 7, 2019 at 12:33 AM Sinan Polat mailto:sinan at turka.nl > wrote: > > > > > > Hi, > > > > I am using Ceph as the backend for Cinder. Within Ceph we have > > defined 2 RBD pools (ssdvolumes, sasvolumes). > > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type > > has property "volume_backend_name='tripleo_ceph_'". > > > > In the Cinder configuration I have the following backends > > configured: > > > > [tripleo_ceph_ssd] > > backend_host=hostgroup > > volume_backend_name=tripleo_ceph_ssd > > volume_driver=cinder.volume.drivers.rbd.RBDDriver > > rbd_ceph_conf=/etc/ceph/ceph.conf > > rbd_user=openstack > > rbd_pool=ssdvolumes > > > > [tripleo_ceph_sas] > > backend_host=hostgroup > > volume_backend_name=tripleo_ceph_sas > > volume_driver=cinder.volume.drivers.rbd.RBDDriver > > rbd_ceph_conf=/etc/ceph/ceph.conf > > rbd_user=openstack > > rbd_pool=sasvolumes > > > > As you might have noticed, the backend name (tripleo_ceph_ssd) and > > the RBD pool name (ssdvolumes, not ssd) does not match. So far, we do not > > have any problems. But I want to correct the names and I do not want to have > > the mismatch anymore. > > > > So I want to change the value of key volume_backend_name for both > > Volume Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). > > > > I tried the following: > > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > > +--------------------+----------------------------------------+ > > | Field | Value | > > +--------------------+----------------------------------------+ > > | access_project_ids | None | > > | description | | > > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > > | is_public | True | > > | name | ssd | > > | properties | volume_backend_name='tripleo_ceph_ssd' | > > | qos_specs_id | None | > > +--------------------+----------------------------------------+ > > $ > > > > > > $ openstack volume type set --property > > volume_backend_name='tripleo_ceph_ssdvolumes' > > 80cb25ff-376a-4483-b4f7-d8c75839e0ce > > Failed to set volume type property: Volume Type is currently in use. > > (HTTP 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > > Command Failed: One or more of the operations failed > > $ > > > > How to solve my problem? > > > > Thanks! > > > > Sinan > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sean.mcginnis at gmx.com Thu Nov 7 22:48:17 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 7 Nov 2019 23:48:17 +0100 Subject: Change Volume Type, but in use In-Reply-To: <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Nov 7 22:56:51 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 7 Nov 2019 23:56:51 +0100 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: My non-TC take on this...   > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap.   > I have a review here for the zuul project template enablement for ussuri: > https://review.opendev.org/#/c/693401 I do not think it should be added to the ussuri jobs template. I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. Any idea so far from manual py38 testing if there are breaking changes that are going to impact us?   > Also should this be updated considering py38 would be non-voting? > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. From sinan at turka.nl Thu Nov 7 22:56:43 2019 From: sinan at turka.nl (Sinan Polat) Date: Thu, 7 Nov 2019 23:56:43 +0100 Subject: Change Volume Type, but in use In-Reply-To: References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> <844791131.568096.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: <9737AF0E-4B21-4214-B0F7-D9F47D056D60@turka.nl> Hi Sean, Currently: Ceph RBD pool name = ssdvolumes. Volume Type name = ssd Volume Type metadata pointing to backend name ssd Backend name in Cinder conf = ssd New situation: Backend name in Cinder conf will be ssdvolumes, and not ssd anymore. Since it seems it is not possible to modify a Volume Type when in use by volumes, I do not see other options. Sinan > Op 7 nov. 2019 om 23:48 heeft Sean McGinnis het volgende geschreven: > > Hi Sinan, > > Changing a backend name is generally not recommended. 
It is only an internal detail seen by administrators anyway, so I'm not sure it would be worth the effort to try to change it just to add "volumes" to the end. > > Sean > > Sent: Thursday, November 07, 2019 at 2:58 PM > From: "Sinan Polat" > To: "Goutham Pacha Ravi" > Cc: "OpenStack Discuss" > Subject: Re: Change Volume Type, but in use > Hi Goutham, > > Thanks for your response. > > It is correct that I have volumes that are using the volume type I want to modify. If I understand correctly, modifying volumes types is not possible as long as there are volumes using the volume type. > > Renaming the volume type and creating a new volume type won't work I guess. Since the backend name in the cinder configuration will be changed, the volumes that are using the renamed volume type won't able to find the backend and will fail, not? > > I would have the backend named "ssdvolumes" in the cinder configuration. The new volume type would have the correct backend name (ssdvolumes) but the renamed volume type would still have "ssd" as its backend name. > > Kind regards, > Sinan > > Op 7 november 2019 om 20:47 schreef Goutham Pacha Ravi : > > Hey Sinat, > > The error message suggests that you have volumes that use the volume type you're modifying. If yes, with cinder API, since the Rocky release [1], you cannot modify volume types that are currently in use. The original design for cinder volume types was that they were always mutable - and changes to existing volume types didn't affect pre-existing volumes. However, this behavior was modified in the Ocata release [2], and finally removed in the Rocky release. > > One of your options is to also rename the existing volume type and create a new one if you'd like, with the original name. > > [1] https://docs.openstack.org/releasenotes/cinder/rocky.html#relnotes-13-0-0-stable-rocky-upgrade-notes > [2] https://review.opendev.org/#/c/440680/ > > > On Thu, Nov 7, 2019 at 12:33 AM Sinan Polat wrote: > Hi, > > I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD pools (ssdvolumes, sasvolumes). > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property "volume_backend_name='tripleo_ceph_'". > > In the Cinder configuration I have the following backends configured: > > [tripleo_ceph_ssd] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_ssd > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=ssdvolumes > > [tripleo_ceph_sas] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_sas > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=sasvolumes > > As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool name (ssdvolumes, not ssd) does not match. So far, we do not have any problems. But I want to correct the names and I do not want to have the mismatch anymore. > > So I want to change the value of key volume_backend_name for both Volume Types (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). 
> > I tried the following: > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > +--------------------+----------------------------------------+ > | Field | Value | > +--------------------+----------------------------------------+ > | access_project_ids | None | > | description | | > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > | is_public | True | > | name | ssd | > | properties | volume_backend_name='tripleo_ceph_ssd' | > | qos_specs_id | None | > +--------------------+----------------------------------------+ > $ > > > $ openstack volume type set --property volume_backend_name='tripleo_ceph_ssdvolumes' 80cb25ff-376a-4483-b4f7-d8c75839e0ce > Failed to set volume type property: Volume Type is currently in use. (HTTP 400) (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > Command Failed: One or more of the operations failed > $ > > How to solve my problem? > > Thanks! > > Sinan -------------- next part -------------- An HTML attachment was scrubbed... URL: From nate.johnston at redhat.com Fri Nov 8 01:10:05 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Thu, 7 Nov 2019 20:10:05 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: <20191108011005.yknhtfbkvsleckdx@firewall> On Thu, Nov 07, 2019 at 11:56:51PM +0100, Sean McGinnis wrote: > My non-TC take on this... > >   > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > > > For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. >   > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > I do not think it should be added to the ussuri jobs template. Would it be possible to add it to the template, but under the experimental queue? That way we leverage the template's ability to do the work for all projects but the job won't be executed without a specific experimental check. Thanks, Nate > I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. > > Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. > > Any idea so far from manual py38 testing if there are breaking changes that are going to impact us? >   > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. 
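For illustration, the experimental-queue idea might look roughly like this in a repository's .zuul.yaml (untested sketch; the job name assumes the py38 tox job that the patch under review would provide):

    # merge into the repo's existing "- project:" stanza in .zuul.yaml,
    # then trigger it with a "check experimental" review comment
    cat >> .zuul.yaml <<'EOF'
    - project:
        experimental:
          jobs:
            - openstack-tox-py38
    EOF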
> From skaplons at redhat.com Fri Nov 8 05:03:22 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 8 Nov 2019 13:03:22 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <20191108011005.yknhtfbkvsleckdx@firewall> References: <20191108011005.yknhtfbkvsleckdx@firewall> Message-ID: <20191108050322.kpzhombboymjk4wf@skaplons-mac> On Thu, Nov 07, 2019 at 08:10:05PM -0500, Nate Johnston wrote: > On Thu, Nov 07, 2019 at 11:56:51PM +0100, Sean McGinnis wrote: > > My non-TC take on this... > > > >   > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > > > > > For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. > >   > > > I have a review here for the zuul project template enablement for ussuri: > > > https://review.opendev.org/#/c/693401 > > > > I do not think it should be added to the ussuri jobs template. > > Would it be possible to add it to the template, but under the > experimental queue? That way we leverage the template's ability to do > the work for all projects but the job won't be executed without a > specific experimental check. Personally from neutron point of view I think that periodic is better than experimental as with periodic jobs we don't need to do any additional actions to run this job and see results. And we are checking periodic jobs' results every week on CI meeting. But ofcourse experimental would also work :) > > Thanks, > > Nate > > > I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. > > > > Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. > > > > Any idea so far from manual py38 testing if there are breaking changes that are going to impact us? > >   > > > Also should this be updated considering py38 would be non-voting? > > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. > > > > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Fri Nov 8 05:08:16 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 8 Nov 2019 13:08:16 +0800 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> References: <196d5a99-53f4-68c7-d28d-b6962abb8b3b@linaro.org> Message-ID: <20191108050816.xs5yourrkqqxt43g@skaplons-mac> Hi, >From what I can say now, I'm streaming Neutron PTG sessions through BlueJeans for last 3 days and it works without any problems and without any vpn connection. So this may be useful also for You. 
On Thu, Oct 31, 2019 at 09:55:26AM +0100, Marcin Juszkiewicz wrote: > W dniu 30.10.2019 o 23:23, Kendall Nelson pisze: > > > If people were going to be in Shanghai for the Summit (or live in > > China) they wouldn't be able to participate because of the firewall. > > Can you (or someone else present in Poland) provide an alternative > > solution to Google meet so that everyone interested could join? > > Tell us which of them work for you: > > - Bluejeans > - Zoom > > > As I have access to both platforms at work. > -- Slawek Kaplonski Senior software engineer Red Hat From mdulko at redhat.com Fri Nov 8 05:53:20 2019 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Fri, 08 Nov 2019 06:53:20 +0100 Subject: [kuryr] Meeting at PTG - follow up Message-ID: <5eee4b4f468c5b8c9b7407762165179d7b665f3f.camel@redhat.com> Hi, Thank you all for a constructive discussion at the Shanghai PTG. First of all I want to clarify that the Neutron improvements in bulk port creation code I talked about were only merged in Stein and are not available in Queens. Sorry about that, my mistake. Below I want to list some highlights from the discussion. * There seemed no pushback on switching to independent release model, so I'll follow up on that soon. * We discussed using RPC instead of API polling to detect when port becomes ACTIVE in Neutron (https://review.opendev.org/#/c/669642/). I commented on the review with a follow up that came from discussion with Neutron team. Apparently there are 2 better ways of doing this. Let's continue that discussion on the review. * We know that we don't support running as a second CNI plugin on Multus because of eth0 being hardcoded as interface name and kuryr- controller handling pods that do not had CNI requests. This is something to solve. * Apparently we all suffer the same problems with Neutron performance when starting a bigger workload on Kuryr-configured K8s cluster (think big Helm chart, multiple operators or simultaneous test suite). This all comes to reducing the number of calls to Neutron. We still don't see a feasible solution to solve this problem, but it seems to be a priority for both Samsung and Red Hat at the moment. Let's see if we'll meet again in Vancouver! Thanks, Michał From mdulko at redhat.com Fri Nov 8 06:05:26 2019 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Fri, 08 Nov 2019 07:05:26 +0100 Subject: [kuryr] Meeting at PTG - follow up In-Reply-To: <5eee4b4f468c5b8c9b7407762165179d7b665f3f.camel@redhat.com> References: <5eee4b4f468c5b8c9b7407762165179d7b665f3f.camel@redhat.com> Message-ID: <3e8d7b10556ef04353894ff0cbaffdd17da24450.camel@redhat.com> On Fri, 2019-11-08 at 06:53 +0100, Michał Dulko wrote: > Hi, > > Thank you all for a constructive discussion at the Shanghai PTG. First > of all I want to clarify that the Neutron improvements in bulk port > creation code I talked about were only merged in Stein and are not > available in Queens. Sorry about that, my mistake. > > Below I want to list some highlights from the discussion. > > * There seemed no pushback on switching to independent release model, > so I'll follow up on that soon. > * We discussed using RPC instead of API polling to detect when port > becomes ACTIVE in Neutron (https://review.opendev.org/#/c/669642/). > I commented on the review with a follow up that came from discussion > with Neutron team. Apparently there are 2 better ways of doing this. > Let's continue that discussion on the review. 
> * We know that we don't support running as a second CNI plugin on > Multus because of eth0 being hardcoded as interface name and kuryr- > controller handling pods that do not had CNI requests. This is > something to solve. > * Apparently we all suffer the same problems with Neutron performance > when starting a bigger workload on Kuryr-configured K8s cluster > (think big Helm chart, multiple operators or simultaneous test > suite). This all comes to reducing the number of calls to Neutron. > We still don't see a feasible solution to solve this problem, but it > seems to be a priority for both Samsung and Red Hat at the moment. > > Let's see if we'll meet again in Vancouver! > > Thanks, > Michał I forgot to add link to raw etherpad with notes: https://etherpad.openstack.org/p/kuryr-PVG From balazs.gibizer at est.tech Fri Nov 8 06:33:11 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 06:33:11 +0000 Subject: [nova][ptg] Virtual PTG In-Reply-To: <4254ccd8-88ca-b21d-29b6-ab4e427f3ee4@fried.cc> References: <4254ccd8-88ca-b21d-29b6-ab4e427f3ee4@fried.cc> Message-ID: <1573194777.23158.0@est.tech> On Thu, Oct 24, 2019 at 17:28, Eric Fried wrote: > Hello nova contributors and other stakeholders. > > As you are aware, nova maintainers will be sparser than usual at the > ussuri PTG. For that reason, and also because it promotes better > inclusion anyway, I'd like us to do the majority of decision making > via > the mailing list. The PTG is still a useful place to talk through > design > ideas, but this will give those not attending a voice in the final > direction. > > To that end, I call your attention to the etherpad [1]. As usual, list > your topics there. And if your topic is something for which you only > need (or wish to start with) in-person discussions (e.g. "I'd like to > do > $thing but could use some help figuring out $how"), you're done. > > But if what you're shooting for is discussion leading to some kind of > decision, like... > > - My spec has been stalled because we can't decide among N different > approaches; we need to reach a consensus. > - My feature is really important; can we please prioritize it for > ussuri? > > ...then in addition to putting your topic on the etherpad, please > initiate a (separate) thread on this mailing list, including > [nova][ptg] > in your subject line. > > Some of these topics may be resolved before the PTG itself. Others may > be discussed in Shanghai. However, even if a consensus is reached in > person, expect that decision to be tentative pending closure of the ML > thread. Now as the PTG is over (for nova at least as we finished a bit earlier) Sylvain and me will start sending out summary mails about the topics we discussed to create a place to further discuss the topics with the rest of the nova team. The etherpad[1] contains most of the agreements we reached during the discussions but of course none of them are final as we did not have a core quorum on the PTG. Thanks for everyone who contributed in any way to the PTG discussions! 
Cheers, gibi > > Thanks, > efried > > [1] https://etherpad.openstack.org/p/nova-shanghai-ptg > From balazs.gibizer at est.tech Fri Nov 8 07:09:31 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 07:09:31 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance Message-ID: <1573196961.23158.1@est.tech> spec: https://review.opendev.org/668656 Agreements from the PTG: How we will test it: * do functional test with libvirt driver, like the pinned cpu tests we have today * donyd's CI supports nested virt so we can do pinned cpu testing but not realtime. As this CI is still work in progress we should not block on this. * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to have Naming: use the 'shared' and 'dedicated' terminology Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will have less expression power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between numa nodes. If it not possible we reject the request and ask the user to use the hw:pinvcpus=3 syntax. Realtime mask is an exclusion mask, any vcpus not listed there has to be in the dedicated set of the instance. TODOInvestigate whether we want to enable NUMA by default * Pros: Simpler, everything is NUMA by default * Cons: We'll either have to break/make configurablethe 1:1 guest:host NUMA mapping else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host Cheers, gibi From balazs.gibizer at est.tech Fri Nov 8 07:24:18 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 07:24:18 +0000 Subject: [nova][ptg] Community Goals Message-ID: <1573197848.23158.2@est.tech> python3 readiness: Stick with only considering py3 as default when writing code, not caring about refactoring existing code for the purpose of being Py3 pedantic. Write Py3-only code on master, and eventually make the backport py2-compatible on stable branches if needed. dropping paste: Need an operator ML post: is anybody using this? aarents (OVH?) said they are using paste.ini to insert middlewares. So we cannot simply drop paste. improved contributor documentation: we have PTL doc in tree [2]. Work on the CONTRIBUTING.rst [3]. [1] https://etherpad.openstack.org/p/nova-shanghai-ptg [2] https://docs.openstack.org/nova/latest/contributor/ptl-guide.html [3] https://review.opendev.org/#/c/640970/ From balazs.gibizer at est.tech Fri Nov 8 07:32:47 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 07:32:47 +0000 Subject: [nova][ptg]Support volume local cache Message-ID: <1573198357.23158.3@est.tech> spec: https://review.opendev.org/#/c/689070/ TODOs: LiangFang to update the spec based on the discussion in the room[1]: * use traits to driver scheduling. The cache is not sliced per instance so it cannot be a resource class * document the alternative between doing a hard scheduling decision or only implement caching as a best effort optimization for the guest. 
* document the alternative to do the whole cache management on libvirt (or QEMU) level [1] https://etherpad.openstack.org/p/nova-shanghai-ptg From balazs.gibizer at est.tech Fri Nov 8 08:01:58 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 8 Nov 2019 08:01:58 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship Message-ID: <1573200108.23158.4@est.tech> spec: https://review.opendev.org/#/c/650188/ Agreements from the room[1]: * new config option 'using_shared_disk_provider' (just an example name) on the compute level to ignore the DISK_GB reporting from the driver * new config option 'sharing_disk_aggregate' (just an example name) on the compute level to tell nova compute what is the UUID of the placement aggregate that contains the sharing DISK_GB providers in placement. * the "using_shared_disk_provider" flag necessarly has to be explicit since if not, it would be a chicken-and-egg problem on a greenfields install as the shared RP wouldn't be created * deployer needs to create the sharing disk RP and report inventory / traits on it * deployer needs to define the placement aggregate and add the sharing disk RP into it * when compute restarts and sees that 'using_shared_disk_provider' = True in the config, it adds the its compute RP to the aggregate defined in 'sharing_disk_aggregate' Then if it sees that the root RP still has DISK_GB inventory then trigger a reshape * os-hypervisor API response (in a new microversion) will have a link to the sharing disk RP if the compute is so configured. TODO: * tpatil to update the spec [1] https://etherpad.openstack.org/p/nova-shanghai-ptg From Tushar.Patil at nttdata.com Fri Nov 8 09:39:49 2019 From: Tushar.Patil at nttdata.com (Patil, Tushar) Date: Fri, 8 Nov 2019 09:39:49 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573200108.23158.4@est.tech> References: <1573200108.23158.4@est.tech> Message-ID: Hi All, > TODO: > * tpatil to update the spec I will update the specs next week and upload it for review. Regards, tpatil ________________________________________ From: Balázs Gibizer Sent: Friday, November 8, 2019 5:01 PM To: openstack-discuss Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship spec: https://review.opendev.org/#/c/650188/ Agreements from the room[1]: * new config option 'using_shared_disk_provider' (just an example name) on the compute level to ignore the DISK_GB reporting from the driver * new config option 'sharing_disk_aggregate' (just an example name) on the compute level to tell nova compute what is the UUID of the placement aggregate that contains the sharing DISK_GB providers in placement. 
* the "using_shared_disk_provider" flag necessarly has to be explicit since if not, it would be a chicken-and-egg problem on a greenfields install as the shared RP wouldn't be created * deployer needs to create the sharing disk RP and report inventory / traits on it * deployer needs to define the placement aggregate and add the sharing disk RP into it * when compute restarts and sees that 'using_shared_disk_provider' = True in the config, it adds the its compute RP to the aggregate defined in 'sharing_disk_aggregate' Then if it sees that the root RP still has DISK_GB inventory then trigger a reshape * os-hypervisor API response (in a new microversion) will have a link to the sharing disk RP if the compute is so configured. TODO: * tpatil to update the spec [1] https://etherpad.openstack.org/p/nova-shanghai-ptg Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. From antoine.millet at enix.fr Fri Nov 8 09:53:57 2019 From: antoine.millet at enix.fr (Antoine Millet) Date: Fri, 08 Nov 2019 10:53:57 +0100 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types Message-ID: Hi here, I'm trying to find a solution to migrate instances between hypervisors of an openstack cluster with nodes running different ML2 agents (OVS and bridges, I'm actually migrating the whole cluster to the latter). The cluster is running Rocky. I enabled both mechanisms in the neutron- server configuration and some nodes are running the neutron- openvswitch-agent and some other the neutron-linuxbridge-agent. My network nodes (running the l3 agent) are currently running the neutron- openvswitch-agent. I also noticed that when nova-compute is starting up, VIF plugins for OVS and Bridges are loaded ("INFO os_vif [-] Loaded VIF plugins: ovs, linux_bridge"). When I start a live migration for an instance running on an hypervisor using the OVS agent to an hypervisor using the bridge agent, it fails because the destination hypervisor try to execute 'ovs-*' commands to bind the VM to its network. I also tried cold migration and just restarting an hypervisor with the bridge agent instead of the OVS one, but it fails similarly when the instances startup. After some research, I discovered that the mechanism used to bind an instance port to a network is stored in the port binding configuration in the database and that the code that executes the 'ovs-*' commands is actually located in the os_vif library that is used by the nova-compute agent. So, I tried to remove the OVS plugin from the os_vif library. Ubuntu ship both plugins in the same package so I just deleted the plugin directory in /usr/lib/python2.7/dist-packages directory (don't judge me please, it's for science ;-)). And... it worked as expected (port bindings are converted to bridge mechanism), at least for the cold migration (hot migration is cancelled without any error message, I need to investigate more). How can I do those migration the proper way? Thank you for any help! Antoine -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From info at dantalion.nl Fri Nov 8 11:45:17 2019 From: info at dantalion.nl (info at dantalion.nl) Date: Fri, 8 Nov 2019 12:45:17 +0100 Subject: [olso][taskflow] graph-flow failed task will halt execution of children? In-Reply-To: References: Message-ID: <953ef54c-4463-2f80-2997-aca339b9a369@dantalion.nl> I have a short and simple question which I couldn't find a clear answer for in the documentation. I understand that when a task raises a exception in a graph flow it will revert all parents, however, I fail to find any information if it will subsequently prevent the execution of all children. I imagine yes as the dependencies for these tasks are now unmet but I would like to know for sure. TL;DR; Does an exception in a graph-flow task prevent the execution of children? Kind Regards, Corne Lukken (Dantali0n) From smooney at redhat.com Fri Nov 8 12:20:52 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 08 Nov 2019 12:20:52 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <1573196961.23158.1@est.tech> References: <1573196961.23158.1@est.tech> Message-ID: On Fri, 2019-11-08 at 07:09 +0000, Balázs Gibizer wrote: > spec: https://review.opendev.org/668656 > > Agreements from the PTG: > > How we will test it: > * do functional test with libvirt driver, like the pinned cpu tests we > have today > * donyd's CI supports nested virt so we can do pinned cpu testing but > not realtime. As this CI is still work in progress we should not block > on this. we can do realtime testing in that ci. i already did. also there is a new label that is available across 3 providers so we wont just be relying on donyd's good work. > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > have > > Naming: use the 'shared' and 'dedicated' terminology didn't we want to have a hw:cpu_policy=mixed specificaly for this case? > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax > will have less expression power until nova models NUMA in placement. So > nova will try to evenly distribute PCPUs between numa nodes. If it not > possible we reject the request and ask the user to use the > hw:pinvcpus=3 syntax. > > Realtime mask is an exclusion mask, any vcpus not listed there has to > be in the dedicated set of the instance. > > TODOInvestigate whether we want to enable NUMA by default > * Pros: Simpler, everything is NUMA by default > * Cons: We'll either have to break/make configurablethe 1:1 guest:host in the context of mix if we dont enable numa affinity by default we should remove that behavior from all case where we do it today. > NUMA mapping else we won't be able to boot e.g. a 40 core shared > instance on a 40 core, 2 NUMA node host if this is a larger question of if we should have all instance be numa by default i have argued yes for quite a while as i think having 1 code path has many advantages. that said im aware of this limitation. one way to solve this was the use of the proposed can_split placmenent paramter. so if you did not specify a numa toplogy we would add can_split=vCPUs and then create a singel or multiple numa node toplogy based on the allcoations. if we combine that with a allocation weigher we could sort the allocation candiates by smallest number of numa nodes so we would prefer landing on hosts that can fit it on 1 numa node. 
its a big change but long overdue. that said i have also argued the other point too in responce to pushback on "all vms have numa of 1 unless you say otherwise" i.e. that the 1:1 between mapping virtual and host numa nodes shoudl be configurable and is not required by the api today. the backwards compatible way to do that is its not requried by default if you are using shared cores and is required if you are using pinned but that is a littel confusing. i dont really know what the right answer to this is but i think its a seperate question form the topic of this thread. we dont need to solve this to enable pinned and unpinned cpus in one instance but we do need to adress this before we can model numa in placment. > > > Cheers, > gibi > > > From mriedemos at gmail.com Fri Nov 8 14:03:28 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 8 Nov 2019 08:03:28 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573200108.23158.4@est.tech> References: <1573200108.23158.4@est.tech> Message-ID: <4ffbcc4c-c043-5d7a-7f7a-d78de9fc75d7@gmail.com> On 11/8/2019 2:01 AM, Balázs Gibizer wrote: > * deployer needs to create the sharing disk RP and report inventory / > traits on it > * deployer needs to define the placement aggregate and add the sharing > disk RP into it > * when compute restarts and sees that 'using_shared_disk_provider' = > True in the config, it adds the its compute RP to the aggregate defined > in 'sharing_disk_aggregate' Then if it sees that the root RP still has > DISK_GB inventory then trigger a reshape Does the compute host also get added to a nova host aggregate which mirrors the resource provider aggregate in placmeent or do we only need the placement resource provider sharing DISK_GB aggregate? -- Thanks, Matt From mriedemos at gmail.com Fri Nov 8 14:05:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 8 Nov 2019 08:05:33 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573200108.23158.4@est.tech> References: <1573200108.23158.4@est.tech> Message-ID: <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> On 11/8/2019 2:01 AM, Balázs Gibizer wrote: > * when compute restarts and sees that 'using_shared_disk_provider' = > True in the config, it adds the its compute RP to the aggregate defined > in 'sharing_disk_aggregate' Then if it sees that the root RP still has > DISK_GB inventory then trigger a reshape Conversely, if the deployer decides to use local disk for the host again, what are the steps? 1. Change using_shared_disk_provider=False 2. Restart/SIGHUP compute service 3. Compute removes itself from the aggregate 4. Compute reshapes to add DISK_GB inventory on the root compute node resource provider and moves DISK_GB allocations from the sharing provider back to the root compute node provider. Correct? -- Thanks, Matt From corey.bryant at canonical.com Fri Nov 8 14:09:51 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Fri, 8 Nov 2019 09:09:51 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On Thu, Nov 7, 2019 at 5:56 PM Sean McGinnis wrote: > My non-TC take on this... > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > too late to enable voting py38 unit tests for ussuri, I'd like to at least > enable non-voting py38 unit tests. 
This email is seeking approval and > direction from the TC to move forward with enabling non-voting py38 tests. > > I think it would be great to start testing 3.8 so there are no surprises > once we need to officially move there. But I would actually not want to see > that run on every since patch in every single repo. > Just to be clear I'm only talking about unit tests right now which are generally light on resource requirements. However it would be great to also have py38 function test enablement and periodic would make sense for function tests at this point. For unit tests though it seems the benefit of knowing whether your patch regresses unit tests for the latest python version far outweighs the resources required, so I don't see much benefit in adding periodic unit test jobs. > > For some further background: The next release of Ubuntu, Focal (20.04) > LTS, is scheduled to release in April 2020. Python 3.8 will be the default > in the Focal release, so I'm hopeful that non-voting unit tests will help > close some of the gap. > > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > I do not think it should be added to the ussuri jobs template. > > I think it would be more useful as its own job for now that can be added > to a select few repos as a full tempest run so a smaller number of test > runs can cover a broader cross-section of projects. > > Otherwise as maybe a periodic job for now so it doesn't add to the run > time and noise on every patch being submitted. > > Do you feel the same on the 2 points above for unit test enablement? Any idea so far from manual py38 testing if there are breaking changes that > are going to impact us? > I don't have enough details yet so I'll have to get back to you on that but yes there is breakage that I haven't dug into yet. > > Also should this be updated considering py38 would be non-voting? > > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. > That should only list the officially targeted runtimes for the release. > Ok, makes sense. Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Fri Nov 8 14:16:27 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 8 Nov 2019 08:16:27 -0600 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types In-Reply-To: References: Message-ID: <8b93198d-5d61-0d25-7ebd-5a10ee01642b@gmail.com> On 11/8/2019 3:53 AM, Antoine Millet wrote: > How can I do those migration the proper way? [1] was implemented in Rocky to support live migration between different networking backends (vif types). A couple of things to check: 1. Is Neutron fully upgraded to Rocky and exposing the "Port Bindings Extended" (binding-extended) extension? Nova uses that to determine if neutron is new enough to create an inactive port binding for the target host prior to starting the live migration. 2. Are your nova-compute services all upgraded to at least Rocky and reporting version >=35 in the services table in the cell1 DB? [2] 3. Do you have [compute]/upgrade_levels RPC pinned to anything below Rocky? Or is that configured to "auto"? 
These are things to check just to make sure the basic upgrade requirements are satisfied before the code will even attempt to do the new style binding flow for live migration. If that's all working properly, you should see this DEBUG log message on the source host during live migration [4]. [1] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html [2] https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/conductor/tasks/live_migrate.py#L41 [3] https://docs.openstack.org/nova/rocky/configuration/config.html#upgrade_levels.compute [4] https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/virt/libvirt/migration.py#L317 -- Thanks, Matt From openstack at fried.cc Fri Nov 8 14:36:17 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 8 Nov 2019 08:36:17 -0600 Subject: [nova][ptg] Virtual PTG In-Reply-To: <1573194777.23158.0@est.tech> References: <4254ccd8-88ca-b21d-29b6-ab4e427f3ee4@fried.cc> <1573194777.23158.0@est.tech> Message-ID: <20db8f68-f841-6d34-a63f-305967ef9af2@fried.cc> Thanks very much to all those who facilitated and participated on site! > Now as the PTG is over (for nova at least as we finished a bit earlier) > Sylvain and me will start sending out summary mails about the topics we > discussed to create a place to further discuss the topics with the rest > of the nova team. The etherpad[1] contains most of the agreements we > reached during the discussions but of course none of them are final as > we did not have a core quorum on the PTG. efried . From antoine.millet at enix.fr Fri Nov 8 15:53:27 2019 From: antoine.millet at enix.fr (Antoine Millet) Date: Fri, 08 Nov 2019 16:53:27 +0100 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types In-Reply-To: <8b93198d-5d61-0d25-7ebd-5a10ee01642b@gmail.com> References: <8b93198d-5d61-0d25-7ebd-5a10ee01642b@gmail.com> Message-ID: Matt, Thank you for your answer! > > 1. Is Neutron fully upgraded to Rocky and exposing the "Port > Bindings > Extended" (binding-extended) extension? Nova uses that to determine > if > neutron is new enough to create an inactive port binding for the > target > host prior to starting the live migration. I'm not sure how to test that but my neutron components are all upgraded to rocky / 13.0.4. > > 2. Are your nova-compute services all upgraded to at least Rocky and > reporting version >=35 in the services table in the cell1 DB? [2] I can confirm that. > > 3. Do you have [compute]/upgrade_levels RPC pinned to anything below > Rocky? Or is that configured to "auto"? All the upgrade_levels on compute are pinned to auto (control plane and nodes). > If that's all working properly, you should see this DEBUG log message > on > the source host during live migration [4]. I can actually see this message on the nova-compute logs: 2019-11-08 16:34:18.599 3995 DEBUG nova.virt.libvirt.migration [-] [instance: 3dbf401b-19bf-4342-a04e-6ac9cff99efe] Updating guest XML with vif config: And here is the problem at the same time at the destination: 2019-11-08 15:34:23.720 4434 ERROR os_vif AgentError: Error during following call to agent: ['ovs-vsctl', '--timeout=120', '--', '--if- exists', 'del-port', u'br-int', u'qvoea312a58-2e'] Antoine -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From smooney at redhat.com Fri Nov 8 16:23:22 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 08 Nov 2019 16:23:22 +0000 Subject: [ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types In-Reply-To: References: Message-ID: On Fri, 2019-11-08 at 10:53 +0100, Antoine Millet wrote: > Hi here, > > I'm trying to find a solution to migrate instances between hypervisors > of an openstack cluster with nodes running different ML2 agents (OVS > and bridges, I'm actually migrating the whole cluster to the latter). > > The cluster is running Rocky. I enabled both mechanisms in the neutron- > server configuration and some nodes are running the neutron- > openvswitch-agent and some other the neutron-linuxbridge-agent. My > network nodes (running the l3 agent) are currently running the neutron- > openvswitch-agent. I also noticed that when nova-compute is starting > up, VIF plugins for OVS and Bridges are loaded ("INFO os_vif [-] Loaded > VIF plugins: ovs, linux_bridge"). > > When I start a live migration for an instance running on an hypervisor > using the OVS agent to an hypervisor using the bridge agent, it fails > because the destination hypervisor try to execute 'ovs-*' commands to > bind the VM to its network. I also tried cold migration and just > restarting an hypervisor with the bridge agent instead of the OVS one, > but it fails similarly when the instances startup. > > After some research, I discovered that the mechanism used to bind an > instance port to a network is stored in the port binding configuration > in the database and that the code that executes the 'ovs-*' commands is > actually located in the os_vif library that is used by the nova-compute > agent. > > So, I tried to remove the OVS plugin from the os_vif library. Ubuntu > ship both plugins in the same package so I just deleted the plugin > directory in /usr/lib/python2.7/dist-packages directory (don't judge me > please, it's for science ;-)). And... it worked as expected (port > bindings are converted to bridge mechanism), at least for the cold > migration (hot migration is cancelled without any error message, I need > to investigate more). While that is an inventive approach, os-vif is not actually involved in the port binding process; it handles port plugging later. I did some testing around this use case back in 2018 and found a number of gaps that need to be addressed to support live migration between linux bridge and ovs or vice versa. First, the bridge name is not set in vif:binding-details by ml2/linux-bridge https://bugs.launchpad.net/neutron/+bug/1788012 so if we try to go from ovs to linuxbridge we generate the wrong xml and try to add the port to a linux bridge called br-int. (The "Updating guest XML with vif config" log excerpt from devstack1 that followed here showed the generated libvirt interface XML, but the XML elements were stripped when the message was archived.) Using mixed linux bridge and ovs hosts also has other problems if you are using vxlan or gre, because neutron does not form a mesh tunnel overlay between different ml2 drivers. https://bugs.launchpad.net/neutron/+bug/1788023 The linux bridge plugin also uses a different udp port for vxlan for reasons (vxlan was merged in the linux kernel before the IANA port number was assigned).
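As an aside, when experimenting with this it helps to compare what neutron has recorded in the port binding with where the tap device actually ended up on the host. A rough sketch (the field names are as printed by recent python-openstackclient and may differ on older clients; the tap device name is normally "tap" plus the first characters of the port uuid):

    # what vif_type and host are recorded in the port binding?
    openstack port show <port-uuid> -c binding_vif_type -c binding_host_id

    # on the compute host: is the tap attached to a linux bridge (brq.../qbr...)?
    brctl show | grep tap<first chars of port-uuid>

    # ...or is it plugged into an OVS bridge?
    ovs-vsctl port-to-br tap<first chars of port-uuid>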
So in effect there is no supported way to do this with a live migration in rocky, but there are ways to force it to work. The simplest way is to cold migrate followed by a hard reboot, but you need to have both the ovs and linux bridge tools installed on each host while only having 1 agent running. You can also live migrate twice to the same host and hard reboot: the first migration will fail, the second should succeed but result in the vm tap device being connected to the wrong bridge, and the hard reboot fixes it. > way? > > Thank you for any help! > > Antoine From johnsomor at gmail.com Fri Nov 8 16:35:48 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 8 Nov 2019 08:35:48 -0800 Subject: [oslo][taskflow] graph-flow failed task will halt execution of children? In-Reply-To: <953ef54c-4463-2f80-2997-aca339b9a369@dantalion.nl> References: <953ef54c-4463-2f80-2997-aca339b9a369@dantalion.nl> Message-ID: Short answer is yes. When a task fails and the revert path starts, it goes back up the graph executing the revert methods and will not execute any children beyond the failed task. That said, there is an option in the engine to disable reverts (execution will simply halt at the failed task), there are ways to make decision paths, and there is a pretty robust set of retry tools that can be applied in a revert situation. Michael On Fri, Nov 8, 2019 at 2:49 AM info at dantalion.nl wrote: > > I have a short and simple question which I couldn't find a clear > answer for in the documentation. > > I understand that when a task raises an exception in a graph flow it > will revert all parents, however, I fail to find any information on whether it > will subsequently prevent the execution of all children. > > I imagine yes, as the dependencies for these tasks are now unmet, but I > would like to know for sure. > > TL;DR; Does an exception in a graph-flow task prevent the execution of > children? > > Kind Regards, > Corne Lukken (Dantali0n) > From mark at stackhpc.com Fri Nov 8 18:20:02 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 8 Nov 2019 18:20:02 +0000 Subject: [kolla][ptg] Kolla Ussuri PVPTG (Partially Virtual PTG) In-Reply-To: References: Message-ID: Thank you to everyone who joined in the discussion over the last two days in the Kolla PTG. It is nice to put a face/voice to some IRC nicks :) We covered a lot of topics, and I think we made some good progress on many of them. At the end of today's session, we listed all of the potential work for the Ussuri cycle in an Etherpad [1] and voted for which items we think should be prioritised in the Ussuri cycle. I would like to invite anyone in the Kolla community who has not yet voted to do so, even if you did not attend the PTG sessions. Input from the community is very valuable, and will help to guide our efforts as maintainers. The voting process is described at the top of the pad. For anyone who would like to see the notes from the discussions, they are at [2]. Please get in touch via IRC if you have any questions. [1] https://etherpad.openstack.org/p/kolla-ussuri-priorities [2] https://etherpad.openstack.org/p/kolla-ussuri-ptg Cheers, Mark On Mon, 4 Nov 2019 at 14:15, Mark Goddard wrote: > > On Wed, 30 Oct 2019 at 17:26, Radosław Piliszek > wrote: > > > > Hello Everyone, > > > > As you may already know, Kolla core team is mostly not present on summit in Shanghai. > > Instead we are organizing a PTG next week, 7-8th Nov (Thu-Fri), in Białystok, Poland. > > Please let me know this week if you are interested in coming in person.
> > > > We invite operators, contributors and contributors-to-be to join us for the virtual PTG online. > > The time schedule will be advertised later. > > After polling participants, we have agreed to meet at 1400 - 1800 UTC > on Thursday and Friday this week. Since not all participants can make > the first hour, we will adjust the schedule accordingly. > > Marcin will follow with connection details for the Zoom video conference. > > Please continue to update the etherpad with potential topics for > discussion. I will propose a rough agenda over the next few days. > > Mark > > > > > Please fill yourself in on the whiteboard [1]. > > New ideas are welcome. > > > > [1] https://etherpad.openstack.org/p/kolla-ussuri-ptg > > > > Kind regards, > > Radek aka yoctozepto > > From zigo at debian.org Fri Nov 8 21:17:57 2019 From: zigo at debian.org (Thomas Goirand) Date: Fri, 8 Nov 2019 22:17:57 +0100 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On 11/7/19 11:56 PM, Sean McGinnis wrote: > My non-TC take on this... > >   >> Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > >> For some further background: The next release of Ubuntu, Focal (20.04) LTS, is scheduled to release in April 2020. Python 3.8 will be the default in the Focal release, so I'm hopeful that non-voting unit tests will help close some of the gap. >   >> I have a review here for the zuul project template enablement for ussuri: >> https://review.opendev.org/#/c/693401 > > I do not think it should be added to the ussuri jobs template. > > I think it would be more useful as its own job for now that can be added to a select few repos as a full tempest run so a smaller number of test runs can cover a broader cross-section of projects. > > Otherwise as maybe a periodic job for now so it doesn't add to the run time and noise on every patch being submitted. > > Any idea so far from manual py38 testing if there are breaking changes that are going to impact us? >   >> Also should this be updated considering py38 would be non-voting? >> https://governance.openstack.org/tc/reference/runtimes/ussuri.html[https://governance.openstack.org/tc/reference/runtimes/ussuri.html] > > I not think it would be appropriate to list 3.8 under the Ussuri runtimes. That should only list the officially targeted runtimes for the release. Sean and everyone else, Pardon me, but I have to rant here... :) Please try see things from a downstream consumer point of view. This isn't the Python 2.7 era anymore, where we had a stable python for like forever. OpenStack needs to move quicker to newer Python 3 versions, especially considering that Python 2.7 isn't an option for anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks like a nice first approach, it is my strong believe that the project should quickly move to voting and full Python 3.8 testing, and preferably, have it in order, with functional testing, for the whole project, by the time Ussuri is released. 
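For what it's worth, opting a single repo into a non-voting py38 unit test job is a small change. A sketch of what it could look like in that repo's .zuul.yaml, assuming a centrally defined openstack-tox-py38 job (named here by analogy with the existing openstack-tox-py37 job; the project template in the review quoted above would do this for many repos at once):

    - project:
        check:
          jobs:
            - openstack-tox-py38:
                voting: false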
I know what's going to happen: I'll bring up a bug in Python 3.8 on IRC, and someone will reply to me "hey, that's not supported by OpenStack before the V release, please go away", even though, as downstream distribution package maintainers, we don't have the power to decide what version of Python our distribution runs on (ie: both Debian Sid, Ubuntu and Fedora are quickly moving targets). There's absolutely no excuse for the OpenStack project to be dragging its feet, apart maybe from the fact that it may not be easy to set up the infra to run tests on Py3.8 just yet.
> Now when I try to open [1] I see only something like: > > An error occurred > The error was reported with the following id: 'igzOahZ6ruH0eSUAWKaj' > > Please press and hold Ctrl and press F5 to reload this page, if the problem > persists please send this error message to your webmaster: > 'ErrorId: igzOahZ6ruH0eSUAWKaj > URL: https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 > Firefox/70.0 > TypeError: r.dropdowns is undefined in > https://etherpad.openstack.org/javascripts/lib/ep_etherpad-lite/static/js/pad.js?callback=require.define > at line 18' > > > We can open one of the previous versions which is available at [2] but I don't > know how we can fix original etherpad or restore version from [2] to be original > etherpad and make it working again. > Can someone from infra team check that for us maybe? > Thx in advance for any help. > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 Hi Slawek, When I just went to check this etherpad now, noticed I had a tab open that was in "Force reconnect" state. I made a copy of that, just might be a little out of date on the last items. The formatting is also a little odd, but at least it's better than nothing if we can't get the original back. https://etherpad.openstack.org/p/neutron-ptg-temp -Brian From fungi at yuggoth.org Sat Nov 9 17:09:44 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 9 Nov 2019 17:09:44 +0000 Subject: [infra] Etherpad problem In-Reply-To: <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> References: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> Message-ID: <20191109170943.w4l2g5qd34g5m2tw@yuggoth.org> On 2019-11-10 00:31:36 +0800 (+0800), Brian Haley wrote: [...] > When I just went to check this etherpad now, noticed I had a tab > open that was in "Force reconnect" state. I made a copy of that, > just might be a little out of date on the last items. The > formatting is also a little odd, but at least it's better than > nothing if we can't get the original back. [...] This happens from time to time. We can get a dump of the previous revision from the API if needed and paste that into a new pad, but yeah formatting and author colors will be lost in the process. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From balazs.gibizer at est.tech Sun Nov 10 08:09:18 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 08:09:18 +0000 Subject: [nova][ptg] Resource provider delete at service delete Message-ID: <1573373353.31166.0@est.tech> ML thread: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007135.html Agreements in the room: * Check ongoing migration and reject the delete if migration with this compute having the source node exists. Let operator confirm the migrations * Cascade delete providers and allocations in placement. * in case of evacuated instances this is the right thing to do * in any other dangling allocation case nova has the final thrut so nova has the authority to delete them. * Document possible ways to reconcile Placement with Nova using heal_allocations and eventually the audit command once it's merged. 
Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 08:10:58 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 08:10:58 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: <1573373452.31166.1@est.tech> original spec: https://review.opendev.org/#/c/663563 with -2s The first round of discussion was resulted in no agreement. Then on Friday we revisited the issue based on mdbooth's proposal about composability. Agreement in the room: * Do not try to change the model of the flavor in nova code and in the db. * Define a "ComposableFlavorBit" (bikeshed on the name please) REST API entity that can hold any kind of flavor bits (extra specs, normal flavor fields), propose some format in the spec for it. This entity can only be created by the admin by default * Extend the server create REST API to allow the end user to specify what "ComposableFlavorBit"s she wants to add to the "base" flavor she used in the create request. * The nova api then merges the "ComposableFalvorBit"s with the base flavor and embed the resulted flavor object into the instance. * Do a similar thing for resize TODOs: * brinzhang (with possible help from yawang) to re-write the spec Cheers, gibi From sean.mcginnis at gmx.com Sun Nov 10 10:44:24 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Sun, 10 Nov 2019 04:44:24 -0600 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: <20191110104424.GA6424@sm-workstation> > > Sean and everyone else, > > Pardon me, but I have to rant here... :) > Please try see things from a downstream consumer point of view. > > This isn't the Python 2.7 era anymore, where we had a stable python for > like forever. OpenStack needs to move quicker to newer Python 3 > versions, especially considering that Python 2.7 isn't an option for > anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks > like a nice first approach, it is my strong believe that the project > should quickly move to voting and full Python 3.8 testing, and > preferably, have it in order, with functional testing, for the whole > project, by the time Ussuri is released. > We've had this debate many times now. Nothing has changed the fact that we cannot make something an official runtime until there is an official distro out there with that as the runtime. There is not for Ussuri. > I know what's going to happen: I'll tell about a bug in Python 3.8 on > IRC, and someone will reply to me "hey, that's not supported by > OpenStack before the V release, please go away", even though as > downstream distribution package maintainer, we don't have the power to > decide what version of Python our distribution runs on (ie: both Debian > Sid, Ubuntu and Fedora are quickly moving targets). > I very highly doubt that, and very much disagree that someone will say to go away. From what I have seen, the majority of the community is very responsive to issues raised about future version problems. Fixing and working with py3.8 is not what is being discussed here. Only whether those jobs to validate py3.8 should run on every patch or not. > There's absolutely no excuse for the OpenStack project to be dragging > its feet, apart maybe the fact that it may not be easy to setup the > infra to run tests on Py3.8 just yet. 
It isn't a normal situation that > downstream distributions get the shit (pardon my french) and get to be > the ones fixing issues and proposing patches (Corey, you've done awesome > job on this...), yet it's been the case for nearly every single Python 3 > releases. I very much would appreciate this situation to be fixed, and > the project moving faster. > > Cheers, > > Thomas Goirand (zigo) > From zhangbailin at inspur.com Sun Nov 10 12:51:07 2019 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Sun, 10 Nov 2019 12:51:07 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: Gibi: The summary on the ML write in the nova ptg etherpad line 245 cannot open, raised a 404 error. Etherpad: https://etherpad.openstack.org/p/nova-shanghai-ptg Thanks. Brin Zhang items: [lists.openstack.org代发][nova][ptg] Flavor explosion original spec: https://review.opendev.org/#/c/663563 with -2s The first round of discussion was resulted in no agreement. Then on Friday we revisited the issue based on mdbooth's proposal about composability. Agreement in the room: * Do not try to change the model of the flavor in nova code and in the db. * Define a "ComposableFlavorBit" (bikeshed on the name please) REST API entity that can hold any kind of flavor bits (extra specs, normal flavor fields), propose some format in the spec for it. This entity can only be created by the admin by default * Extend the server create REST API to allow the end user to specify what "ComposableFlavorBit"s she wants to add to the "base" flavor she used in the create request. * The nova api then merges the "ComposableFalvorBit"s with the base flavor and embed the resulted flavor object into the instance. * Do a similar thing for resize TODOs: * brinzhang (with possible help from yawang) to re-write the spec Cheers, gibi From fungi at yuggoth.org Sun Nov 10 15:41:39 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 10 Nov 2019 15:41:39 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: References: Message-ID: <20191110154139.sdom27sbhncpp6lm@yuggoth.org> On 2019-11-10 12:51:07 +0000 (+0000), Brin Zhang(张百林) wrote: > The summary on the ML write in the nova ptg etherpad line 245 > cannot open, raised a 404 error. > Etherpad: https://etherpad.openstack.org/p/nova-shanghai-ptg [...] It links to http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010642.html which opens just fine for me. Maybe someone has corrected it? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From balazs.gibizer at est.tech Sun Nov 10 16:05:01 2019 From: balazs.gibizer at est.tech (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Sun, 10 Nov 2019 16:05:01 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: <20191110154139.sdom27sbhncpp6lm@yuggoth.org> References: <20191110154139.sdom27sbhncpp6lm@yuggoth.org> Message-ID: <1573401895.31166.2@est.tech> On Sun, Nov 10, 2019 at 15:41, Jeremy Stanley wrote: > On 2019-11-10 12:51:07 +0000 (+0000), Brin Zhang(张百林) wrote: >> The summary on the ML write in the nova ptg etherpad line 245 >> cannot open, raised a 404 error. >> Etherpad: https://etherpad.openstack.org/p/nova-shanghai-ptg > [...] > > It links to > http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010642.html > which opens just fine for me. Maybe someone has corrected it? The link was wrong at L269 ( it ended with htm instead of html). It is fixed now. 
Cheers, gibi > -- > Jeremy Stanley From zhangbailin at inspur.com Sun Nov 10 16:09:06 2019 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Sun, 10 Nov 2019 16:09:06 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: Hi all, Based on the discussion on the Train PTG, and reference to the records on the etherpad and ML, I was updated that SPEC, and I think there are some details need to be discussed, and I have listed some details, if there are any other things that I have not considered, or if some place that I thoughtless, please post a discussion. List some details as follows, and you can review that spec in https://review.opendev.org/#/c/663563. Listed details: - Don't change the model of the flavor in nova code and in the db. - No change for operators who choose not to request the flavor extra specs group. - Requested more than one flavor extra specs groups, if there are different values for the same spec will be raised a 409. - Flavor in request body of server create that has the same spec in the request ``flavor_extra_specs_group``, it will be raised a 409. - When resize an instance, you need to compare the ``flavor_extra_specs_group`` with the spec request spec, otherwise raise a 400. ---------------------------------------------------------------------------------------- Items: [lists.openstack.org代发][nova][ptg] Flavor explosion original spec: https://review.opendev.org/#/c/663563 with -2s The first round of discussion was resulted in no agreement. Then on Friday we revisited the issue based on mdbooth's proposal about composability. Agreement in the room: * Do not try to change the model of the flavor in nova code and in the db. * Define a "ComposableFlavorBit" (bikeshed on the name please) REST API entity that can hold any kind of flavor bits (extra specs, normal flavor fields), propose some format in the spec for it. This entity can only be created by the admin by default * Extend the server create REST API to allow the end user to specify what "ComposableFlavorBit"s she wants to add to the "base" flavor she used in the create request. * The nova api then merges the "ComposableFalvorBit"s with the base flavor and embed the resulted flavor object into the instance. * Do a similar thing for resize TODOs: * brinzhang (with possible help from yawang) to re-write the spec Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:15:15 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:15:15 +0000 Subject: [nova][ptg] Expose auto converge and post copy Message-ID: <1573402509.31166.3@est.tech> spec: https://review.opendev.org/#/c/687199 There was multiple discussion during the PTG around this. I think at the end mdbooth and yawang found a possible solution that they liked and sounded OK to me too. Unfortunately the etherpad was not updated and my memory is bad enough that I cannot gather what was the exact proposal. yawang: could you please post a short summary here and / or simply update the spec with the discussed solution? Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:17:52 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:17:52 +0000 Subject: [nova][ptg] Other small discussions Message-ID: <1573402667.31166.4@est.tech> image precache support * aarents (OVH?) 
interested in the feauter * TODO: aarents to talk to dansmith about future improvements like rate limiting nova implications of a starlingx bug: https://bugs.launchpad.net/starlingx/+bug/1850834 * TODO: yawang to file a nova bug if reproducible * TODO: yawang to propose a patch in nova based on the starlingx fix https://gist.github.com/VictorRodriguez/e137a8cd87cf821f8076e9acc02ce195 vm scoped sriov numa affinity * spec: https://review.opendev.org/#/c/683174/4 * there was not enough knowledge in the room to really discuss * TODO: gibi to read and comment the spec midcylce * stephenfin will propose a midcycle disussion on the ML From balazs.gibizer at est.tech Sun Nov 10 16:23:03 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:23:03 +0000 Subject: [nova][ironic][ptg] reconfigure the baremetal node through the resize API Message-ID: <1573402969.31166.5@est.tech> ironic spec https://review.opendev.org/#/c/672252/ TODOs: * alex_xu (?) to create a nova spec to list possible alternatives about which nova REST API could be used to trigger reconfigure (resize, reboot, new api) * A potential way forward would be Ironic supporting resize (confirm/revert) as well with some magical autodetection of an Ironic instance *or* some magical flavor-based solution Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:29:53 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:29:53 +0000 Subject: [nova][ptg] drop the shadow table concept Message-ID: <1573403387.31166.6@est.tech> CERN reported two issues with archive_deleted_rows CLI: * When one record gets inserted into the shadow_instance_extra but didn't get deleted from instance_extra (I know this is in a single transaction but sometimes it happens), needs manual cleanup on the database * Also there could be two cells running this command at the same time fighting for the API db lock, TODOs: * tssurya to report bugs / improvements on archive_deleted_rows CLI based on CERN's experience with long table locking * mnaser to report a wishlist bug / specless bp about one step db purge CLI which would skip the shadow tables Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:32:35 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:32:35 +0000 Subject: [nova][ptg] Ussuri cycle themes Message-ID: <1573403550.31166.7@est.tech> Ussuri themes proposal: * Policy work Note for the operators that this work only provides value after the whole work is ready and it is pretty possible that this will not finish in the current cycle. * Unified limits Discuss on the ML how can we help oslo to progress with the oslo-limits work. If we can help with that then, make unified limits as a cycle Goal. * Cyborg - Nova integration (bauzas) I'm OK with making Cyborg a priority for this cycle provided we don't hold any attempt to start thinking on fixing PCI tracking. We definitely needs to discuss this further with the whole team. 
Cheers, gibi From balazs.gibizer at est.tech Sun Nov 10 16:35:40 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:35:40 +0000 Subject: [nova][neutron][ptg] Nova - Neutron cross project dicussions Message-ID: <1573403737.31166.8@est.tech> https://etherpad.openstack.org/p/ptg-ussuri-xproj-nova-neutron Nova-Neutron live-migration - ACTIVE/INACTIVE port bindings issue * nova needs vif plug event for inactive bindings * live migration with ml2-ovs only works by luck * agentless drivers does not support this https://bugs.launchpad.net/neutron/+bug/1834045 https://bugs.launchpad.net/neutron/+bug/1840147 Agreements: * document the nova-neutron live migration workflow to create a common base for discussion. Add ijw and sean-k-mooney to the review SR-IOV live migration utilizing kernel NET_FAILOVER feature * aim to support live migration with SRIOV nic without traffic interrupt by bonding a virtio interface to the SRIOV nic in the guest kernel with NET_FAILOVER. * https://www.kernel.org/doc/html/latest/networking/net_failover.html * There was multiple solution proposlas but no agreements about which one to pursue. * TODO: adrianc to propose a nova spec based on the discussion Correlation of Bandwidth RP with PCI RP when PCI will be tracked in Placement * Early heads up to Neutron team that Nova things about modelling PCI in Placement * Current best thinking on Nova side is to correlate BW RP with PCI RP in placement with a placement aggregate. * Nova will do the correlation * Neutron needs to prepare for possible BW RP generation conflict * Agreement: Neutron is OK with this plan. rubasov can help with the neutron impact when nova makes progress. From balazs.gibizer at est.tech Sun Nov 10 16:44:55 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sun, 10 Nov 2019 16:44:55 +0000 Subject: [nova][ironic][ptg] Resource tracker scaling issues Message-ID: <1573404293.31166.9@est.tech> * COMPUTE_RESOURCE_SEMAPHORE blocks instance creation on all nodes (on the same host) while the _update_available_resource runs on all nodes. On 3500 baremetal nodes _update_available_resource takes 1.5 hour. * Do we still need _update_available_resource periodic task to run for ironic nodes? * Reduce the scope of the COMPUTE_RESOURCE_SEMAPHORE lock * https://review.opendev.org/#/c/682242/ * https://review.opendev.org/#/c/677790/ * changing a locking scheme is frightening => we need more testing Agreement: * Do a tempest test with a lot of fake ironic node records to have a way to test if changing the locking scheme breaks anything * Log a bug and propose a patch for having a per-node lock instead of the same object for all the ResourceTrackers * See also whether concurrency helps * Propose a spec if you really want to pursue the idea of being somehow inconsistent with data by not having a lock Cheers, gibi From mriedemos at gmail.com Sun Nov 10 20:41:11 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 10 Nov 2019 14:41:11 -0600 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: <1573403387.31166.6@est.tech> References: <1573403387.31166.6@est.tech> Message-ID: On 11/10/2019 10:29 AM, Balázs Gibizer wrote: > * Also there could be two cells running this command at the same time > fighting for the API db lock, In Train the --all-cells option was added to the CLI so that should resolve this issue. 
I think Mel said she backported those changes internally so I'm not sure how hard it would be for those to go back to Stein or Rocky or whatever release CERN is using now. -- Thanks, Matt From mriedemos at gmail.com Sun Nov 10 21:04:56 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 10 Nov 2019 15:04:56 -0600 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: <1573403387.31166.6@est.tech> References: <1573403387.31166.6@est.tech> Message-ID: <30ebbfc9-bfcc-2fc7-ce39-a1996266ec0b@gmail.com> On 11/10/2019 10:29 AM, Balázs Gibizer wrote: > * When one record gets inserted into the shadow_instance_extra but > didn't get deleted from instance_extra (I know this is in a single > transaction but sometimes it happens), needs manual cleanup on the > database Is this potentially caused by the issue attempting to be fixed here? https://review.opendev.org/#/c/412771/ -- Thanks, Matt From mriedemos at gmail.com Sun Nov 10 21:07:51 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Sun, 10 Nov 2019 15:07:51 -0600 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <1573404293.31166.9@est.tech> References: <1573404293.31166.9@est.tech> Message-ID: <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> On 11/10/2019 10:44 AM, Balázs Gibizer wrote: > On 3500 baremetal nodes _update_available_resource takes 1.5 hour. Why have a single nova-compute service manage this many nodes? Or even 1000? Why not try to partition things a bit more reasonably like a normal cell where you might have ~200 nodes per compute service host (I think CERN keeps their cells to around 200 physical compute hosts for scaling)? That way you can also leverage the compute service hashring / failover feature for HA? I realize the locking stuff is not great, but at what point is it unreasonable to expect a single compute service to manage that many nodes/instances? -- Thanks, Matt From eandersson at blizzard.com Sun Nov 10 23:06:16 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Sun, 10 Nov 2019 23:06:16 +0000 Subject: [Senlin] Splitting senlin-engine into three services In-Reply-To: References: Message-ID: We are looking at merging this in a few days if there is no additional feedback. Also, in case someone has experience with adding new services to a project. Is there a general way of requesting rpms etc to be updated for the next release to support the new services? Especially for the workflows that are handled outside of the official openstack repos. ________________________________ From: Erik Olof Gunnar Andersson Sent: Monday, November 4, 2019 5:11 PM To: openstack-discuss at lists.openstack.org Cc: Duc Truong Subject: [Senlin] Splitting senlin-engine into three services We are looking into splitting the senlin-engine into three components (senlin-conductor, senlin-engine and senlin-health-manager) and wanted to get some feedback. The main goal here is to make the components more resilient and to reduce the number of threads per worker. Each one of the components already had it's own thread pool and in theory each worker could end up with thousands of thread. In the current version (Train) the engine process hosts these services. 
https://github.com/openstack/senlin/blob/stable/train/senlin/engine/dispatcher.py#L31 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/health_manager.py#L865 https://github.com/openstack/senlin/blob/stable/train/senlin/engine/service.py#L79 In my patch we move two our of these out of the engine and into it's own service namespace. Split engine service into three services https://review.opendev.org/#/c/688784/ Please feel free to comment on the patch set, or let reply to this email with general feedback or concerns. Best Regards, Erik Olof Gunnar Andersson -------------- next part -------------- An HTML attachment was scrubbed... URL: From wang.ya at 99cloud.net Mon Nov 11 02:42:35 2019 From: wang.ya at 99cloud.net (wang.ya) Date: Mon, 11 Nov 2019 10:42:35 +0800 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <1573402509.31166.3@est.tech> References: <1573402509.31166.3@est.tech> Message-ID: <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> Hi: * The original method(expose the auto converge/post copy in image properties/flavor extra specs) exposed features of hypervisor layer directly, and it affects scheduling. Therefore, it's not appropriate. * The new method is add a new parameter: "no-performance-impact". If the parameter set during live migrate, the libvirt driver will disable the auto converge and post copy functions. User can tag their instances as "no performance impact" in instance metadata or somewhere else, operator can check the tag to decide whether add the parameter before live migrate. I will write a new spec to describe these in detail :) Best Regards On 2019/11/11, 12:15 AM, "Balázs Gibizer" wrote: spec: https://review.opendev.org/#/c/687199 There was multiple discussion during the PTG around this. I think at the end mdbooth and yawang found a possible solution that they liked and sounded OK to me too. Unfortunately the etherpad was not updated and my memory is bad enough that I cannot gather what was the exact proposal. yawang: could you please post a short summary here and / or simply update the spec with the discussed solution? Cheers, gibi From bxzhu_5355 at 163.com Mon Nov 11 03:33:54 2019 From: bxzhu_5355 at 163.com (Boxiang Zhu) Date: Mon, 11 Nov 2019 11:33:54 +0800 (GMT+08:00) Subject: [manila][kolla-ansible] Fail to use manila with cephfs NFS share backend Message-ID: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> Hi everyone, I have deployed a OpenStack cluster with AIO mode by kolla-ansible. In my globals.yml, something is enabled as followed: ...... enable_ceph: "yes" enable_ceph_mds: "yes" enable_ceph_nfs: "yes" enable_manila: "yes" enable_manila_backend_cephfs_nfs: "yes" ....... And in my manila.conf, some configs are as followed: [DEFAULT] enabled_share_protocols = NFS,CIFS ...... [cephfsnfs1] driver_handles_share_servers = False share_backend_name = CEPHFSNFS1 share_driver = manila.share.drivers.cephfs.driver.CephFSDriver cephfs_protocol_helper_type = NFS cephfs_conf_path = /etc/ceph/ceph.conf cephfs_auth_id = manila cephfs_cluster_name = ceph cephfs_enable_snapshots = False cephfs_ganesha_server_is_remote = False cephfs_ganesha_server_ip = 172.16.60.84 ...... 
I use CLI to create the nfs, some commands as followed: -> manila type-create cephfsnfstype false -> manila type-key cephfsnfstype set vendor_name=Ceph storage_protocol=NFS -> manila create --share-type cephfsnfstype --name cephnfsshare1 nfs 1 -> manila share-export-location-list cephnfsshare1 +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | ID | Path | Preferred | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | b101bf59-4cd7-4a09-a12e-b6dd48a5bb18 | 172.16.60.84:/volumes/_nogroup/93b1e23d-0166-41a4-a12a-51bf4c3654a5 | False | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ -> manila access-allow cephnfsshare1 ip 172.16.60.119 But I have got some error messages from /var/lib/docker/volumes/kolla_logs/_data/manila/manila-share.log 2019-11-11 10:11:12.035 26 ERROR manila.share.drivers.ganesha.manager [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Error while executing management command on Ganesha node : dbus call exportmgr.AddExport.: ProcessExecutionError: Unexpected error while running command. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Exception during message handling: GaneshaCommandFailure: Ganesha management command failed. Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) Exit code: 1 Stdout: u'' Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 187, in wrapped 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return f(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/utils.py", line 568, in wrapper 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server 
File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 3554, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 283, in update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 322, in _update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 390, in _update_rules_through_share_driver 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", line 289, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/__init__.py", line 308, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server self.ganesha.add_export(share['name'], confdict) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/manager.py", line 491, in add_export 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server cmd=e.cmd) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server GaneshaCommandFailure: Ganesha management command failed. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Exit code: 1 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stdout: u'' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server I found some possible solutions[0][1][2][3] to try to fix this, but failed. Any suggestions? [0] https://github.com/nfs-ganesha/nfs-ganesha/issues/483 [1] https://github.com/nfs-ganesha/nfs-ganesha/issues/219 [2] https://github.com/gluster/storhaug/issues/14 [3] https://sourceforge.net/p/nfs-ganesha/mailman/message/32227132/ Thanks, Boxiang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wang.ya at 99cloud.net Mon Nov 11 07:45:12 2019 From: wang.ya at 99cloud.net (wang.ya) Date: Mon, 11 Nov 2019 15:45:12 +0800 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> References: <1573402509.31166.3@est.tech> <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> Message-ID: <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> Hi: Here is the spec [1]_ Because the exist spec [2]_ has gap with the agreement, so I rewrote a new spec. .. [1]: https://review.opendev.org/#/c/693655/ .. [2]: https://review.opendev.org/#/c/687199/ Best Regards On 2019/11/11, 10:43 AM, "wang.ya" wrote: Hi: * The original method(expose the auto converge/post copy in image properties/flavor extra specs) exposed features of hypervisor layer directly, and it affects scheduling. Therefore, it's not appropriate. * The new method is add a new parameter: "no-performance-impact". If the parameter set during live migrate, the libvirt driver will disable the auto converge and post copy functions. User can tag their instances as "no performance impact" in instance metadata or somewhere else, operator can check the tag to decide whether add the parameter before live migrate. I will write a new spec to describe these in detail :) Best Regards On 2019/11/11, 12:15 AM, "Balázs Gibizer" wrote: spec: https://review.opendev.org/#/c/687199 There was multiple discussion during the PTG around this. I think at the end mdbooth and yawang found a possible solution that they liked and sounded OK to me too. Unfortunately the etherpad was not updated and my memory is bad enough that I cannot gather what was the exact proposal. yawang: could you please post a short summary here and / or simply update the spec with the discussed solution? Cheers, gibi From radoslaw.piliszek at gmail.com Mon Nov 11 08:43:41 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 11 Nov 2019 09:43:41 +0100 Subject: [manila][kolla-ansible] Fail to use manila with cephfs NFS share backend In-Reply-To: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> References: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> Message-ID: Is this Stein on CentOS? Looks like a bug to me. Please report to: https://bugs.launchpad.net/kolla-ansible with all the details we ask for. Thanks. -yoctozepto pon., 11 lis 2019 o 04:45 Boxiang Zhu napisał(a): > > Hi everyone, > > I have deployed a OpenStack cluster with AIO mode by kolla-ansible. > In my globals.yml, something is enabled as followed: > ...... > enable_ceph: "yes" > enable_ceph_mds: "yes" > enable_ceph_nfs: "yes" > enable_manila: "yes" > enable_manila_backend_cephfs_nfs: "yes" > ....... > And in my manila.conf, some configs are as followed: > [DEFAULT] > enabled_share_protocols = NFS,CIFS > ...... > [cephfsnfs1] > driver_handles_share_servers = False > share_backend_name = CEPHFSNFS1 > share_driver = manila.share.drivers.cephfs.driver.CephFSDriver > cephfs_protocol_helper_type = NFS > cephfs_conf_path = /etc/ceph/ceph.conf > cephfs_auth_id = manila > cephfs_cluster_name = ceph > cephfs_enable_snapshots = False > cephfs_ganesha_server_is_remote = False > cephfs_ganesha_server_ip = 172.16.60.84 > ...... 
> I use CLI to create the nfs, some commands as followed: > -> manila type-create cephfsnfstype false > -> manila type-key cephfsnfstype set vendor_name=Ceph > storage_protocol=NFS > -> manila create --share-type cephfsnfstype --name cephnfsshare1 nfs 1 > -> manila share-export-location-list cephnfsshare1 > > +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ > | ID > | Path > | Preferred | > > +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ > | b101bf59-4cd7-4a09-a12e-b6dd48a5bb18 | 172.16.60.84:/volumes/_nogroup/93b1e23d-0166-41a4-a12a-51bf4c3654a5 > | False | > > +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ > -> manila access-allow cephnfsshare1 ip 172.16.60.119 > But I have got some error messages > from /var/lib/docker/volumes/kolla_logs/_data/manila/manila-share.log > 2019-11-11 10:11:12.035 26 ERROR manila.share.drivers.ganesha.manager > [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e > 70dba7786a8b4326a35e03d0ad8707f2 - - -] Error while executing management > command on Ganesha node : dbus call exportmgr.AddExport.: > ProcessExecutionError: Unexpected error while running command. > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e > 70dba7786a8b4326a35e03d0ad8707f2 - - -] Exception during message handling: > GaneshaCommandFailure: Ganesha management command failed. > Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send > --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr > org.ganesha.nfsd.exportmgr.AddExport > string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf > string:EXPORT(Export_Id=105) > Exit code: 1 > Stdout: u'' > Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name > org.ganesha.nfsd was not provided by any .service files\n' > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Traceback (most > recent call last): > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", > line 165, in _process_incoming > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server res = > self.dispatcher.dispatch(message) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 274, in dispatch > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return > self._do_dispatch(endpoint, method, ctxt, args) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 194, in _do_dispatch > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server result = > func(ctxt, **new_args) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", > line 187, in wrapped > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return > f(self, *args, **kwargs) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/utils.py", line > 568, in wrapper > 2019-11-11 10:11:14.954 
26 ERROR oslo_messaging.rpc.server return > func(self, *args, **kwargs) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", > line 3554, in update_access > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", > line 283, in update_access_rules > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", > line 322, in _update_access_rules > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", > line 390, in _update_rules_through_share_driver > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", > line 289, in update_access > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > share_server=share_server) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/__init__.py", > line 308, in update_access > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > self.ganesha.add_export(share['name'], confdict) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File > "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/manager.py", > line 491, in add_export > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server cmd=e.cmd) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > GaneshaCommandFailure: Ganesha management command failed. > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Command: sudo > manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system > --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr > org.ganesha.nfsd.exportmgr.AddExport > string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf > string:EXPORT(Export_Id=105) > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Exit code: 1 > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stdout: u'' > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stderr: u'Error > org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was > not provided by any .service files\n' > 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server > > I found some possible solutions[0][1][2][3] to try to fix this, but > failed. > Any suggestions? > > [0] https://github.com/nfs-ganesha/nfs-ganesha/issues/483 > [1] https://github.com/nfs-ganesha/nfs-ganesha/issues/219 > [2] https://github.com/gluster/storhaug/issues/14 > [3] https://sourceforge.net/p/nfs-ganesha/mailman/message/32227132/ > > Thanks, > Boxiang > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bxzhu_5355 at 163.com Mon Nov 11 09:25:26 2019 From: bxzhu_5355 at 163.com (Boxiang Zhu) Date: Mon, 11 Nov 2019 17:25:26 +0800 (CST) Subject: =?UTF-8?Q?=E5=9B=9E=E5=A4=8D:Re:_[manila][kolla-ansible]_Fail_to_?= =?UTF-8?Q?use_manila_with_cephfs_NFS_share_backend?= In-Reply-To: References: <7f1c7f52.45d8.16e58866b9d.Coremail.bxzhu_5355@163.com> Message-ID: <64c11eb2.8a8d.16e59c8431f.Coremail.bxzhu_5355@163.com> hi yoctozepto, Here is the link https://bugs.launchpad.net/kolla-ansible/+bug/1852055 BTW, in fact, before I met the problem, I found anther issue from ceph-nfs.log. The ganesha can not connect to the /run/dbus/system_bus_socket, So I change the kolla-ansible/share/kolla-ansible/ansible/roles/ceph/tasks/start_nfss.yml, add `` - "/run/:/run/:shared" `` under volumes section. Thanks, Boxiang At 2019-11-11 16:43:41, "Radosław Piliszek" wrote: Is this Stein on CentOS? Looks like a bug to me. Please report to: https://bugs.launchpad.net/kolla-ansible with all the details we ask for. Thanks. -yoctozepto pon., 11 lis 2019 o 04:45 Boxiang Zhu napisał(a): Hi everyone, I have deployed a OpenStack cluster with AIO mode by kolla-ansible. In my globals.yml, something is enabled as followed: ...... enable_ceph: "yes" enable_ceph_mds: "yes" enable_ceph_nfs: "yes" enable_manila: "yes" enable_manila_backend_cephfs_nfs: "yes" ....... And in my manila.conf, some configs are as followed: [DEFAULT] enabled_share_protocols = NFS,CIFS ...... [cephfsnfs1] driver_handles_share_servers = False share_backend_name = CEPHFSNFS1 share_driver = manila.share.drivers.cephfs.driver.CephFSDriver cephfs_protocol_helper_type = NFS cephfs_conf_path = /etc/ceph/ceph.conf cephfs_auth_id = manila cephfs_cluster_name = ceph cephfs_enable_snapshots = False cephfs_ganesha_server_is_remote = False cephfs_ganesha_server_ip = 172.16.60.84 ...... I use CLI to create the nfs, some commands as followed: -> manila type-create cephfsnfstype false -> manila type-key cephfsnfstype set vendor_name=Ceph storage_protocol=NFS -> manila create --share-type cephfsnfstype --name cephnfsshare1 nfs 1 -> manila share-export-location-list cephnfsshare1 +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | ID | Path | Preferred | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ | b101bf59-4cd7-4a09-a12e-b6dd48a5bb18 | 172.16.60.84:/volumes/_nogroup/93b1e23d-0166-41a4-a12a-51bf4c3654a5 | False | +-----------------------------------------------+------------------------------------------------------------------------------------+-----------+ -> manila access-allow cephnfsshare1 ip 172.16.60.119 But I have got some error messages from /var/lib/docker/volumes/kolla_logs/_data/manila/manila-share.log 2019-11-11 10:11:12.035 26 ERROR manila.share.drivers.ganesha.manager [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Error while executing management command on Ganesha node : dbus call exportmgr.AddExport.: ProcessExecutionError: Unexpected error while running command. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server [req-673584f9-eaed-4464-819e-ace63dfa0f49 6f4bd93b18c546c880d5d4e526a0243e 70dba7786a8b4326a35e03d0ad8707f2 - - -] Exception during message handling: GaneshaCommandFailure: Ganesha management command failed. 
Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) Exit code: 1 Stdout: u'' Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 187, in wrapped 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return f(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/utils.py", line 568, in wrapper 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/manager.py", line 3554, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 283, in update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 322, in _update_access_rules 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/access.py", line 390, in _update_rules_through_share_driver 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", line 289, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server share_server=share_server) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/__init__.py", line 308, in update_access 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server self.ganesha.add_export(share['name'], confdict) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/manila/share/drivers/ganesha/manager.py", line 491, in add_export 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server cmd=e.cmd) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server GaneshaCommandFailure: Ganesha management command failed. 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Command: sudo manila-rootwrap /etc/manila/rootwrap.conf dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-93b1e23d-0166-41a4-a12a-51bf4c3654a5.conf string:EXPORT(Export_Id=105) 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Exit code: 1 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stdout: u'' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server Stderr: u'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n' 2019-11-11 10:11:14.954 26 ERROR oslo_messaging.rpc.server I found some possible solutions[0][1][2][3] to try to fix this, but failed. Any suggestions? [0] https://github.com/nfs-ganesha/nfs-ganesha/issues/483 [1] https://github.com/nfs-ganesha/nfs-ganesha/issues/219 [2] https://github.com/gluster/storhaug/issues/14 [3] https://sourceforge.net/p/nfs-ganesha/mailman/message/32227132/ Thanks, Boxiang -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Mon Nov 11 09:32:16 2019 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Mon, 11 Nov 2019 10:32:16 +0100 Subject: [ops][glance] Quota for max number of images/snapshots per projects Message-ID: As far as I can see it is not possible to set a quota for the maximum number of images-snapshots for a given project, at least in the Rocky release If I am not wrong it is only possible to set the max size of an image or the total size of space used for glance by a project, but these settings would be the same for all projects Are there plans to implement this capability ? Thanks, Massimo -------------- next part -------------- An HTML attachment was scrubbed... URL: From arne.wiebalck at cern.ch Mon Nov 11 10:19:59 2019 From: arne.wiebalck at cern.ch (Arne Wiebalck) Date: Mon, 11 Nov 2019 11:19:59 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: Hi Matt, On 10.11.19 22:07, Matt Riedemann wrote: > On 11/10/2019 10:44 AM, Balázs Gibizer wrote: >> On 3500 baremetal nodes _update_available_resource takes 1.5 hour. > > Why have a single nova-compute service manage this many nodes? Or even > 1000? > > Why not try to partition things a bit more reasonably like a normal cell > where you might have ~200 nodes per compute service host (I think CERN > keeps their cells to around 200 physical compute hosts for scaling)? > > That way you can also leverage the compute service hashring / failover > feature for HA? > > I realize the locking stuff is not great, but at what point is it > unreasonable to expect a single compute service to manage that many > nodes/instances? > I agree that using sharding and/or multiple cells to manage that many nodes is sensible. 
One reason we haven't done it yet is that we got away with this very simple setup so far ;) Sharding with and/or within cells will help to some degree (and we are actively looking into this as you probably know), but I think that should not stop us from checking if there are algorithmic improvements (e.g. when collecting the data), or if moving to a different locking granularity or even parallelising the update are feasible additional improvements. Cheers, Arne -- Arne Wiebalck CERN IT From huaqiang.wang at intel.com Mon Nov 11 11:58:27 2019 From: huaqiang.wang at intel.com (Wang, Huaqiang) Date: Mon, 11 Nov 2019 11:58:27 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <1573196961.23158.1@est.tech> References: <1573196961.23158.1@est.tech> Message-ID: <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> > -----Original Message----- > From: Balázs Gibizer > Sent: Friday, November 8, 2019 3:10 PM > To: openstack-discuss > Subject: [nova][ptg] pinned and unpinned CPUs in one instance > > spec: https://review.opendev.org/668656 > > Agreements from the PTG: > > How we will test it: > * do functional test with libvirt driver, like the pinned cpu tests we have > today > * donyd's CI supports nested virt so we can do pinned cpu testing but not > realtime. As this CI is still work in progress we should not block on this. > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > have > > Naming: use the 'shared' and 'dedicated' terminology > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will > have less expression power until nova models NUMA in placement. So nova > will try to evenly distribute PCPUs between numa nodes. If it not possible we > reject the request and ask the user to use the > hw:pinvcpus=3 syntax. > > Realtime mask is an exclusion mask, any vcpus not listed there has to be in > the dedicated set of the instance. > > TODOInvestigate whether we want to enable NUMA by default > * Pros: Simpler, everything is NUMA by default > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > NUMA mapping else we won't be able to boot e.g. a 40 core shared instance > on a 40 core, 2 NUMA node host For the case of 'booting a 40 core shared instance on 40 core 2NUMA node' that will not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no assumption about instance NUMA topology. By the way if you want a 'shared' instance, with 40 cores, to be scheduled on a host of 40cores, 2 NUMA nodes, you also need to register all host cores as 'shared' cpus through 'conf.compute.cpu_shared_set'. For instance with 'mixed' policy, what I want to propose is the instance should demand at least one 'dedicated'(or PCPU) core. Thus, any 'mixed' instance or 'dedicated' instance will not be scheduled one this host due to no PCPU available on this host. And also, a 'mixed' instance should also demand at least one 'shared' (or VCPU) core. a 'mixed' instance demanding all cores from PCPU resource should be considered as an invalid one. And an instance demanding all cores from PCPU resource is just a legacy 'dedicated' instance, which CPU allocation policy is 'dedicated'. In conclusion, a instance with the policy of 'mixed' -. demands at least one 'dedicated' cpu and at least one 'shared' cpu. -. 
with NUMA topology by default due to requesting pinned cpu In my understanding the cons does not exist by making above rules. Br Huaqiang > > > Cheers, > gibi > > From huaqiang.wang at intel.com Mon Nov 11 12:45:30 2019 From: huaqiang.wang at intel.com (Wang, Huaqiang) Date: Mon, 11 Nov 2019 12:45:30 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: References: <1573196961.23158.1@est.tech> Message-ID: <77E9D723B6A15C4CB27F7C3F130DE862477658E0@shsmsx102.ccr.corp.intel.com> > -----Original Message----- > From: Sean Mooney > Sent: Friday, November 8, 2019 8:21 PM > To: Balázs Gibizer ; openstack-discuss discuss at lists.openstack.org> > Subject: Re: [nova][ptg] pinned and unpinned CPUs in one instance > > On Fri, 2019-11-08 at 07:09 +0000, Balázs Gibizer wrote: > > spec: https://review.opendev.org/668656 > > > > Agreements from the PTG: > > > > How we will test it: > > * do functional test with libvirt driver, like the pinned cpu tests we > > have today > > * donyd's CI supports nested virt so we can do pinned cpu testing but > > not realtime. As this CI is still work in progress we should not block > > on this. > we can do realtime testing in that ci. > i already did. also there is a new label that is available across 3 providers so > we wont just be relying on donyd's good work. > > > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > > have > > > > Naming: use the 'shared' and 'dedicated' terminology > didn't we want to have a hw:cpu_policy=mixed specificaly for this case? > > > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax > > will have less expression power until nova models NUMA in placement. > > So nova will try to evenly distribute PCPUs between numa nodes. If it > > not possible we reject the request and ask the user to use the > > hw:pinvcpus=3 syntax. > > > > Realtime mask is an exclusion mask, any vcpus not listed there has to > > be in the dedicated set of the instance. > > > > TODOInvestigate whether we want to enable NUMA by default > > * Pros: Simpler, everything is NUMA by default > > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > in the context of mix if we dont enable numa affinity by default we should > remove that behavior from all case where we do it today. > > NUMA mapping else we won't be able to boot e.g. a 40 core shared > > instance on a 40 core, 2 NUMA node host Hi gabi or sean, To help me to understand the issue under discussion, if I change the instance requirement a little bit to: -. an instance demanding 1 dedicated core and 39 shared cores -. instance vcpu allocation ratio is 1 -. host has 2 NUMA nodes and 40 cores in total -. 39 of 40 cores are registered as VCPU resource the 1 core is registered as PCPU It will raise the same problem, right? because it hopes the instance to be scheduled on the host. > if this is a larger question of if we should have all instance be numa by > default i have argued yes for quite a while as i think having 1 code path has > many advantages. that said im aware of this limitation. > one way to solve this was the use of the proposed can_split placmenent > paramter. so if you did not specify a numa toplogy we would add > can_split=vCPUs and then create a singel or multiple numa node toplogy > based on the allcoations. 
if we combine that with a allocation weigher we > could sort the allocation candiates by smallest number of numa nodes so we > would prefer landing on hosts that can fit it on 1 numa node. > its a big change but long overdue. > I have read the 'can_split' spec; it will help, if I understand the issue correctly. Then I agree with Sean that it is a separate issue that does not belong to spec 668656. > that said i have also argued the other point too in responce to pushback on > "all vms have numa of 1 unless you say otherwise" i.e. that the 1:1 between > mapping virtual and host numa nodes shoudl be configurable and is not > required by the api today. the backwards compatible way to do that is its not > requried by default if you are using shared cores and is required if you are > using pinned but that is a littel confusing. > > i dont really know what the right answer to this is but i think its a seperate > question form the topic of this thread. > we dont need to solve this to enable pinned and unpinned cpus in one > instance but we do need to adress this before we can model numa in > placment. > > > > > > > Cheers, > > gibi > > > > > > >
From cdent+os at anticdent.org Mon Nov 11 13:03:54 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 11 Nov 2019 13:03:54 +0000 (GMT) Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: On Sun, 10 Nov 2019, Matt Riedemann wrote: > On 11/10/2019 10:44 AM, Balázs Gibizer wrote: >> On 3500 baremetal nodes _update_available_resource takes 1.5 hour. > > Why have a single nova-compute service manage this many nodes? Or even 1000? > > Why not try to partition things a bit more reasonably like a normal cell > where you might have ~200 nodes per compute service host (I think CERN keeps > their cells to around 200 physical compute hosts for scaling)? Without commenting on the efficacy of doing things this way, I can report that 1000 (or even 3500) instances (not nodes) is a thing that can happen in some openstack + vsphere setups and tends to exercise some of the same architectural problems that a lots-of-ironic (nodes) setup encounters. As far as I can tell the root architecture problem is: a) there are lots of loops b) there is an expectation that those loops will have a small number of iterations (b) is generally true for a run of the mill KVM setup, but not otherwise. (b) not being true in other contexts creates an impedance mismatch that is hard to overcome without doing at least one of the two things suggested elsewhere in this thread: 1. manage fewer pieces per nova-compute (Matt) 2. "algorithmic improvement" (Arne) On 2, I wonder if there's been any exploration of using something like a circular queue and time-bounding the periodic jobs? Or using separate processes? For the ironic and vsphere contexts, increased CPU usage by the nova-compute process does not impact on the workload resources, so parallelization is likely a good option.
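To make that concrete, here is a minimal, self-contained sketch of the
circular-queue-plus-time-budget idea combined with a thread pool. It is
purely illustrative: the node list and the per-node update function are
made up, and this is not nova's actual resource tracker code, it just
shows the shape of the loop:

    import collections
    import concurrent.futures
    import time

    # Hypothetical stand-ins for "all nodes this service manages" and
    # "refresh the inventory of one node"; in nova these would be driver
    # and placement calls.
    nodes = collections.deque("node-%04d" % i for i in range(3500))

    def update_one_node(node):
        time.sleep(0.01)          # pretend this is a slow per-node refresh
        return node

    def periodic_pass(time_budget=5.0, workers=8):
        """Update as many nodes as fit in time_budget, in parallel, then
        stop; the next periodic run resumes where this one left off."""
        deadline = time.monotonic() + time_budget
        done = 0
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            while nodes and time.monotonic() < deadline:
                # take a small batch from the head of the circular queue
                batch = [nodes.popleft() for _ in range(min(len(nodes), workers))]
                for node in pool.map(update_one_node, batch):
                    nodes.append(node)   # rotate to the tail for the next pass
                    done += 1
        return done

    if __name__ == "__main__":
        print("updated %d nodes this period" % periodic_pass())

Whether threads, eventlet or separate worker processes would be the
right primitive inside nova-compute is a separate question, but the
loop shape is the same either way.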
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From gmann at ghanshyammann.com Mon Nov 11 13:04:56 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 11 Nov 2019 21:04:56 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <20191110104424.GA6424@sm-workstation> References: <20191110104424.GA6424@sm-workstation> Message-ID: <16e5a91380c.cce9f68a100336.2769893163805162012@ghanshyammann.com> ---- On Sun, 10 Nov 2019 18:44:24 +0800 Sean McGinnis wrote ---- > > > > Sean and everyone else, > > > > Pardon me, but I have to rant here... :) > > Please try see things from a downstream consumer point of view. > > > > This isn't the Python 2.7 era anymore, where we had a stable python for > > like forever. OpenStack needs to move quicker to newer Python 3 > > versions, especially considering that Python 2.7 isn't an option for > > anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks > > like a nice first approach, it is my strong believe that the project > > should quickly move to voting and full Python 3.8 testing, and > > preferably, have it in order, with functional testing, for the whole > > project, by the time Ussuri is released. > > > > We've had this debate many times now. Nothing has changed the fact that we > cannot make something an official runtime until there is an official distro out > there with that as the runtime. There is not for Ussuri. ++. We cannot add for Ussuri at this stage. We can go with the below plan: - Start an experimental unit tests (functional and integration are next step) job first and projects can slowly start fixing (if any failure) those based on their bandwidth. - Once job pass then make it periodic or n-v to capture the new code checking on py3.8. - Same process for functional jobs. - Integration jobs are not required to be duplicated. they can be moved to the latest py version later. In future release, we iterate the results of jobs and discuss to be add in testing run time mplate. -gmann > > > I know what's going to happen: I'll tell about a bug in Python 3.8 on > > IRC, and someone will reply to me "hey, that's not supported by > > OpenStack before the V release, please go away", even though as > > downstream distribution package maintainer, we don't have the power to > > decide what version of Python our distribution runs on (ie: both Debian > > Sid, Ubuntu and Fedora are quickly moving targets). > > > > I very highly doubt that, and very much disagree that someone will say to go > away. From what I have seen, the majority of the community is very responsive > to issues raised about future version problems. > > Fixing and working with py3.8 is not what is being discussed here. Only whether > those jobs to validate py3.8 should run on every patch or not. > > > There's absolutely no excuse for the OpenStack project to be dragging > > its feet, apart maybe the fact that it may not be easy to setup the > > infra to run tests on Py3.8 just yet. It isn't a normal situation that > > downstream distributions get the shit (pardon my french) and get to be > > the ones fixing issues and proposing patches (Corey, you've done awesome > > job on this...), yet it's been the case for nearly every single Python 3 > > releases. I very much would appreciate this situation to be fixed, and > > the project moving faster. 
> > > > Cheers, > > > > Thomas Goirand (zigo) > > > >
From smooney at redhat.com Mon Nov 11 13:15:24 2019 From: smooney at redhat.com (Sean Mooney) Date: Mon, 11 Nov 2019 13:15:24 +0000 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <20191110104424.GA6424@sm-workstation> References: <20191110104424.GA6424@sm-workstation> Message-ID: <844804a4bee95e2dac2d69746e817c9678eae9d9.camel@redhat.com> On Sun, 2019-11-10 at 04:44 -0600, Sean McGinnis wrote: > > > > Sean and everyone else, > > > > Pardon me, but I have to rant here... :) > > Please try see things from a downstream consumer point of view. > > > > This isn't the Python 2.7 era anymore, where we had a stable python for > > like forever. OpenStack needs to move quicker to newer Python 3 > > versions, especially considering that Python 2.7 isn't an option for > > anyone anymore. While your proposal (ie: less jobs on Python 3.8) looks > > like a nice first approach, it is my strong believe that the project > > should quickly move to voting and full Python 3.8 testing, and > > preferably, have it in order, with functional testing, for the whole > > project, by the time Ussuri is released. > > > > We've had this debate many times now. Nothing has changed the fact that we > cannot make something an official runtime until there is an official distro out > there with that as the runtime. There is not for Ussuri. > > > I know what's going to happen: I'll tell about a bug in Python 3.8 on > > IRC, and someone will reply to me "hey, that's not supported by > > OpenStack before the V release, please go away", even though as > > downstream distribution package maintainer, we don't have the power to > > decide what version of Python our distribution runs on (ie: both Debian > > Sid, Ubuntu and Fedora are quickly moving targets). > > > > I very highly doubt that, and very much disagree that someone will say to go > away. From what I have seen, the majority of the community is very responsive > to issues raised about future version problems. > > Fixing and working with py3.8 is not what is being discussed here. Only whether > those jobs to validate py3.8 should run on every patch or not. I think the only pushback you would get is if fixing py38 compatibility would break py27 or py36 support. py27 is less of a concern at this point, although I know some projects might support it longer than required. The point being: if we had to choose between a supported python and a newer python that we don't yet support, we would prefer the supported version, but generally there is a way to support all the versions we care about. It just means we can't use the py38-only features until that becomes our minimum supported version. Personally, I think periodic jobs make more sense than experimental ones. Depending on the velocity of the project there may be little difference between a non-voting check job and a periodic one, at least for the smaller projects. For a larger project like nova a periodic job would give more coverage than experimental and would use a lot fewer resources, but a periodic job is only useful if someone checks it, so I'm not sure adding it to the default template makes sense. I would suggest we create a python-runtime-next template that adds a py38 unit test job to the periodic and experimental pipelines. Projects that will actually check the periodic jobs in their weekly meeting, like neutron, can opt in; those that won't don't need to add the template.
> > > There's absolutely no excuse for the OpenStack project to be dragging > > its feet, apart maybe the fact that it may not be easy to setup the > > infra to run tests on Py3.8 just yet. It isn't a normal situation that > > downstream distributions get the shit (pardon my french) and get to be > > the ones fixing issues and proposing patches (Corey, you've done awesome > > job on this...), yet it's been the case for nearly every single Python 3 > > releases. I very much would appreciate this situation to be fixed, and > > the project moving faster. > > > > Cheers, > > > > Thomas Goirand (zigo) > > > > From geguileo at redhat.com Mon Nov 11 13:48:40 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 11 Nov 2019 14:48:40 +0100 Subject: Change Volume Type, but in use In-Reply-To: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> Message-ID: <20191111134840.bmgjlncfiwaerrqg@localhost> On 07/11, Sinan Polat wrote: > Hi, > > I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD > pools (ssdvolumes, sasvolumes). > In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property > "volume_backend_name='tripleo_ceph_'". > > In the Cinder configuration I have the following backends configured: > > [tripleo_ceph_ssd] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_ssd > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=ssdvolumes > > [tripleo_ceph_sas] > backend_host=hostgroup > volume_backend_name=tripleo_ceph_sas > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=sasvolumes > > As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool > name (ssdvolumes, not ssd) does not match. So far, we do not have any problems. > But I want to correct the names and I do not want to have the mismatch anymore. > > So I want to change the value of key volume_backend_name for both Volume Types > (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). > Hi, I agree with Sean, I wouldn't change it since this is only aesthetic. Having said that, there's always a way to do most things, even if it's NOT RECOMMENDED: - Update cinder.conf - Get the volume type id for the 2 volume types to change - Stop cinder services - Go into the DB and manually update the volume types changes in the "volume_type_extra_specs" table filtering by the volume_type_id and the key "volume_backend_name" and setting the new "value". - Use the "cinder-manage volume update_host" to update existing volumes to the new backend (you could also do this directly in the DB). - Start cinder services - Remove the old service from the DB (they appear as down now) using the "cinder-manage service remove" command. Cinder-manage docs: https://docs.openstack.org/cinder/latest/cli/cinder-manage.html Regards, Gorka. 
> I tried the following: > $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce > +--------------------+----------------------------------------+ > | Field | Value | > +--------------------+----------------------------------------+ > | access_project_ids | None | > | description | | > | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | > | is_public | True | > | name | ssd | > | properties | volume_backend_name='tripleo_ceph_ssd' | > | qos_specs_id | None | > +--------------------+----------------------------------------+ > $ > > > $ openstack volume type set --property > volume_backend_name='tripleo_ceph_ssdvolumes' > 80cb25ff-376a-4483-b4f7-d8c75839e0ce > Failed to set volume type property: Volume Type is currently in use. (HTTP 400) > (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) > Command Failed: One or more of the operations failed > $ > > How to solve my problem? > > Thanks! > > Sinan From sinan at turka.nl Mon Nov 11 13:56:26 2019 From: sinan at turka.nl (Sinan Polat) Date: Mon, 11 Nov 2019 14:56:26 +0100 Subject: Change Volume Type, but in use In-Reply-To: <20191111134840.bmgjlncfiwaerrqg@localhost> References: <173780005.559005.5146f0b9-c9fe-4380-97e7-9b1b340a8477.open-xchange@webmail.pcextreme.nl> <20191111134840.bmgjlncfiwaerrqg@localhost> Message-ID: <8C45CFCD-3903-46FB-9691-3E7257061975@turka.nl> Hi, Sorry for not being totally clear. The cluster is managed by TripleO. After each deployment/update, the cinder configuration is updated with incorrect names. Currently we correct it manually in the cinder configuration. So it is not only aesthetic. Sinan > Op 11 nov. 2019 om 14:48 heeft Gorka Eguileor het volgende geschreven: > >> On 07/11, Sinan Polat wrote: >> Hi, >> >> I am using Ceph as the backend for Cinder. Within Ceph we have defined 2 RBD >> pools (ssdvolumes, sasvolumes). >> In OpenStack I created 2 Volume Types (ssd, sas). Each Volume Type has property >> "volume_backend_name='tripleo_ceph_'". >> >> In the Cinder configuration I have the following backends configured: >> >> [tripleo_ceph_ssd] >> backend_host=hostgroup >> volume_backend_name=tripleo_ceph_ssd >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_ceph_conf=/etc/ceph/ceph.conf >> rbd_user=openstack >> rbd_pool=ssdvolumes >> >> [tripleo_ceph_sas] >> backend_host=hostgroup >> volume_backend_name=tripleo_ceph_sas >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_ceph_conf=/etc/ceph/ceph.conf >> rbd_user=openstack >> rbd_pool=sasvolumes >> >> As you might have noticed, the backend name (tripleo_ceph_ssd) and the RBD pool >> name (ssdvolumes, not ssd) does not match. So far, we do not have any problems. >> But I want to correct the names and I do not want to have the mismatch anymore. >> >> So I want to change the value of key volume_backend_name for both Volume Types >> (tripleo_ceph_ssd => tripleo_ceph_ssdvolumes). >> > > Hi, > > I agree with Sean, I wouldn't change it since this is only aesthetic. > > Having said that, there's always a way to do most things, even if it's > NOT RECOMMENDED: > > - Update cinder.conf > - Get the volume type id for the 2 volume types to change > - Stop cinder services > - Go into the DB and manually update the volume types changes in the > "volume_type_extra_specs" table filtering by the volume_type_id and > the key "volume_backend_name" and setting the new "value". > - Use the "cinder-manage volume update_host" to update existing volumes > to the new backend (you could also do this directly in the DB). 
> - Start cinder services > - Remove the old service from the DB (they appear as down now) using the > "cinder-manage service remove" command. > > Cinder-manage docs: > https://docs.openstack.org/cinder/latest/cli/cinder-manage.html > > Regards, > Gorka. > >> I tried the following: >> $ openstack volume type show 80cb25ff-376a-4483-b4f7-d8c75839e0ce >> +--------------------+----------------------------------------+ >> | Field | Value | >> +--------------------+----------------------------------------+ >> | access_project_ids | None | >> | description | | >> | id | 80cb25ff-376a-4483-b4f7-d8c75839e0ce | >> | is_public | True | >> | name | ssd | >> | properties | volume_backend_name='tripleo_ceph_ssd' | >> | qos_specs_id | None | >> +--------------------+----------------------------------------+ >> $ >> >> >> $ openstack volume type set --property >> volume_backend_name='tripleo_ceph_ssdvolumes' >> 80cb25ff-376a-4483-b4f7-d8c75839e0ce >> Failed to set volume type property: Volume Type is currently in use. (HTTP 400) >> (Request-ID: req-5efaa5b7-910f-4802-8494-3115cfc4ab93) >> Command Failed: One or more of the operations failed >> $ >> >> How to solve my problem? >> >> Thanks! >> >> Sinan > > From geguileo at redhat.com Mon Nov 11 14:00:16 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 11 Nov 2019 15:00:16 +0100 Subject: [cinder] consistency group not working In-Reply-To: <7adf0a5d-43b3-c606-2ba8-00d97b96cbdc@everyware.ch> References: <7adf0a5d-43b3-c606-2ba8-00d97b96cbdc@everyware.ch> Message-ID: <20191111140016.qyftq5iy27ekmdtj@localhost> On 27/09, Francois Scheurer wrote: > Dear Cinder Experts > > > We are running the rocky release. > > |We can create a consistency group: openstack consistency group create > --volume-type b9f67298-cf68-4cb2-bed2-c806c5f83487 fsc-consgroup Bug 1: but > adding volumes is not working: openstack consistency group add volume > c3f49ef0-601e-4558-a75a-9b758304ce3b b48752e3-641f-4a49-a892-6cb54ab6b74d > c0022411-59a4-4c7c-9474-c7ea8ccc7691 0f4c6493-dbe2-4f75-8e37-5541a267e3f2 => > Invalid volume: Volume is not local to this node. (HTTP 400) (Request-ID: > req-7f67934a-5835-40ef-b25c-12591fd79f85) Bug 2: deleting consistency group > is also not working (silently failing): openstack consistency group delete > c3f49ef0-601e-4558-a75a-9b758304ce3b |||=> AttributeError: 'RBDDriver' > object has no attribute 'delete_consistencygroup'| See details below. Using > the --force option makes no difference and the consistency group is not > deleted. Do you think this is a bug or a configuration issue? Thank you in > advance. | > > Cheers > > Francois Hi, It seems you are trying to use consistency groups with the RBD driver, which doesn't currently support consistency groups. Cheers, Gorka. 
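For what it's worth, a quick way to see from a Python shell which of the
group-related methods a given driver class actually defines is a snippet
like the one below. It is illustrative only: run it on a node with the
cinder code installed, the result depends on the release you are running,
and a method inherited from the base driver may still just raise
NotImplementedError. The import path is simply the module/class form of
the usual RBD volume_driver setting, cinder.volume.drivers.rbd.RBDDriver:

    # List which consistency group / generic group methods the RBD driver
    # class defines in this installation.
    from cinder.volume.drivers.rbd import RBDDriver

    candidates = (
        'create_consistencygroup', 'delete_consistencygroup',
        'update_consistencygroup', 'create_group', 'delete_group',
        'update_group', 'create_group_snapshot', 'delete_group_snapshot',
    )

    for name in candidates:
        print("%-28s %s" % (name, "yes" if hasattr(RBDDriver, name) else "no"))

The AttributeError in the traceback below is exactly this situation: the
volume manager dispatches to delete_consistencygroup and the driver simply
does not have that method.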
> > |Details: ==> > /var/lib/docker/volumes/kolla_logs/_data/cinder/cinder-api-access.log <== > 10.0.129.17 - - [27/Sep/2019:12:16:24 +0200] "POST /v3/f099965b37ac41489e9cac8c9d208711/consistencygroups/3706bbab-e2df-4507-9168-08ef811e452c/delete > HTTP/1.1" 202 - 109720 "-" "python-cinderclient" ==> > /var/lib/docker/volumes/kolla_logs/_data/cinder/cinder-volume.log <== > 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server > [req-9010336e-d569-47ad-84e2-8dd8b729939c b141574ee71f49a0b53a05ae968576c5 > f099965b37ac41489e9cac8c9d208711 - default default] Exception during message > handling: AttributeError: 'RBDDriver' object has no attribute > 'delete_consistencygroup' 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server Traceback (most recent call last): 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", > line 163, in _process_incoming 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 265, in dispatch 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, > args) 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 194, in _do_dispatch 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server result = func(ctxt, **new_args) 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/osprofiler/profiler.py", > line 159, in wrapper 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server result = f(*args, **kwargs) 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/cinder/volume/manager.py", > line 3397, in delete_group 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server vol_obj.save() 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", > line 220, in __exit__ 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server self.force_reraise() 2019-09-27 12:16:24.491 30 > ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", > line 196, in force_reraise 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) > 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/cinder/volume/manager.py", > line 3362, in delete_group 2019-09-27 12:16:24.491 30 ERROR > oslo_messaging.rpc.server self.driver.delete_consistencygroup(context, cg, > 2019-09-27 12:16:24.491 30 ERROR oslo_messaging.rpc.server AttributeError: > 'RBDDriver' object has no attribute 'delete_consistencygroup' 2019-09-27 > 12:16:24.491 30 ERROR oslo_messaging.rpc.server| > > > > > -- > > > EveryWare AG > François Scheurer > Senior Systems Engineer > Zurlindenstrasse 52a > CH-8003 Zürich > > tel: +41 44 466 60 00 > fax: +41 44 466 60 10 > mail: francois.scheurer at everyware.ch > web: http://www.everyware.ch > From dms at danplanet.com Mon Nov 11 14:53:06 2019 From: dms at danplanet.com (Dan Smith) Date: Mon, 11 Nov 2019 06:53:06 -0800 Subject: 
[nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: (Arne Wiebalck's message of "Mon, 11 Nov 2019 11:19:59 +0100") References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: > Sharding with and/or within cells will help to some degree (and we are > actively looking into this as you probably know), but I think that > should not stop us from checking if there are algorithmic improvements > (e.g. when collecting the data), or if moving to a different locking > granularity or even parallelising the update are feasible additional > improvements. All of that code was designed around one node per compute host. In the ironic case it was expanded (hacked) to support N where N is not huge. Giving it a huge number, and using a driver where nodes go into maintenance/cleaning for long periods of time is asking for trouble. Given there is only one case where N can legitimately be greater than one, I'm really hesitant to back a proposal to redesign it for large values of N. Perhaps we as a team just need to document what sane, tested, and expected-to-work values for N are? --Dan From tidwellrdev at gmail.com Mon Nov 11 15:31:53 2019 From: tidwellrdev at gmail.com (Ryan Tidwell) Date: Mon, 11 Nov 2019 09:31:53 -0600 Subject: [neutron] Bug Deputy Report Nov. 4-11 Message-ID: Hello neutrinos, here is the bug deputy report for the week of Nov. 4th: High: * https://bugs.launchpad.net/neutron/+bug/1851659 "removing a network from a DHCP agent removes L3 rules even if it shouldn't" This was found on stable/rocky, we should obviously see if it can be reproduced on master, stein, and train as well. There does seem to be a workaround, but VM's lost connectivity for a brief period. Medium: * https://bugs.launchpad.net/neutron/+bug/1851500 ""test_show_port_chain" conflicts with Flow Classifier" This is a gate issue, with what appears to be some instability with a test. Initial triage indicates it's an intermittent issue. * https://bugs.launchpad.net/neutron/+bug/1851194 "FWaaSv2 configures iptables with invalid port name" This similar to https://bugs.launchpad.net/neutron/+bug/1798577, FWaaSv2 seems to be referencing the wrong port names. RFE: * https://bugs.launchpad.net/neutron/+bug/1851609 Add an option for graceful l3 agent shutdown -Ryan Tidwell -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack-dev at storpool.com Mon Nov 11 16:06:21 2019 From: openstack-dev at storpool.com (Peter Penchev) Date: Mon, 11 Nov 2019 18:06:21 +0200 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet Message-ID: Hi, First of all, thanks to everyone involved for all the work on Nova, Cinder, os-brick, and actually all the rest of OpenStack, too! So, yeah, I guess it is kind of weird that I'm asking this on the list just a couple of days after the PTG where I could have asked in person, but here goes :) There seem to still be some quirks with Nova and volume-backed instance disks; some actions on instances are not allowed, others produce somewhat weird results. From a quick look at the code it seems to me that currently these are: - taking a snapshot of an instance (produces a zero-sized file, no real data backed up) - backing an instance up (refuses outright) - rescuing an instance (refuses outright) ...and maybe there are some that I've missed. 
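(As an aside, for anyone who wants to check which of their instances would
run into this, a rough openstacksdk sketch like the one below can flag the
volume-backed ones. The attribute names are from memory and the cloud name
is an assumption, so please treat it as a starting point rather than
gospel:)

    # Rough sketch: list servers and flag the ones that look volume-backed,
    # i.e. the ones the operations above may refuse or mishandle.
    # Assumes a clouds.yaml entry named "mycloud".
    import openstack

    conn = openstack.connect(cloud='mycloud')

    for server in conn.compute.servers():
        image = server.image or {}
        # for boot-from-volume servers the image reference is empty
        volume_backed = not image or not image.get('id')
        print("%s  %-30s volume-backed=%s"
              % (server.id, server.name, volume_backed))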
So, possibly stupid question here, but what are the project's plans about these - is there an intention to implement them at some point, or are there some very, very hard theoreitcal or practical problems (so something like "guess not for the present"), or is somebody working on something? The main reason that I am asking is that we, StorPool, have a shared-storage Cinder driver, and every now and then a customer comes up and asks about one or more of these actions. Every now and then we come back to the idea of writing a vendor-specific Nova image backend, but, first off, we are not really sure whether we want to do this, and second, we are not really sure whether it will be accepted upstream. A couple of years ago people told us "don't do that" and there was some talk about having an image backend for storage drivers supported by libvirt, but that effort seems to have stalled. Of course, we know that in all software projects, including, but certainly not limited to, the more-or-less volunteer free/libre/open-source projects, there are many tasks and many demands on the developers so that it is only natural that not everything is implemented or adapted at once; things happen, priorities shift, people get redirected, nobody else steps up to continue - it happens. With this in mind, where do things stand right now, should we consider writing an image backend, are there other options or plans? So thanks for reading through my ramblings, I guess, and keep up the great work! Best regards, Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Mon Nov 11 16:32:35 2019 From: dms at danplanet.com (Dan Smith) Date: Mon, 11 Nov 2019 08:32:35 -0800 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet In-Reply-To: (Peter Penchev's message of "Mon, 11 Nov 2019 18:06:21 +0200") References: Message-ID: > With this in mind, where do things stand right now, should we consider > writing an image backend, are there other options or plans? I don't think you should, no. The image backend code is messy and problematic for a lot of reasons, and building on what we have there is a path to madness I think. Rewriting it is no small feat, and I think that if we did we'd want to do so in such a way that makes use of cinder for anything other than local disk. That's a really nice ideal, but it's a huge amount of work to do (and review) and also unlikely to ever actually happen. We can do a lot better by reducing the feature gap with volume-backed instances. Implementing the features that aren't supported, and improving the ones that are *weird* when used on a volume-backed instance. These would be much smaller changes, easier to review, easier to gain acceptance for, etc. Personally, if you want to do some work in this area, I'd recommend picking a weird behavior and trying to propose an improvement to it. --Dan From melwittt at gmail.com Mon Nov 11 16:50:48 2019 From: melwittt at gmail.com (melanie witt) Date: Mon, 11 Nov 2019 08:50:48 -0800 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: References: <1573403387.31166.6@est.tech> Message-ID: On 11/10/19 12:41, Matt Riedemann wrote: > On 11/10/2019 10:29 AM, Balázs Gibizer wrote: >> * Also there could be two cells running this command at the same time >> fighting for the API db lock, > > In Train the --all-cells option was added to the CLI so that should > resolve this issue. 
I think Mel said she backported those changes > internally so I'm not sure how hard it would be for those to go back to > Stein or Rocky or whatever release CERN is using now. That's correct, I backported --all-cells [1][2][3][4] to Stein, Rocky, and Queens downstream. I found it not to be easy but YMMV. The primary conflicts in Stein were with --before, so I went ahead and brought those patches back as well [5][6][7] since we also needed --before to help people avoid the "orphaned virt guests if archive runs while nova-compute is down" problem. Same deal for Rocky. And finally with Queens, there's an additional conflict around deleting instance group members [8], so I also brought that back because it's related to all of the database cleanup issues that support has repeatedly faced with customers. Hope this helps anyone considering backporting --all-cells. Cheers, -melanie [1] https://review.opendev.org/675218 [2] https://review.opendev.org/675209 [3] https://review.opendev.org/675205 [4] https://review.opendev.org/507486 [5] https://review.opendev.org/661289 [6] https://review.opendev.org/556751 [7] https://review.opendev.org/643779 [8] https://review.opendev.org/598953 From doka.ua at gmx.com Mon Nov 11 17:06:43 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 11 Nov 2019 19:06:43 +0200 Subject: [Neutron] OVS forwarding issues Message-ID: Dear colleagues, just faced an issue with Openvswitch, which looks strange for me. The problem is that any particular VM receives a lot of packets, which are unicasted: - from other VMs which reside on the same host (let's name them "local VMs") - to other VMs which reside on other hosts (let's name them "remote VMs") Long output from "ovs-ofctl dump-flows br-int" which, as far as I can narrow, ends there: # ovs-ofctl dump-flows br-int |grep " table=94," |egrep "n_packets=[123456789]"  cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, priority=1 actions=NORMAL coming to normal processing (classic MAC learning). Looking into br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no MAC addresses of remote VMs and br-int behaves in the right way, flooding unknown unicast to all ports in this L2 segment. Of course, there is br-tun which connected over vxlan to all other hosts and to br-int:     Bridge br-tun         Controller "tcp:127.0.0.1:6633"             is_connected: true         fail_mode: secure         Port "vxlan-0a960008"             Interface "vxlan-0a960008"                 type: vxlan                 options: {df_default="true", in_key=flow, local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"}         [ ... ]         Port br-tun             Interface br-tun                 type: internal         Port patch-int             Interface patch-int                 type: patch                 options: {peer=patch-tun} but MAC table on br-tun is empty as well: # ovs-appctl fdb/show br-tun  port  VLAN  MAC                Age # Finally, packets get to destination, while being copied to all ports on source host, which is serious security issue. I do not think so conceived by design, I rather think we missed something in configuration. Can anybody point me where we're wrong and help with this issue? We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. 
Network configuration is: @controller: # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [ml2] type_drivers = flat,vxlan tenant_network_types = vxlan mechanism_drivers = l2population,openvswitch extension_drivers = port_security,qos,dns_domain_ports [ml2_type_flat] flat_networks = provider [ml2_type_geneve] [ml2_type_gre] [ml2_type_vlan] [ml2_type_vxlan] vni_ranges = 400:400000 [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true @agent: # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [agent] tunnel_types = vxlan l2_population = true arp_responder = true extensions = qos [ovs] local_ip = 10.150.0.5 bridge_mappings = provider:br-ex [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true [xenapi] Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.denton at rackspace.com Mon Nov 11 17:38:24 2019 From: james.denton at rackspace.com (James Denton) Date: Mon, 11 Nov 2019 17:38:24 +0000 Subject: [Neutron] OVS forwarding issues In-Reply-To: References: Message-ID: Hi, This is a known issue with the openvswitch firewall[1]. > firewall_driver = openvswitch I recommend running iptables_hybrid until that is resolved. [1] https://bugs.launchpad.net/neutron/+bug/1732067 James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com From: Volodymyr Litovka Date: Monday, November 11, 2019 at 12:10 PM To: "openstack-discuss at lists.openstack.org" Cc: "doka.ua at gmx.com" Subject: [Neutron] OVS forwarding issues CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Dear colleagues, just faced an issue with Openvswitch, which looks strange for me. The problem is that any particular VM receives a lot of packets, which are unicasted: - from other VMs which reside on the same host (let's name them "local VMs") - to other VMs which reside on other hosts (let's name them "remote VMs") Long output from "ovs-ofctl dump-flows br-int" which, as far as I can narrow, ends there: # ovs-ofctl dump-flows br-int |grep " table=94," |egrep "n_packets=[123456789]" cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, priority=1 actions=NORMAL coming to normal processing (classic MAC learning). Looking into br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no MAC addresses of remote VMs and br-int behaves in the right way, flooding unknown unicast to all ports in this L2 segment. Of course, there is br-tun which connected over vxlan to all other hosts and to br-int: Bridge br-tun Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "vxlan-0a960008" Interface "vxlan-0a960008" type: vxlan options: {df_default="true", in_key=flow, local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} [ ... ] Port br-tun Interface br-tun type: internal Port patch-int Interface patch-int type: patch options: {peer=patch-tun} but MAC table on br-tun is empty as well: # ovs-appctl fdb/show br-tun port VLAN MAC Age # Finally, packets get to destination, while being copied to all ports on source host, which is serious security issue. I do not think so conceived by design, I rather think we missed something in configuration. 
Can anybody point me where we're wrong and help with this issue? We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. Network configuration is: @controller: # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [ml2] type_drivers = flat,vxlan tenant_network_types = vxlan mechanism_drivers = l2population,openvswitch extension_drivers = port_security,qos,dns_domain_ports [ml2_type_flat] flat_networks = provider [ml2_type_geneve] [ml2_type_gre] [ml2_type_vlan] [ml2_type_vxlan] vni_ranges = 400:400000 [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true @agent: # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [agent] tunnel_types = vxlan l2_population = true arp_responder = true extensions = qos [ovs] local_ip = 10.150.0.5 bridge_mappings = provider:br-ex [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true [xenapi] Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From flux.adam at gmail.com Mon Nov 11 18:09:03 2019 From: flux.adam at gmail.com (Adam Harwell) Date: Mon, 11 Nov 2019 10:09:03 -0800 Subject: [octavia][ptg] Summary of Shanghai PTG Discussion Message-ID: Fellow Octavians, We covered a lot of ground during this PTG, met a number of new folks, and got a lot of valuable feedback. I'll do my best to summarize here what was discussed. 1. Metrics 1. It would be nice to expose metrics for pools/members, though we would like to get a better understanding of the requirements / use-cases. 2. We should publish metrics using some mechanism (plugin/driver). 1. The default would be "database" and would handle the existing API-exposed metrics. 2. Additional drivers would be loaded in parallel, and might include Monasca/Ceilometer/Prometheus drivers. 3. We will switch our metrics internally to use a delta system instead of absolute values from HAProxy. This will allow us to publish in a more sane way in the future. This would not change the way metrics are exposed in the existing API. 2. Notifications 1. We will need to create a spec and gather community feedback. 2. Initial observation indicates the need for two general paths, which will most likely have their own driver systems: 1. provisioning_status changes (including all create/update/delete events) 2. operating_status changes (member up/down, etc) 3. We would provide the entire object in the notification, similar to what other services do. 4. Most likely the default driver(s) would use oslo.notify. 3. Availability Zone Support (Multi-Zone Fault Tolerance) 1. Make at least a story for tracking this, if it doesn't already exist. 2. Allow a single LB to have amphorae in multiple zones. 3. Existing patch: https://review.opendev.org/#/c/558962/ 4. Availability Zone Support (Compute AZ Awareness) 1. Make at least a story for tracking this, if it doesn't already exist. 2. Allow placing LBs in specific zones: 1. When zones are geographically separated, LBs should exist in the same zone as the members they support 2. When zones are logically separated (like PCI compliance zones, etc), users may need to place them specifically. 3. A new parameter `availability_zone` will be added to the LB create API. It will allow the user to select which Octavia AZ to use. 4. A new API section will be added for creating/configuring/listing Octavia AZs. 
This will allow a linkage between Compute AZs and Amphora Management Network, along with other possible options in the future. Admins can create/update, and users can list zones. 5. Update clients to support this, including further polluting the `openstack availability zone list` command to include `--loadbalancers` zones. 5. Python 2 EOL 1. Remove all jobs that test Python 2 (or update them if they're not duplicates). 2. Remove six compatibility code, which should simplify string handling significantly. 6. More Flavor Capabilities 1. Image Tag (to allow different amp images per flavor) 2. Availability Zone (to allow compute AZ pinning) 3. Management Network (to go with compute AZ) 4. Metadata (allow passing arbitrary metadata through to compute) 7. TLS Protocol/Cipher API Support 1. Allow users to select specific protocols/ciphers as a whitelist. 2. Stories: 1. Ciphers: https://storyboard.openstack.org/#!/story/2006627 2. Protocols: https://storyboard.openstack.org/#!/story/2006733 8. Performance Tuning 1. HAProxy: There are a number of knobs/dials that can be adjusted to make HAProxy behave more efficiently. Some of these that we could look at more are around TLS options, and around multiprocessing/threading. The latter will probably need to wait for us to switch to HAProxy 2.0. 2. Image Metadata: There are flags that could be added to our amphora image's metadata that might improve performance. To be further researched. 9. Testing 1. Team to evaluate existing non-voting jobs for promotion to voting. 1. Agreement was made with the Barbican team to promote both side's co-gating jobs to voting. 2. Team to evaluate merging or pruning some jobs to reduce the overall set that run on each change. 3. Grenade needs a few changes: 1. Switch to python3. 2. Upgrade to Zuul v3. 3. Test additional operations on existing LBs (old amp image), not just traffic. 4. Test more than just the most recent amphora image against the current control-plane code. Use periodic jobs for this. 5. Fix the Zuul grafana dashboard for Octavia test history. 10. Jobboard 1. This continues to be a priority for Ussuri. 2. Put together a priority list of patches specifically for jobboard. 11. HAProxy 2.0 1. A number of features are gated behind the new version, including multi-process and HTTP/2 support. 2. Need to reach out to distributions to push for backports (to cloudarchive for Ubuntu, and whatever similar thing for CentOS). 3. Possibly add an element to allow building new versions from source. 4. Perform version-based validation of options on the API-side of the amphora driver. 1. Inspect and cache Glance metadata for the LB's amphora image to get version data. 2. Provide the metadata string for the operator from our disk-image-create script. The full etherpad from the PTG including the notes I've summarized here is available at https://etherpad.openstack.org/p/octavia-shanghai-U-ptg if further review is desired. Thanks to everyone who participated, and best of luck on this (hopefully) productive new cycle! --Adam Harwell -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaser at vexxhost.com Mon Nov 11 19:05:55 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 12 Nov 2019 03:05:55 +0800 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: <1573403387.31166.6@est.tech> References: <1573403387.31166.6@est.tech> Message-ID: On Mon, Nov 11, 2019 at 12:33 AM Balázs Gibizer wrote: > > CERN reported two issues with archive_deleted_rows CLI: > * When one record gets inserted into the shadow_instance_extra but > didn't get deleted from instance_extra (I know this is in a single > transaction but sometimes it happens), needs manual cleanup on the > database > * Also there could be two cells running this command at the same time > fighting for the API db lock, > > > TODOs: > * tssurya to report bugs / improvements on archive_deleted_rows CLI > based on CERN's experience with long table locking > * mnaser to report a wishlist bug / specless bp about one step db purge > CLI which would skip the shadow tables I did my homework: https://bugs.launchpad.net/nova/+bug/1852121 I don't think I have time currently to iterate and work on it right now, but at least it's documented. > Cheers, > gibi > > > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com From zbitter at redhat.com Mon Nov 11 19:32:20 2019 From: zbitter at redhat.com (Zane Bitter) Date: Mon, 11 Nov 2019 14:32:20 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> On 7/11/19 2:11 pm, Corey Bryant wrote: > Hello TC members, > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > too late to enable voting py38 unit tests for ussuri, I'd like to at > least enable non-voting py38 unit tests. This email is seeking approval > and direction from the TC to move forward with enabling non-voting py38 > tests. I was a bit fuzzy on this myself, so I looked it up and this is what the TC decided when we passed the resolution: > If the new Zuul template contains test jobs that were not in the previous one, the goal champion(s) may choose to update the previous template to add a non-voting check job (or jobs) to match the gating jobs in the new template. This means that all repositories that have not yet converted to the template for the upcoming release will see a non-voting preview of the new job(s) that will be added once they update. If this option is chosen, the non-voting job should be limited to the master branch so that it does not run on the preceding release’s stable branch. (from https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests ) So to follow that process we would need to define the python versions for V, then appoint a goal champion, and after that it would be at the champion's discretion to add a non-voting job on master in Ussuri. I happened to be sitting next to Sean when I saw this thread, and after discussing it with him I think he would OK with having a non-voting job on every commit, since it's what we have documented. Previous discussions established that the overhead of adding one Python unit test job to every project was pretty inconsequential (we'll offset it by dropping 2.7 jobs anyway). I submitted a draft governance patch defining the Python versions for V (https://review.opendev.org/693743). 
Unfortunately we can't merge it yet because we don't have a release name for V (Sean is working on that: https://review.opendev.org/693266). It's gazing in the crystal ball a little bit, but even if for some reason Ubuntu 20.04 is not released before the V cycle starts, it's inevitable that we will be selecting Python 3.8 because it meets the first criterion ("The latest released version of Python 3 that is available in any distribution we can feasibly use for testing") - 3.8 is released and it's available in Ubuntu 18.04, which is the distro we use for testing anyway. So, in my opinion, if you're volunteering to be the goal champion then there's no need for any further approval by the TC ;) I guess to make that official we should commit the python3 update Goal for the V cycle now... or at least as soon as we have a release name. This is happening a little earlier than I think we anticipated but, given that there's no question what is going to happen in V, I don't think we'd be doing anybody any favours by delaying the process unnecessarily. > For some further background: The next release of Ubuntu, Focal (20.04) > LTS, is scheduled to release in April 2020. Python 3.8 will be the > default in the Focal release, so I'm hopeful that non-voting unit tests > will help close some of the gap. > > I have a review here for the zuul project template enablement for ussuri: > https://review.opendev.org/#/c/693401 > > Also should this be updated considering py38 would be non-voting? > https://governance.openstack.org/tc/reference/runtimes/ussuri.html No, I don't think this changes anything for Ussuri. It's preparation for V. cheers, Zane. From melwittt at gmail.com Mon Nov 11 19:37:30 2019 From: melwittt at gmail.com (melanie witt) Date: Mon, 11 Nov 2019 11:37:30 -0800 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: References: <1573403387.31166.6@est.tech> Message-ID: On 11/11/19 08:50, melanie witt wrote: > On 11/10/19 12:41, Matt Riedemann wrote: >> On 11/10/2019 10:29 AM, Balázs Gibizer wrote: >>> * Also there could be two cells running this command at the same time >>> fighting for the API db lock, >> >> In Train the --all-cells option was added to the CLI so that should >> resolve this issue. I think Mel said she backported those changes >> internally so I'm not sure how hard it would be for those to go back >> to Stein or Rocky or whatever release CERN is using now. > > That's correct, I backported --all-cells [1][2][3][4] to Stein, Rocky, > and Queens downstream. I found it not to be easy but YMMV. > > The primary conflicts in Stein were with --before, so I went ahead and > brought those patches back as well [5][6][7] since we also needed > --before to help people avoid the "orphaned virt guests if archive runs > while nova-compute is down" problem. > > Same deal for Rocky. > > And finally with Queens, there's an additional conflict around deleting > instance group members [8], so I also brought that back because it's > related to all of the database cleanup issues that support has > repeatedly faced with customers. Sorry, I have to be pedantic and amend the info about Queens ^ to add that --purge [9][10][11] was another conflict in Queens that I also backported because we had a separate request open by support for that as well anyway. > Hope this helps anyone considering backporting --all-cells. 
> > Cheers, > -melanie > > [1] https://review.opendev.org/675218 > [2] https://review.opendev.org/675209 > [3] https://review.opendev.org/675205 > [4] https://review.opendev.org/507486 > [5] https://review.opendev.org/661289 > [6] https://review.opendev.org/556751 > [7] https://review.opendev.org/643779 > [8] https://review.opendev.org/598953 [9] https://review.opendev.org/550171 [10] https://review.opendev.org/550182 [11] https://review.opendev.org/550502 From mriedemos at gmail.com Mon Nov 11 19:40:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 11 Nov 2019 13:40:52 -0600 Subject: [nova][ptg] drop the shadow table concept In-Reply-To: References: <1573403387.31166.6@est.tech> Message-ID: <6c497067-70ad-5a47-c432-995855917989@gmail.com> On 11/11/2019 1:05 PM, Mohammed Naser wrote: > I did my homework: > > https://bugs.launchpad.net/nova/+bug/1852121 > > I don't think I have time currently to iterate and work on it right > now, but at least it's documented. I commented in the bug and, without more details, I don't see how it's really worth the trouble of refactoring the archive/purge code to deal with this optimization but I can probably be proven wrong. -- Thanks, Matt From tidwellrdev at gmail.com Mon Nov 11 19:47:15 2019 From: tidwellrdev at gmail.com (Ryan Tidwell) Date: Mon, 11 Nov 2019 13:47:15 -0600 Subject: BGP dynamic routing In-Reply-To: References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Message-ID: At the moment neutron-dynamic-routing does not support receiving routes from its peers. If you look at the code, you'll see that the BGP will handle any route updates it gets from a peer by simply invoking a no-op routine that logs an info message [1]. You're not the first one to ask the question, so if you can express a solid use case I think an RFE could be crafted to support you. I just haven't seen the use case expressed by anyone yet, but that's not to say it doesn't exist. -Ryan Tidwell [1] https://github.com/openstack/neutron-dynamic-routing/blob/master/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L40 On Mon, Nov 4, 2019 at 10:38 AM Donny Davis wrote: > To be honest I only use it for the use case I listed before, so beyond > that I am not going to be much help. > > However.. they are both speaking bgp I would imagine that it works the > same way as any bgp instance. > > Give it a whirl and let us know how it works out. :) > > On Mon, Nov 4, 2019 at 11:28 AM Volodymyr Litovka wrote: > >> Hi Donny, >> >> the question if I have few peers to few PoPs, everyone with own set of >> prefixes and need to import these external prefixes INTO the tenant. >> >> >> On 04.11.2019 17:08, Donny Davis wrote: >> >> The way I use it is to dynamically advertise my tenant networks to the >> edge. The edge router still handles routes in the rest of my infra. >> >> Works pretty well for me. >> >> Donny Davis >> c: 805 814 6800 >> >> On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka wrote: >> >>> Dear colleagues, >>> >>> "BGP dynamic routing" doc >>> ( >>> https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html >>> ) >>> says only about advertisement of routes: "BGP dynamic routing enables >>> advertisement of self-service (private) network prefixes to physical >>> network devices that support BGP such as routers, thus removing the >>> conventional dependency on static routes." and nothing about receiving >>> of routes from external peers. 
>>> >>> Whether it is ever possible using Neutron to have fully dynamic routing >>> inside the project, both advertising/receiving (and updating VRs >>> configuration) routes to/from remote peers? >>> >>> Thank you. >>> >>> -- >>> Volodymyr Litovka >>> "Vision without Execution is Hallucination." -- Thomas Edison >>> >>> >>> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." -- Thomas Edison >> >> > > -- > ~/DonnyD > C: 805 814 6800 > "No mission too difficult. No sacrifice too great. Duty First" > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Nov 11 19:54:02 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 11 Nov 2019 13:54:02 -0600 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet In-Reply-To: References: Message-ID: <89de51c7-b489-7df3-1bb5-7424ee8a9542@gmail.com> On 11/11/2019 10:06 AM, Peter Penchev wrote: > There seem to still be some quirks with Nova and volume-backed instance > disks; some actions on instances are not allowed, others produce > somewhat weird results. From a quick look at the code it seems to me > that currently these are: > - taking a snapshot of an instance (produces a zero-sized file, no real > data backed up) Volume-backed instance snapshot is supported [1]. It creates a volume snapshot in cinder and then links that to the glance image via metadata. If you boot a server from that image snapshot it's boot-from-volume under the covers, what is sometimes referred to as an image-defined block device mapping. Tempest also has a scenario test for this [2]. > - backing an instance up (refuses outright) Yeah not supported and not really necessary to support. The createBackup API is essentially frozen since it's just orchestration over the existing createImage API and could all be done via external tooling so it's not really a priority to make that a more feature rich API. We've even talked about deprecating createBackup just to get people to stop using it. > - rescuing an instance (refuses outright) Yeah, not supported, but there have been specs [3][4]. > ...and maybe there are some that I've missed. Rebuilding a volume-backed server is another big one. There was actually agreement on how to do this between nova and cinder [5][6], the cinder implementation was code up and being reviewed, but the nova side lagged and was eventually abandoned. So that could be picked up again if someone was willing to invest the time in it. [1] https://github.com/openstack/nova/blob/20.0.0/nova/compute/api.py#L3031 [2] https://github.com/openstack/tempest/blob/22.1.0/tempest/scenario/test_volume_boot_pattern.py#L210 [3] https://review.opendev.org/#/c/651151/ [4] https://review.opendev.org/#/c/532410/ [5] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/volume-backed-server-rebuild.html [6] https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api -- Thanks, Matt From mriedemos at gmail.com Mon Nov 11 20:01:48 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 11 Nov 2019 14:01:48 -0600 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> On 11/11/2019 7:03 AM, Chris Dent wrote: > Or using > separate processes? 
For the ironic and vsphere contexts, increased > CPU usage by the nova-compute process does not impact on the > workload resources, so parallization is likely a good option. I don't know how much it would help - someone would have to actually test it out and get metrics - but one easy win might just be using a thread or process executor pool here [1] so that N compute nodes could be processed through the update_available_resource periodic task concurrently, maybe $ncpu or some factor thereof. By default make it serialized for backward compatibility and non-ironic deployments. Making that too highly concurrent could have negative impacts on other things running on that host, like the neutron agent, or potentially storming conductor/rabbit with a ton of DB requests from that compute. That doesn't help with the scenario that the big COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while spawning, moving, or deleting an instance that also needs access to the big lock to update the resource tracker, but baby steps if any steps in this area of the code would be my recommendation. [1] https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 -- Thanks, Matt From dharmendra.kushwaha at gmail.com Tue Nov 12 07:21:16 2019 From: dharmendra.kushwaha at gmail.com (Dharmendra Kushwaha) Date: Tue, 12 Nov 2019 12:51:16 +0530 Subject: [tacker] No IRC meeting today Message-ID: Hello Taker team, As we have PTG in last week, lets skip today's weekly meeting. Thanks & Regards Dharmendra Kushwaha -------------- next part -------------- An HTML attachment was scrubbed... URL: From luyao.zhong at intel.com Tue Nov 12 05:46:13 2019 From: luyao.zhong at intel.com (Zhong, Luyao) Date: Tue, 12 Nov 2019 05:46:13 +0000 Subject: [nova] track error migrations and orphans in Resource Tracker Message-ID: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Hi Nova experts, "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in update_available_resources in RT at the moment. That is some orphans or error migrations are using cpus/memory/disk etc, but we don't take these usage into consideration. And instance.resources is introduced from Train used to contain specific resources, we also track assigned specific resources in RT based on tracked migrations and instances. So this bug will also affect the specific resources tracking. I draft an doc to clarify this bug and possible solutions: https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT Looking forward to suggestions from you. Thanks in advance. Best Regards, Luyao -------------- next part -------------- An HTML attachment was scrubbed... URL: From doka.ua at gmx.com Tue Nov 12 07:38:08 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Tue, 12 Nov 2019 09:38:08 +0200 Subject: BGP dynamic routing In-Reply-To: References: <2e0d4ab1-824c-b605-91ad-93536044ae69@gmx.com> <4016b2a9-f3f4-3e4e-211f-4d6c1a63dc3e@gmx.com> Message-ID: <7440b104-20c5-7473-c151-06f66b904731@gmx.com> Hi Ryan, thanks for the reply. To be frank, I can't come up with some general use cases for such RFE. I'm solving particular problem, connecting remote premises over VPN to the cloud tenant and is able to combine BGP on VPN concentrator and static routes inside tenant. The question was like "what if supported? It will be convenient." I appreciate your efforts and thanks again for the answer. 
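P.S. for completeness, the static-route half of our current workaround is easy
enough to script. A rough sketch with openstacksdk (the cloud and router names
below are placeholders, not our real setup):

  import openstack

  conn = openstack.connect(cloud='mycloud')            # placeholder cloud name
  router = conn.network.find_router('tenant-router')   # placeholder router name
  # mirror the prefixes learned by the VPN concentrator as extra static routes
  conn.network.update_router(router, routes=[
      {'destination': '172.16.10.0/24', 'nexthop': '10.0.0.254'},
  ])

Receiving routes in neutron-dynamic-routing would mainly save us from re-running
something like this whenever the remote side announces new prefixes.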
On 11.11.2019 21:47, Ryan Tidwell wrote: > At the moment neutron-dynamic-routing does not support receiving > routes from its peers. If you look at the code, you'll see that the > BGP will handle any route updates it gets from a peer by simply > invoking a no-op routine that logs an info message [1]. You're not the > first one to ask the question, so if you can express a solid use case > I think an RFE could be crafted to support you. I just haven't seen > the use case expressed by anyone yet, but that's not to say it doesn't > exist. > > -Ryan Tidwell > > [1] > https://github.com/openstack/neutron-dynamic-routing/blob/master/neutron_dynamic_routing/services/bgp/agent/driver/os_ken/driver.py#L40 > > On Mon, Nov 4, 2019 at 10:38 AM Donny Davis > wrote: > > To be honest I only use it for the use case I listed before, so > beyond that I am not going to be much help. > > However.. they are both speaking bgp I would imagine that it works > the same way as any bgp instance. > > Give it a whirl and let us know how it works out. :) > > On Mon, Nov 4, 2019 at 11:28 AM Volodymyr Litovka > wrote: > > Hi Donny, > > the question if I have few peers to few PoPs, everyone with > own set of prefixes and need to import these external prefixes > INTO the tenant. > > > On 04.11.2019 17:08, Donny Davis wrote: >> The way I use it is to dynamically advertise my tenant >> networks to the edge. The edge router still handles routes in >> the rest of my infra. >> >> Works pretty well for me. >> >> Donny Davis >> c: 805 814 6800 >> >> On Mon, Nov 4, 2019, 6:52 AM Volodymyr Litovka >> > wrote: >> >> Dear colleagues, >> >> "BGP dynamic routing" doc >> (https://docs.openstack.org/neutron/rocky/admin/config-bgp-dynamic-routing.html) >> says only about advertisement of routes: "BGP dynamic >> routing enables >> advertisement of self-service (private) network prefixes >> to physical >> network devices that support BGP such as routers, thus >> removing the >> conventional dependency on static routes." and nothing >> about receiving >> of routes from external peers. >> >> Whether it is ever possible using Neutron to have fully >> dynamic routing >> inside the project, both advertising/receiving (and >> updating VRs >> configuration) routes to/from remote peers? >> >> Thank you. >> >> -- >> Volodymyr Litovka >>    "Vision without Execution is Hallucination." -- Thomas >> Edison >> >> > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > > > -- > ~/DonnyD > C: 805 814 6800 > "No mission too difficult. No sacrifice too great. Duty First" > -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Nov 12 08:42:45 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 12 Nov 2019 09:42:45 +0100 Subject: [Neutron] OVS forwarding issues In-Reply-To: References: Message-ID: <20191112084245.hjz3zrwbrdw64okd@skaplons-mac> Hi, If You are using ovs firewall driver, it is known issue there. See bug [1] for details. There is proposal how to fix it in [2] but it's not perfect and still require some more work to do. [1] https://bugs.launchpad.net/neutron/+bug/1732067 [2] https://bugs.launchpad.net/neutron/+bug/1841622 On Mon, Nov 11, 2019 at 07:06:43PM +0200, Volodymyr Litovka wrote: > Dear colleagues, > > just faced an issue with Openvswitch, which looks strange for me. 
The > problem is that any particular VM receives a lot of packets, which are > unicasted: > - from other VMs which reside on the same host (let's name them "local VMs") > - to other VMs which reside on other hosts (let's name them "remote VMs") > > Long output from "ovs-ofctl dump-flows br-int" which, as far as I can > narrow, ends there: > > # ovs-ofctl dump-flows br-int |grep " table=94," |egrep > "n_packets=[123456789]" >  cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, > n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, > priority=1 actions=NORMAL > > coming to normal processing (classic MAC learning). Looking into br-int > MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no > MAC addresses of remote VMs and br-int behaves in the right way, > flooding unknown unicast to all ports in this L2 segment. > > Of course, there is br-tun which connected over vxlan to all other hosts > and to br-int: > >     Bridge br-tun >         Controller "tcp:127.0.0.1:6633" >             is_connected: true >         fail_mode: secure >         Port "vxlan-0a960008" >             Interface "vxlan-0a960008" >                 type: vxlan >                 options: {df_default="true", in_key=flow, > local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} >         [ ... ] >         Port br-tun >             Interface br-tun >                 type: internal >         Port patch-int >             Interface patch-int >                 type: patch >                 options: {peer=patch-tun} > > but MAC table on br-tun is empty as well: > > # ovs-appctl fdb/show br-tun >  port  VLAN  MAC                Age > # > > Finally, packets get to destination, while being copied to all ports on > source host, which is serious security issue. > > I do not think so conceived by design, I rather think we missed > something in configuration. Can anybody point me where we're wrong and > help with this issue? > > We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. Network > configuration is: > > @controller: > # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [ml2] > type_drivers = flat,vxlan > tenant_network_types = vxlan > mechanism_drivers = l2population,openvswitch > extension_drivers = port_security,qos,dns_domain_ports > [ml2_type_flat] > flat_networks = provider > [ml2_type_geneve] > [ml2_type_gre] > [ml2_type_vlan] > [ml2_type_vxlan] > vni_ranges = 400:400000 > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > > @agent: > # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [agent] > tunnel_types = vxlan > l2_population = true > arp_responder = true > extensions = qos > [ovs] > local_ip = 10.150.0.5 > bridge_mappings = provider:br-ex > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > [xenapi] > > Thank you. > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Tue Nov 12 10:27:29 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 12 Nov 2019 11:27:29 +0100 Subject: [infra] Etherpad problem In-Reply-To: <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> References: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> Message-ID: <20191112102729.fggiccohpwpqd3he@skaplons-mac> Hi, Thx Brian. 
I used Your backup etherpad to "restore" everything in [1] [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored On Sun, Nov 10, 2019 at 12:31:36AM +0800, Brian Haley wrote: > On 11/9/19 10:16 AM, Slawek Kaplonski wrote: > > Hi, > > > > Just at the end of the ptg sessions, neutron etherpad was got broken somehow. > > Now when I try to open [1] I see only something like: > > > > An error occurred > > The error was reported with the following id: 'igzOahZ6ruH0eSUAWKaj' > > > > Please press and hold Ctrl and press F5 to reload this page, if the problem > > persists please send this error message to your webmaster: > > 'ErrorId: igzOahZ6ruH0eSUAWKaj > > URL: https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 > > Firefox/70.0 > > TypeError: r.dropdowns is undefined in > > https://etherpad.openstack.org/javascripts/lib/ep_etherpad-lite/static/js/pad.js?callback=require.define > > at line 18' > > > > > > We can open one of the previous versions which is available at [2] but I don't > > know how we can fix original etherpad or restore version from [2] to be original > > etherpad and make it working again. > > Can someone from infra team check that for us maybe? > > Thx in advance for any help. > > > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 > > Hi Slawek, > > When I just went to check this etherpad now, noticed I had a tab open that > was in "Force reconnect" state. I made a copy of that, just might be a > little out of date on the last items. The formatting is also a little odd, > but at least it's better than nothing if we can't get the original back. > > https://etherpad.openstack.org/p/neutron-ptg-temp > > -Brian > -- Slawek Kaplonski Senior software engineer Red Hat From sbauza at redhat.com Tue Nov 12 10:29:20 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 11:29:20 +0100 Subject: [nova][ptg] Resurrecting NUMA topology in Placement Message-ID: We discussed about a long known story https://review.openstack.org/#/c/552924/ The whole agreement during the PTG was to keep things simple with baby steps : - only supporting a few NUMA queries and defer others as unsupported (still supported by legacy NUMATopologyFilter) - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB resource classes and not handle PCI or GPU devices (ie. level-1 tree, no children under the NUMA RPs) Agreement was also there for saying that functional tests should be enough since PlacementFixture already works perfectly. -S -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 10:33:52 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 11:33:52 +0100 Subject: [nova][ptg] Support re-configure deleted_on_termination in server Message-ID: Spec is https://review.opendev.org/#/c/580336/ Most people seem to think this makes sense but realize there are already other ways to do this (snapshot) and therefore it's not totally necessary. The agreement in the room was to post the code up for the change, as this will help sell people on it if it's trivial enough and document the use case (i.e. are there scenarios where this would make life 10x easier?) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbauza at redhat.com Tue Nov 12 10:51:16 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 11:51:16 +0100 Subject: [nova][ptg] PCI refactoring needs and a strawman proposal inside Message-ID: Based on some Forum discussions, we had a de facto conversation at the Nova PTG about any potential things we could do for helping our operators, but also potentially Cyborg since they use the PCI passthrough capabilities. The feedback from ops (thanks mnaser) was that PCI passthrough works pretty smoothly but there are some ugly issues where we could improve the UX. The agreement in the room was, at least for Ussuri, to start collecting some ideas on how we could model Placement usage for PCI devices and also write motivations for such things in a spec. We also agreed on a smooth upgrade plan where PCITracker would still be present for a couple of releases until we're able to close the feature gap. Bauzas volunteered for drafting the spec and gibi, stephenfin and bauzas started to sketch up the Placement modeling on a whiteboard, picture to be shared in a follow-up. -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Nov 12 13:18:35 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:18:35 +0000 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote: > Hi Nova experts, > > "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in > update_available_resources in RT at the moment. That is some orphans or error migrations are using cpus/memory/disk > etc, but we don't take these usage into consideration. And instance.resources is introduced from Train used to contain > specific resources, we also track assigned specific resources in RT based on tracked migrations and instances. So this > bug will also affect the specific resources tracking. > > I draft an doc to clarify this bug and possible solutions: > https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT > Looking forward to suggestions from you. Thanks in advance. > there are patche up to allow cleaning up orpahn instances https://review.opendev.org/#/c/627765/ https://review.opendev.org/#/c/648912/ if we can get those merged that woudl adress at least some of the proablem > Best Regards, > Luyao From smooney at redhat.com Tue Nov 12 13:26:14 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:26:14 +0000 Subject: [nova][ptg] Resurrecting NUMA topology in Placement In-Reply-To: References: Message-ID: On Tue, 2019-11-12 at 11:29 +0100, Sylvain Bauza wrote: > We discussed about a long known story > https://review.openstack.org/#/c/552924/ > > The whole agreement during the PTG was to keep things simple with baby > steps : > - only supporting a few NUMA queries and defer others as unsupported (still > supported by legacy NUMATopologyFilter) > - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB > resource classes and not handle PCI or GPU devices (ie. level-1 tree, no > children under the NUMA RPs) > > Agreement was also there for saying that functional tests should be enough > since PlacementFixture already works perfectly. 
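just to make the quoted agreement a bit more concrete: with a level-1 tree of
NUMA child providers, the request the spec has to cover is basically a granular
allocation candidates query, something along these lines (illustrative only,
exact syntax and microversion are for the spec to pin down):

  GET /allocation_candidates?resources=DISK_GB:20
      &resources1=PCPU:4,MEMORY_MB:4096
      &resources2=PCPU:4,MEMORY_MB:4096
      &group_policy=isolate

where each numbered group is expected to land on a different NUMA RP of the
same compute node.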
we can now do numa testing in the gate so we can also add tempest testing this cycle. artom has recently gotten whitebox to run (more work to do https://review.opendev.org/#/c/691062/) and i do want to get my multi numa nfv testing job https://review.opendev.org/#/c/679656/ at least in experimental and perodic pipelines. i would like it to be in check eventually but baby steps. i dont think these should be a blocker for any of this work but i think we shoudl take advantage of them to contiue to improve the numa testing in gate. > > -S From smooney at redhat.com Tue Nov 12 13:29:32 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:29:32 +0000 Subject: [nova][ptg] PCI refactoring needs and a strawman proposal inside In-Reply-To: References: Message-ID: <35cff7ea13a85bde43a4626c84b6bc130eb67110.camel@redhat.com> On Tue, 2019-11-12 at 11:51 +0100, Sylvain Bauza wrote: > Based on some Forum discussions, we had a de facto conversation at the Nova > PTG about any potential things we could do for helping our operators, but > also potentially Cyborg since they use the PCI passthrough capabilities. > > The feedback from ops (thanks mnaser) was that PCI passthrough works pretty > smoothly but there are some ugly issues where we could improve the UX. > > The agreement in the room was, at least for Ussuri, to start collecting > some ideas on how we could model Placement usage for PCI devices and also > write motivations for such things in a spec. > We also agreed on a smooth upgrade plan where PCITracker would still be > present for a couple of releases until we're able to close the feature gap. > > Bauzas volunteered for drafting the spec and gibi, stephenfin and bauzas > started to sketch up the Placement modeling on a whiteboard, picture to be > shared in a follow-up. i have a list of things i want to enhance related to pci/sriov so i would be interested in this topic too. it might be worth consiering a SIG on this topic if it will be cross project. From smooney at redhat.com Tue Nov 12 13:29:53 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 13:29:53 +0000 Subject: [nova][ptg] Resurrecting NUMA topology in Placement In-Reply-To: References: Message-ID: On Tue, 2019-11-12 at 13:26 +0000, Sean Mooney wrote: > On Tue, 2019-11-12 at 11:29 +0100, Sylvain Bauza wrote: > > We discussed about a long known story > > https://review.openstack.org/#/c/552924/ > > > > The whole agreement during the PTG was to keep things simple with baby > > steps : > > - only supporting a few NUMA queries and defer others as unsupported (still > > supported by legacy NUMATopologyFilter) > > - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB > > resource classes and not handle PCI or GPU devices (ie. level-1 tree, no > > children under the NUMA RPs) > > > > Agreement was also there for saying that functional tests should be enough > > since PlacementFixture already works perfectly. > > we can now do numa testing in the gate so we can also add tempest testing this cycle. > artom has recently gotten whitebox to run (more work to do https://review.opendev.org/#/c/691062/) > and i do want to get my multi numa nfv testing job https://review.opendev.org/#/c/679656/ at least > in experimental and perodic pipelines. i would like it to be in check eventually but baby steps. > > i dont think these should be a blocker for any of this work but i think we shoudl take advantage > of them to contiue to improve the numa testing in gate. 
by gate i ment ci/ > > > > > -S > > From skaplons at redhat.com Tue Nov 12 13:53:11 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 12 Nov 2019 14:53:11 +0100 Subject: [ptg][neutron] Ussuri PTG summary Message-ID: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> Hi Neutron team, First if all thank to all of You for great and very productive week during the PTG in Shanghai. Below is summary of our discussions from whole 3 days. If I forgot about something, please respond to the email and update missing informations. But if You want to have follow up discussion about one of the topics from this summary, please start a new thread to keep this one only as high level summary of the PTG. On boarding =========== Slides from onboarding session can be found at [1] If You have any follow up questions to us about onboarding, or You need help with starting any work in Neutron team, please contact me or Miguel Lavalle by email or on IRC. My IRC nick is slaweq and Miguel's nick is mlavalle. We are available on #openstack-neutron channel @freenode. Train retrospective =================== Good things in Train cycle: * working with this team is still good experience * core team is stable, and we didn't lost any core reviewers during the cycle, * networking is still one of key reasons why people use OpenStack Not good things: * dimished vitality in stadium projects - we had also forum session and follow discussion about this later during the PTG, * gate instability - we have seen many issues which were out of our control, like infra problems, grenade jobs failures, other projects failures, but also many bugs on our side, * we have really a lot of jobs in our check/gate queue. If each of them is failing 5% of times, it's hard to merge any patch as almost every time, one of jobs will fail. Later during the PTG we also discussed that topic and we were looking for some jobs which we maybe can potentially drop from our queues. See below for summary about that, Action items/improvements: * many team meetings each week. We decided to limit number of meetings by: ** consolidate performance subteam meeting into weekly team meeting - this topic will be added to the team meeting's agenda for team meetings on Monday, ** consolidate ovn convergence meeting into weekly team meeting - this topic will be added to the team meeting's agenda for team meetings on Tuesday, ** we need to check if QoS subteam meeting is still needed, * Review process: list of actual review priorities would be useful for the team, we will add "Review-Priority" label to the Neutron reviews board and try to use it during the Ussuri cycle. Openvswitch agent enhancements ============================== We had bunch of topics related to potential improvements for neutron-openvswitch-agent proposed mostly by Liu Yulong. Slides with his proposals are available at [2]. * retire DHCP agent - resyncs of DHCP agent are problematic, especially when agent hosts many networks. Proposal was to add new L2 agent's extension which could be used instead of "regular" DHCP agent and to provide only basic DHCP functionalities. Such solutions would work in the way quite similar to how networking-ovn works today but we would need to implement and maintain own dhcp server application. Problems of this solution are: ** problems with compatibility e.g. with Ironic, ** how it would work with mixed deployments, e.g. 
with ovs and sriov agents, ** support for dhcp options, Advantages of this solution: ** fully distributed DHCP service, ** no DHCP agents, so less RPC messages on the bus and easier maintanance of the agents, Team's feedback for that is that this is potentially nice solution which may helps in some specific, large scale deploymnets. We can continue discussion about this during Ussuri cycle for sure. * add accepted egress fdb flows We agreed that this is a bug and we should continue work on this to propose some way to fix it. Solution proposed by LIU during this discussion wasn't good as it could potentially break some corner cases. * new API and agent for L2 traffic health check The team asked to add to the spec some more detailed and concrete use cases with explanation how this new API may help operator of the cloud to investigate where the problem actually is. * Local flows cache and batch updating The team agreed that as long as this will be optional solution which operator can opt-in we can give it a try. But spec and discuss details there will be necessary. * stop processing ports twice in ovs-agent We all agreed that this is a bug and should be fixed. But we have to be careful as fixing this bug may cause some other problems e.g. with live-migration - see nova-neutron cross project session. * ovs-agent: batch flow updates with --bundle We all agreed that this can be done as an improvement of existing code. Similar option is already used in openvswitch firewall driver. Neutron - Cyborg cross project session ====================================== Etherpad for the session is at [3]. Cyborg team wants to include Neutron in workflow of spawning VMs with Smart NICs or accelerator cards. From Neutron's side, required change is to allow including "accel" data in port binding profile. As long as this will be well documented what can be placed there, there should be no problem with doing that. Technically we can place almost anything there. Neutron - Kuryr cross project session ===================================== Etherpad for the session is at [4]. Kuryr team proposed 4 improvements for Neutron which would help a lot Kuryr. Ideas are: * Network cascade deletion, * Force subport deletion, * Tag resources at creation time, * Security group creation with rules & bulk security group rule creation All of those ideas makes sense for Neutron team. Tag resources at creation time is even accepted rfe already - see [5] but there was no volunteer to implement it. We will add it to list of our BPs tracked weekly on team meeting. Miguel Lavalle is going to take a look at it during this cycle. For other proposals we need to have RFEs reported first. Starting the process of removing ML2/Linuxbridge ================================================ Currently in Neutron tree we have 4 drivers: * Linuxbridge, * Openvswitch, * macvtap, * sriov. SR-IOV driver is out of discussion here as this driver is addressing slightly different use case than other out drivers. We started discussion about above topic because we don't want to end up with too many drivers in-tree and we also had some discussions (and we have spec for that already) about include networking-ovn as in-tree driver. So with networking-ovn in-tree we would have already 4 drivers which can be used on any hardware: linuxbridge, ovs, macvtap and ovn. 
Conclusions from the discussion are: * each driver requires proper testing in the gate, so we need to add many new jobs to our check/gate queue, * currently linuxbridge driver don't have a lot of development and feature parity gaps between linuxbridge and ovs drivers is getting bigger and bigger (e.g. dvr, trunk ports), * also macvtap driver don't have a lot of activity in last few cycles. Maybe this one could be also considered as candidate to deprecation, * we need to have process of deprecating some drivers and time horizon for such actions should be at least 2 cycles. * we will not remove any driver completly but rather we will move it to be in stadium process first so it still can be maintained by people who are interested in it. Actions to do after this discussion: * Miguel Lavalle will contact RAX and Godaddy (we know that those are Linuxbridge users currently) to ask about their feedback about this, * if there are any other companies using LB driver, Nate Johnston is willing to help conctating them, please reach to him in such case. * we may ratify marking linuxbridge as deprecated in the team meeting during Ussuri cycle if nothing surprising pops in. Encrypted(IPSec) tenant networks ================================ Interesting topic proposed but we need to have RFE and spec with more detailed informations about it to continue discussions. Medatada service over IPv6 ========================== This is continuation of old RFE [6]. The only real problem is to choose proper IPv6 address which will be well known address used e.g. by cloud-init. Original spec proposed fe80::a9fe:a9fe as IPv6 address to access metadata service. We decided to be bold and define the standard. Bence Romsics and Miguel Lavalle volunteered to reach out to cloud-init maintainers to discuss that. walkthrough of OVN ================== Since some time we have in review spec about ml2/ovs and ovn convergence. See [7] for details. List of parity gaps between those backends is available at [8]. During the discussion we talked about things like: * migration from ml2/ovs to ml2/ovn - some scripts are already done in [9], * migration from ml2/lb to ml2/ovn - there was no any work done in this topic so far but it should be doable also if someone would need it and want to invest own time for that, * include networking-ovn as in-tree neutron driver and reasons why it could be good idea. Main reasons of that are: ** that would help growing networking-ovn community, ** would help to maintain a healthy project team, ** the default drivers have always been in-tree, However such inclusion may also hurt modularity/logical separation/dependency management/packaging/etc so we need to consider it really carefully and consider all points of view and opinions. Next action item on this topic is to write more detailed summary of this topic and send it to ML and ask wider audience for feedback. IPv6 devstack tempest test configuration vs OVN =============================================== Generally team supports idea which was described during this session and we should change sligtly IPv6 config on e.g. devstack deployments. Neutron - Edge SIG session ========================== We discussed about RFE [10]. This will require also changes on placement side. See [11] for details. Also some cyborg and ovn related changes may be relevant to topics related to Edge. Currently specs which we have are only related to ML2/OVS solution. Neutron - Nova cross project session ==================================== Etherpad for this session is on [12]. 
Summary written already by gibi can be found at [13]. On [14] You can find image which shows in visual way problem with live-migration of instances with SR-IOV ports. Policy handling in Neutron ========================== The goal of the session was to plan on Neutron's side similar effort to what services like nova are doing now to use new roles like reader and scopes, like project, domain, system provided by Keystone. Miguel Lavalle volunteered to work on this for Neutron and to be part of popup team for cross project collaboration on this topic. Neutron performance improvements ================================ Miguel Lavalle shown us his new profiling decorator [15] and how we all can use it to profile some of API calls in Neutron. Reevaluate Stadium projects =========================== This was follow up discussion after forum session. Notes from forum session can be found at [16]. Nate also prepared some good data about stadium projects activity in last cycles. See at [17] and [18] for details. We all agreed that projects which are in (relatively) good condition now are: * networking-ovn, * networking-odl, * ovsdbapp Projects in bad condition are other projects, like: * neutron-interconnection, * networking-sfc, * networking-bagpipe/bgpvpn, * networking-midonet, * neutron-fwaas and neutron-fwaas-dashboard, * neutron-dynamic-routing, * neutron-vpnaas and neutron-vpnaas-dashboard, We decided to immediately remove neutron-interconnection project as it was never really implemented. For other of those projects, we will send emails to ML to ask for potential maintainers of those projects. If there will be no any volunteers to maintain some of those projects, we will deprecated them and move to "x/" namespace in 2 cycles. Floating IP's On Routed Networks ================================ There is still interest of doing this. Lajos Katona started adding some scenario tests for routed networks already as we need improved test coverage for this feature. Miguel Lavalle said that he will possibly try to work on implementing this in Ussuri cycle. L3 agent enhancement ==================== We talked about couple potential improvements of existing L3 agent, all proposed by LIU Yulong. * retire metering-agent It seems that there is some interest in metering agent recently so we shouldn't probably consider of retiring it for now. We also talked about adding new "tc based" driver to the metering agent and this discussion can be continue on rfe bug [19]. * Centralized DNAT (non-DVR) traffic (floating IP) Scale-out This is proposal of new DVR solution. Some details of this new solution are available at [20]. We agreed that this proposal is trying to solve some very specific use case, and it seems to be very complicated solution with many potential corner cases to address. As a community we don't want to introduce and maintain such complicated new L3 design. * Lazy-load agent side router resources when no related service port Team wants to see RFE with detailed description of the exact problem which this is trying to solve and than continue discussion on such RFE. Zuul jobs ========= In this session we talked about jobs which we can potentially promote to be voting (and we didn't found any of such) and about jobs which we maybe can potentially remove from our queues. 
Here is what we agreed: * we have 2 iptables_hybrid jobs - one on Fedora and one on Ubuntu - we will drop one of those jobs and left only one of them, * drop neutron-grenade job as it is running still on py27 - we have grenade-py3 which is the same job but run on py36 already, * as it is begin of the cycle, we will switch in devstack neutron uwsgi to be default choice and we will remove "-uwsgi" jobs from queue, * we should compare our single node and multinode variants of same jobs and maybe promote multinode jobs to be voting and then remove single node job - I volunteered to do that, * remove our existing experimental jobs as those jobs are mostly broken and nobody is run those jobs in experimental queue actually, * Yamamoto will check failing networking-midonet job and propose patch to make it passing again, * we will change neutron-tempest-plugin jobs for branch in EM phase to always use certain tempest-plugin and tempest tag, than we will remove those jobs from check and gate queue in master branch, Stateless security groups ========================= Old RFE [21] was approved for neutron-fwaas project but we all agreed that this should be now implemented for security groups in core Neutron. People from Nuage are interested in work on this in upstream. We should probably also explore how easy/hard it will be to implement it in networking-ovn backend. Old, stagnant specs =================== During this session we decided to abandon many of old specs which were proposed long time ago and there is currently no any activity and interest in continue working on them. If anyone would be interested in continue work on some of them, feel free to contact neutron core team on irc or through email and we can always reopen such patch. Community Goal things ===================== We discussed about currently proposed community goals and who can take care of which goal on Neutron's side. Currently there are proposals of community goals as below: * python3 readiness - Nate will take care of this, * move jobs definitions to zuul v3 - I will take care of it. In core neutron and neutron-tempest-plugin we are (mostly) done. On stadium projects' side this will require some work to do, * Project specific PTL and contributor guides - Miguel Lavalle will take care of this goal as former PTL, We will track progress of community goals weekly in our team meetings. Neutron-lib =========== As some time ago our main neutron-lib maintainer (Boden) leaved from the project, we need some new volunteers to continue work on it. Todo list is available on [22]. This should be mostly important for people who are maintaining stadium projects or some 3rd party drivers/plugins so if You are doing things like that, please check list from [22] and reach out to us on ML or #openstack-neutron IRC channel. 
[1] https://www.slideshare.net/SawomirKaposki/neutron-on-boarding-room [2] https://github.com/gotostack/shanghai_ptg/blob/master/shanghai_neutron_ptg_topics_liuyulong.pdf [3] https://etherpad.openstack.org/p/Shanghai-Neutron-Cyborg-xproj [4] https://etherpad.openstack.org/p/kuryr-neutron-nice-to-have [5] https://bugs.launchpad.net/neutron/+bug/1815933 [6] https://bugs.launchpad.net/neutron/+bug/1460177 [7] https://review.opendev.org/#/c/658414/ [8] https://etherpad.openstack.org/p/ML2-OVS-OVN-Convergence [9] https://github.com/openstack/networking-ovn/tree/master/migration [10] https://bugs.launchpad.net/neutron/+bug/1832526 [11] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/009991.html [12] https://etherpad.openstack.org/p/ptg-ussuri-xproj-nova-neutron [13] http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010654.html [14] https://imgur.com/a/12PrQ9W [15] https://review.opendev.org/678438 [16] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [17] https://ethercalc.openstack.org/neutron-stadium-train-metrics [18] https://ibb.co/SBzDGdD [19] https://bugs.launchpad.net/neutron/+bug/1817881 [20] https://imgur.com/a/6MeNUNb [21] https://bugs.launchpad.net/neutron/+bug/1753466 [22] https://etherpad.openstack.org/p/neutron-lib-volunteers-and-punch-list -- Slawek Kaplonski Senior software engineer Red Hat From lyarwood at redhat.com Tue Nov 12 13:57:06 2019 From: lyarwood at redhat.com (Lee Yarwood) Date: Tue, 12 Nov 2019 13:57:06 +0000 Subject: [nova][cinder] Volume-backed instance disks and some operations that do not support those yet In-Reply-To: <89de51c7-b489-7df3-1bb5-7424ee8a9542@gmail.com> References: <89de51c7-b489-7df3-1bb5-7424ee8a9542@gmail.com> Message-ID: <20191112135706.zmywdbyucgmqdloo@lyarwood.usersys.redhat.com> On 11-11-19 13:54:02, Matt Riedemann wrote: > On 11/11/2019 10:06 AM, Peter Penchev wrote: > > - rescuing an instance (refuses outright) > > Yeah, not supported, but there have been specs [3][4]. > > [..] > > [3] https://review.opendev.org/#/c/651151/ > [4] https://review.opendev.org/#/c/532410/ I might actually have time for this during U. Third time lucky? https://review.opendev.org/693849 Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From sbauza at redhat.com Tue Nov 12 14:07:05 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 15:07:05 +0100 Subject: [ptg][nova][cinder] x-p meeting minutes Message-ID: Excerpts taken from https://etherpad.openstack.org/p/shanghai-ptg-cinder We discussed two items : 1/ Improving replication - When a failover happens volumes are no longer usable in Nova. - Nova should start re-attaching volumes after a failover? - Why can't we detach and attach the volume? Data that is in flight would be lost. - Question about boot from volume. In that case the instance is dead anyway because access to the volume has been lost. - Could go through the shutdown, detach, attach, reboot path. - Problem is that detach is going to fail. Need to force it or handle the failure. - We aren't sure that Nova will allow a detach of a boot volume. - Also don't currently have a force detach API. - AGREE: Need to figure out how to pass the force to os-brick to detach volume and when rebooting a volume. 
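- For illustration only: os-brick already exposes force/ignore_errors on its
  disconnect path, so most of the open work is plumbing on the Nova side. A
  rough sketch (the connector type and the connection_properties/device_info
  variables are placeholders for whatever the attachment actually recorded):

    from os_brick.initiator import connector

    conn = connector.InitiatorConnector.factory(
        'ISCSI', root_helper='sudo', use_multipath=True)
    # tear down the local attachment even if flushing I/O to the dead target fails
    conn.disconnect_volume(connection_properties, device_info,
                           force=True, ignore_errors=True)

  Nova's volume drivers would then need a way to request force=True when the
  backend has failed over and the old target is known to be unreachable.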
2/ Nova bug for images created from encrypted volumes - Nova is not creating a new key for encrypted images but the deletion policy metadata can allow a key to be deleted wehn it is still in use by other images or even volumes - Nova needs to clone the keys when doing create image. - The nova team thinks that we have found a bug that needs to be fixed. We just need to open a bug. - ACTION Cinder to open a bug against Nova. (can you also ping lyarwood when it's open?) -S -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Tue Nov 12 14:12:29 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 12 Nov 2019 09:12:29 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > On 7/11/19 2:11 pm, Corey Bryant wrote: > > Hello TC members, > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > least enable non-voting py38 unit tests. This email is seeking approval > > and direction from the TC to move forward with enabling non-voting py38 > > tests. > > I was a bit fuzzy on this myself, so I looked it up and this is what the > TC decided when we passed the resolution: > > > If the new Zuul template contains test jobs that were not in the > previous one, the goal champion(s) may choose to update the previous > template to add a non-voting check job (or jobs) to match the gating jobs > in the new template. This means that all repositories that have not yet > converted to the template for the upcoming release will see a non-voting > preview of the new job(s) that will be added once they update. If this > option is chosen, the non-voting job should be limited to the master branch > so that it does not run on the preceding release’s stable branch. > > Thanks for digging that up and explaining. I recall that wording and it makes a lot more sense now that we have a scenario in front of us. > (from > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > ) > > So to follow that process we would need to define the python versions > for V, then appoint a goal champion, and after that it would be at the > champion's discretion to add a non-voting job on master in Ussuri. I > happened to be sitting next to Sean when I saw this thread, and after > discussing it with him I think he would OK with having a non-voting job > on every commit, since it's what we have documented. Previous > discussions established that the overhead of adding one Python unit test > job to every project was pretty inconsequential (we'll offset it by > dropping 2.7 jobs anyway). > > I submitted a draft governance patch defining the Python versions for V > (https://review.opendev.org/693743). Unfortunately we can't merge it yet > because we don't have a release name for V (Sean is working on that: > https://review.opendev.org/693266). It's gazing in the crystal ball a > Thanks very much for getting that going. 
little bit, but even if for some reason Ubuntu 20.04 is not released > before the V cycle starts, it's inevitable that we will be selecting > Python 3.8 because it meets the first criterion ("The latest released > version of Python 3 that is available in any distribution we can > feasibly use for testing") - 3.8 is released and it's available in > Ubuntu 18.04, which is the distro we use for testing anyway. > > So, in my opinion, if you're volunteering to be the goal champion then > there's no need for any further approval by the TC ;) > > Sure, I can champion that. Just to be clear, would that be Ussuri and V python3-updates champion, similar to the following? https://governance.openstack.org/tc/goals/selected/train/python3-updates.html Granted it's easier now that we mostly just have to switch the job template to the new release. > I guess to make that official we should commit the python3 update Goal > for the V cycle now... or at least as soon as we have a release name. > How far off do you think we are from having a V name? If just a few weeks then I'm fine waiting but if over a month I'm more concerned. This is happening a little earlier than I think we anticipated but, > given that there's no question what is going to happen in V, I don't > think we'd be doing anybody any favours by delaying the process > unnecessarily. I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be in the picture for Ussuri or V. > > For some further background: The next release of Ubuntu, Focal (20.04) > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > default in the Focal release, so I'm hopeful that non-voting unit tests > > will help close some of the gap. > > > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > No, I don't think this changes anything for Ussuri. It's preparation for V. > > Ok. Appreciate all the input and help. Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 14:13:51 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 15:13:51 +0100 Subject: [ptg][nova][keystone] x-p meeting minutes Message-ID: We only discussed about the large effort needed for the new policies. https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/policy-defaults-refresh.html https://review.opendev.org/#/q/topic:bp/policy-defaults-refresh+(status:open+OR+status:merged) We asked the Keystone team to review the first changes in the series as it's very helpful. We had concerns about completion over the Ussuri cycle since the series can be very large and we want to avoid deprecation messages for some APIs if not all the Nova APIs are touched yet. The agreement was to hold a procedural -2 on the API changes and start reviewing (for both Keystone and Nova folks) until all the changes are there. Maybe a runway could help once all the API changes are uploaded. -S -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 14:20:44 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 15:20:44 +0100 Subject: [ptg][nova][glance] x-p meeting minutes Message-ID: We only discussed about nova snapshots to dedicated glance stores. 
Reference is https://review.opendev.org/#/c/641210/

We had concerns about implementing glance storage location strategies in Nova. A counter-proposal was made to only pass the original image ID when calling Glance for a snapshot so that Glance would then use any store the operator wants (eg. a location strategy like "please store the snapshot to a place close to where the original image is"). The agreement was there for the counter-proposal, so the Nova change would only be to pass the original glance image ID to the new glanceclient that would support it. Glance folks will provide a new spec revision.

-S
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From mihalis68 at gmail.com Tue Nov 12 14:24:20 2019
From: mihalis68 at gmail.com (Chris Morgan)
Date: Tue, 12 Nov 2019 09:24:20 -0500
Subject: [ops] meetups team meeting and meetups venue for Jan 2020
Message-ID:

There will be an OpenStack Ops Meetups team meeting in just under 45 minutes on #openstack-operators. The agenda is here: https://etherpad.openstack.org/p/ops-meetups-team

We'd like to see if we can agree to formally accept the offer to host the next OpenStack Operators Meetup in London next January (see https://etherpad.openstack.org/p/ops-meetup-1st-2020). Please attend the meeting if you have feedback for or against this proposal. So far all feedback has been positive and there are no other offers. Note (full disclosure) that this proposal is from my employer. We'll also go over material from last week's summit in Shanghai.

Chris
-- Chris Morgan
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From amotoki at gmail.com Tue Nov 12 14:27:26 2019
From: amotoki at gmail.com (Akihiro Motoki)
Date: Tue, 12 Nov 2019 23:27:26 +0900
Subject: [infra] Etherpad problem
In-Reply-To: <20191112102729.fggiccohpwpqd3he@skaplons-mac>
References: <20191109021607.ffm5lkwkfohl22fq@skaplons-mac> <6c3b8ca3-d2d4-7eb4-d615-5396eaf1cdef@gmail.com> <20191112102729.fggiccohpwpqd3he@skaplons-mac>
Message-ID:

Hi,

I updated the restored etherpad [1] based on the history at the last moment [2], especially the "Review old, stagnant specs" session. If someone has notes on spec reviews discussed after the etherpad became broken (around L.562-569), please add them to the restored etherpad.

[1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored
[2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117

Thanks,
Akihiro

On Tue, Nov 12, 2019 at 7:35 PM Slawek Kaplonski wrote: > > Hi, > > Thx Brian. I used Your backup etherpad to "restore" everything in [1] > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored > > On Sun, Nov 10, 2019 at 12:31:36AM +0800, Brian Haley wrote: > > On 11/9/19 10:16 AM, Slawek Kaplonski wrote: > > > Hi, > > > > > > Just at the end of the ptg sessions, neutron etherpad was got broken somehow.
> > > Now when I try to open [1] I see only something like: > > > > > > An error occurred > > > The error was reported with the following id: 'igzOahZ6ruH0eSUAWKaj' > > > > > > Please press and hold Ctrl and press F5 to reload this page, if the problem > > > persists please send this error message to your webmaster: > > > 'ErrorId: igzOahZ6ruH0eSUAWKaj > > > URL: https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > > UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 > > > Firefox/70.0 > > > TypeError: r.dropdowns is undefined in > > > https://etherpad.openstack.org/javascripts/lib/ep_etherpad-lite/static/js/pad.js?callback=require.define > > > at line 18' > > > > > > > > > We can open one of the previous versions which is available at [2] but I don't > > > know how we can fix original etherpad or restore version from [2] to be original > > > etherpad and make it working again. > > > Can someone from infra team check that for us maybe? > > > Thx in advance for any help. > > > > > > [1] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning > > > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning/timeslider#16117 > > > > Hi Slawek, > > > > When I just went to check this etherpad now, noticed I had a tab open that > > was in "Force reconnect" state. I made a copy of that, just might be a > > little out of date on the last items. The formatting is also a little odd, > > but at least it's better than nothing if we can't get the original back. > > > > https://etherpad.openstack.org/p/neutron-ptg-temp > > > > -Brian > > > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From lyarwood at redhat.com Tue Nov 12 15:09:08 2019 From: lyarwood at redhat.com (Lee Yarwood) Date: Tue, 12 Nov 2019 15:09:08 +0000 Subject: [ptg][nova][cinder] x-p meeting minutes In-Reply-To: References: Message-ID: <20191112150908.w2of3q5mpznm42dl@lyarwood.usersys.redhat.com> On 12-11-19 15:07:05, Sylvain Bauza wrote: > Excerpts taken from https://etherpad.openstack.org/p/shanghai-ptg-cinder > We discussed two items : > > 1/ Improving replication > > - When a failover happens volumes are no longer usable in Nova. > > > - Nova should start re-attaching volumes after a failover? > > > - Why can't we detach and attach the volume? Data that is in flight > would be lost. > > > - Question about boot from volume. In that case the instance is dead > anyway because access to the volume has been lost. > > > - Could go through the shutdown, detach, attach, reboot path. > > > - Problem is that detach is going to fail. Need to force it or handle > the failure. > > > - We aren't sure that Nova will allow a detach of a boot volume. > > > - Also don't currently have a force detach API. > > > - AGREE: Need to figure out how to pass the force to os-brick to detach > volume and when rebooting a volume. I started looking at this a while ago in the review below: libvirt: Wire up a force disconnect_volume flag https://review.opendev.org/#/c/584849/ I'll restore and see if it's still valid for the above. > 2/ Nova bug for images created from encrypted volumes > > - Nova is not creating a new key for encrypted images but the deletion > policy metadata can allow a key to be deleted wehn it is still in use by > other images or even volumes > > > - Nova needs to clone the keys when doing create image. > > > - The nova team thinks that we have found a bug that needs to be fixed. > We just need to open a bug. > > > - ACTION Cinder to open a bug against Nova. 
(can you also ping lyarwood > when it's open?)

This has been created below:

Possible data loss from createImage action
https://bugs.launchpad.net/nova/+bug/1852106

As discussed in the bug, the actual use case of booting an instance from an encrypted image that itself was created from an encrypted volume has never worked. I'm also confused by the use of cinder_ specific image properties here, but this may be because this use case was never intended to be supported.

Anyway, I've ended up looking at the ephemeral encryption support in Nova's Libvirt driver for most of the day and honestly it needs to be deprecated and removed as it's never going to be able to handle anything like this. The current implementation uses a single per-instance key for all disks, while use cases like this and the encrypted image spec below require per-disk keys:

Spec for the Nova part of Image Encryption
https://review.opendev.org/#/c/608696/

-- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL:

From amotoki at gmail.com Tue Nov 12 15:29:17 2019
From: amotoki at gmail.com (Akihiro Motoki)
Date: Wed, 13 Nov 2019 00:29:17 +0900
Subject: [all][doc] Patches to add --keep-going to sphinx-build (and patches proposed to many many repositories)
Message-ID:

Hi,

As you may notice, we see a lot of patches which try to add --keep-going to sphinx-build. [0] I have suggestions and questions.

1) First, when reviewing them, keep the following in mind.
* --keep-going is added even when the -W option is not used in the sphinx-build command line. -W is recommended in the PTI [1], so ensure -W is present.
* Some of them ignore cases where "python setup.py build_sphinx" is still used. It is a good chance to clean these up and use "sphinx-build" consistently.

2) Why do we need to remove the build directory for releasenotes? Some of them propose to add "rm -rf releasenotes/build" (for example [2]). I cannot understand why this needs to be added. Do we really want to call "rm -rf "? I know it is needed in some repositories due to various reasons, but generally speaking it makes the documentation build longer and is unnecessary. I tried to get the reason from the authors in reviews, but they just say it is simple and it can be added at the same time. Thus, I would like to ask about it more broadly on the list.

3) What is the recommended way to get consensus on this kind of patch series, which affects many, many repositories? It is not productive to ask questions in individual reviews. Some patches are approved quickly while questions pop up in other patches. That makes it difficult to discuss the real needs and keep the many repositories consistent.

I am not against all of these changes, but I would like to see a more organized way of doing this. Thoughts?
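For reference, a small sketch of the flag combination under discussion, using Sphinx's Python entry point rather than the command line (the doc paths below are placeholders, not any particular repository's layout):

    # -W turns warnings into errors; --keep-going (Sphinx >= 1.8) only
    # changes behaviour when -W is set: it reports all warnings before
    # failing instead of stopping at the first one.
    from sphinx.cmd.build import build_main

    rc = build_main(['-W', '--keep-going', '-b', 'html',
                     'doc/source', 'doc/build/html'])
    raise SystemExit(rc)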
Thanks, Akihiro [0] https://review.opendev.org/#/q/message:keeping [1] https://governance.openstack.org/tc/reference/project-testing-interface.html#documentation [2] https://review.opendev.org/#/c/690956/2/tox.ini From sbauza at redhat.com Tue Nov 12 15:38:38 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 16:38:38 +0100 Subject: [nova][ptg] Resurrecting NUMA topology in Placement In-Reply-To: References: Message-ID: On Tue, Nov 12, 2019 at 2:30 PM Sean Mooney wrote: > On Tue, 2019-11-12 at 13:26 +0000, Sean Mooney wrote: > > On Tue, 2019-11-12 at 11:29 +0100, Sylvain Bauza wrote: > > > We discussed about a long known story > > > https://review.openstack.org/#/c/552924/ > > > > > > The whole agreement during the PTG was to keep things simple with baby > > > steps : > > > - only supporting a few NUMA queries and defer others as unsupported > (still > > > supported by legacy NUMATopologyFilter) > > > - The to-be-resurrected spec would be only focus on VCPU/PCPU/MEMORY_MB > > > resource classes and not handle PCI or GPU devices (ie. level-1 tree, > no > > > children under the NUMA RPs) > > > > > > Agreement was also there for saying that functional tests should be > enough > > > since PlacementFixture already works perfectly. > > > > we can now do numa testing in the gate so we can also add tempest > testing this cycle. > > artom has recently gotten whitebox to run (more work to do > https://review.opendev.org/#/c/691062/) > > and i do want to get my multi numa nfv testing job > https://review.opendev.org/#/c/679656/ at least > > in experimental and perodic pipelines. i would like it to be in check > eventually but baby steps. > > > > i dont think these should be a blocker for any of this work but i think > we shoudl take advantage > > of them to contiue to improve the numa testing in gate. > by gate i ment ci/ > Yup, I understood and we also discussed this possibility at the PTG. To be clear, that would be nice to get Tempest tests on a specific job that'd verify this, but this shouldn't be a blocker. > > > > > > > -S > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Tue Nov 12 15:44:10 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 12 Nov 2019 16:44:10 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> Message-ID: On Mon, Nov 11, 2019 at 4:05 PM Dan Smith wrote: > > Sharding with and/or within cells will help to some degree (and we are > > actively looking into this as you probably know), but I think that > > should not stop us from checking if there are algorithmic improvements > > (e.g. when collecting the data), or if moving to a different locking > > granularity or even parallelising the update are feasible additional > > improvements. > > All of that code was designed around one node per compute host. In the > ironic case it was expanded (hacked) to support N where N is not > huge. Giving it a huge number, and using a driver where nodes go into > maintenance/cleaning for long periods of time is asking for trouble. > > Given there is only one case where N can legitimately be greater than > one, I'm really hesitant to back a proposal to redesign it for large > values of N. > > Perhaps we as a team just need to document what sane, tested, and > expected-to-work values for N are? 
> > What we discussed at the PTG was the fact that we only have one global semaphore for this module but we have N ResourceTracker python objects (where N is the number of Ironic nodes per compute service). As per CERN, it looks this semaphore blocks when updating periodically so we basically said it could only be a bugfix given we could create N semaphores instead. That said, as it could have some problems, we want to make sure we can test the change not only by the gate but also directly by CERN. Another discussion was about having more than one thread for the compute service (ie. N threads) but my opinion was that we should first look at the above before discussing about any other way. -S --Dan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue Nov 12 15:49:06 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 12 Nov 2019 09:49:06 -0600 Subject: [all][doc] Patches to add --keep-going to sphinx-build (and patches proposed to many many repositories) In-Reply-To: References: Message-ID: <744149ba-bb9f-dd95-e8a6-5330e81d444c@gmail.com> On 11/12/2019 9:29 AM, Akihiro Motoki wrote: > I am not against all of these changes, but I would like to see more > organized way. > > Thought? This looks like the basic "find a maybe not so controversial change and spam it all over every repo to pad contribution stats" pattern to me. Marginally better than fixing typos or URLs which is the usual thing being proposed in these types of changes. -- Thanks, Matt From mihalis68 at gmail.com Tue Nov 12 15:55:22 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 12 Nov 2019 10:55:22 -0500 Subject: [ops] next ops meetup : London 7,8 Jan 2020 - approved! Message-ID: https://twitter.com/osopsmeetup/status/1194281816468938752?s=20 -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From moreira.belmiro.email.lists at gmail.com Tue Nov 12 16:06:17 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 12 Nov 2019 17:06:17 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: Hi, using several cells for the Ironic deployment would be great however it doesn't work with the current architecture. The nova ironic driver gets all the nodes available in Ironic. This means that if we have several cells all of them will report the same nodes! The other possibility is to have a dedicated Ironic instance per cell, but in this case it will be very hard to manage a large deployment. What we are trying is to shard the ironic nodes between several nova-computes. nova/ironic deployment supports several nova-computes and it will be great if the RT nodes cycle is sharded between them. But anyway, this will also require speeding up the big lock. It would be great if a compute node can handle more than 500 nodes. Considering our use case: 15k/500 = 30 compute nodes. Belmiro CERN On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann wrote: > On 11/11/2019 7:03 AM, Chris Dent wrote: > > Or using > > separate processes? For the ironic and vsphere contexts, increased > > CPU usage by the nova-compute process does not impact on the > > workload resources, so parallization is likely a good option. 
> > I don't know how much it would help - someone would have to actually > test it out and get metrics - but one easy win might just be using a > thread or process executor pool here [1] so that N compute nodes could > be processed through the update_available_resource periodic task > concurrently, maybe $ncpu or some factor thereof. By default make it > serialized for backward compatibility and non-ironic deployments. Making > that too highly concurrent could have negative impacts on other things > running on that host, like the neutron agent, or potentially storming > conductor/rabbit with a ton of DB requests from that compute. > > That doesn't help with the scenario that the big > COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while > spawning, moving, or deleting an instance that also needs access to the > big lock to update the resource tracker, but baby steps if any steps in > this area of the code would be my recommendation. > > [1] > https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Tue Nov 12 16:12:49 2019 From: dms at danplanet.com (Dan Smith) Date: Tue, 12 Nov 2019 08:12:49 -0800 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: (Belmiro Moreira's message of "Tue, 12 Nov 2019 17:06:17 +0100") References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: > Hi, using several cells for the Ironic deployment would be great > however it doesn't work with the current architecture. The nova > ironic driver gets all the nodes available in Ironic. This means that > if we have several cells all of them will report the same nodes! The > other possibility is to have a dedicated Ironic instance per cell, but > in this case it will be very hard to manage a large deployment. That's a problem for more reasons than just your scale. However, doesn't this solve that problem? https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html --Dan From moreira.belmiro.email.lists at gmail.com Tue Nov 12 16:27:31 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 12 Nov 2019 17:27:31 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: Dan Smith just point me the conductor groups that were added in Stein. https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html This is an interesting way to partition the deployment much better than the multiple nova-computes setup. Thanks, Belmiro CERN On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi, > using several cells for the Ironic deployment would be great however it > doesn't work with the current architecture. > The nova ironic driver gets all the nodes available in Ironic. This means > that if we have several cells all of them will report the same nodes! > The other possibility is to have a dedicated Ironic instance per cell, but > in this case it will be very hard to manage a large deployment. > > What we are trying is to shard the ironic nodes between several > nova-computes. 
> nova/ironic deployment supports several nova-computes and it will be great > if the RT nodes cycle is sharded between them. > > But anyway, this will also require speeding up the big lock. > It would be great if a compute node can handle more than 500 nodes. > Considering our use case: 15k/500 = 30 compute nodes. > > Belmiro > CERN > > > > On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann > wrote: > >> On 11/11/2019 7:03 AM, Chris Dent wrote: >> > Or using >> > separate processes? For the ironic and vsphere contexts, increased >> > CPU usage by the nova-compute process does not impact on the >> > workload resources, so parallization is likely a good option. >> >> I don't know how much it would help - someone would have to actually >> test it out and get metrics - but one easy win might just be using a >> thread or process executor pool here [1] so that N compute nodes could >> be processed through the update_available_resource periodic task >> concurrently, maybe $ncpu or some factor thereof. By default make it >> serialized for backward compatibility and non-ironic deployments. Making >> that too highly concurrent could have negative impacts on other things >> running on that host, like the neutron agent, or potentially storming >> conductor/rabbit with a ton of DB requests from that compute. >> >> That doesn't help with the scenario that the big >> COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while >> spawning, moving, or deleting an instance that also needs access to the >> big lock to update the resource tracker, but baby steps if any steps in >> this area of the code would be my recommendation. >> >> [1] >> >> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 >> >> -- >> >> Thanks, >> >> Matt >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jimrollenhagen.com Tue Nov 12 16:44:47 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Tue, 12 Nov 2019 11:44:47 -0500 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: On Tue, Nov 12, 2019 at 11:38 AM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Dan Smith just point me the conductor groups that were added in Stein. > > https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html > This is an interesting way to partition the deployment much better than > the multiple nova-computes setup. > Just a note, they aren't mutually exclusive. You can run multiple nova-computes to manage a single conductor group, whether for HA or because you're using groups for some other construct (cells, racks, halls, network zones, etc) which you want to shard further. // jim > Thanks, > Belmiro > CERN > > On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira < > moreira.belmiro.email.lists at gmail.com> wrote: > >> Hi, >> using several cells for the Ironic deployment would be great however it >> doesn't work with the current architecture. >> The nova ironic driver gets all the nodes available in Ironic. This means >> that if we have several cells all of them will report the same nodes! >> The other possibility is to have a dedicated Ironic instance per cell, >> but in this case it will be very hard to manage a large deployment. >> >> What we are trying is to shard the ironic nodes between several >> nova-computes. 
>> nova/ironic deployment supports several nova-computes and it will be >> great if the RT nodes cycle is sharded between them. >> >> But anyway, this will also require speeding up the big lock. >> It would be great if a compute node can handle more than 500 nodes. >> Considering our use case: 15k/500 = 30 compute nodes. >> >> Belmiro >> CERN >> >> >> >> On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann >> wrote: >> >>> On 11/11/2019 7:03 AM, Chris Dent wrote: >>> > Or using >>> > separate processes? For the ironic and vsphere contexts, increased >>> > CPU usage by the nova-compute process does not impact on the >>> > workload resources, so parallization is likely a good option. >>> >>> I don't know how much it would help - someone would have to actually >>> test it out and get metrics - but one easy win might just be using a >>> thread or process executor pool here [1] so that N compute nodes could >>> be processed through the update_available_resource periodic task >>> concurrently, maybe $ncpu or some factor thereof. By default make it >>> serialized for backward compatibility and non-ironic deployments. Making >>> that too highly concurrent could have negative impacts on other things >>> running on that host, like the neutron agent, or potentially storming >>> conductor/rabbit with a ton of DB requests from that compute. >>> >>> That doesn't help with the scenario that the big >>> COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while >>> spawning, moving, or deleting an instance that also needs access to the >>> big lock to update the resource tracker, but baby steps if any steps in >>> this area of the code would be my recommendation. >>> >>> [1] >>> >>> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 >>> >>> -- >>> >>> Thanks, >>> >>> Matt >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From zbitter at redhat.com Tue Nov 12 16:47:06 2019 From: zbitter at redhat.com (Zane Bitter) Date: Tue, 12 Nov 2019 11:47:06 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: On 12/11/19 9:12 am, Corey Bryant wrote: > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter > wrote: > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > Hello TC members, > > > > Python 3.8 is available in Ubuntu Bionic now and while I > understand it's > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > least enable non-voting py38 unit tests. This email is seeking > approval > > and direction from the TC to move forward with enabling > non-voting py38 > > tests. > > I was a bit fuzzy on this myself, so I looked it up and this is what > the > TC decided when we passed the resolution: > > > If the new Zuul template contains test jobs that were not in the > previous one, the goal champion(s) may choose to update the previous > template to add a non-voting check job (or jobs) to match the gating > jobs in the new template. This means that all repositories that have > not yet converted to the template for the upcoming release will see > a non-voting preview of the new job(s) that will be added once they > update. If this option is chosen, the non-voting job should be > limited to the master branch so that it does not run on the > preceding release’s stable branch. > > > Thanks for digging that up and explaining. I recall that wording and it > makes a lot more sense now that we have a scenario in front of us. 
> > (from > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > ) > > So to follow that process we would need to define the python versions > for V, then appoint a goal champion, and after that it would be at the > champion's discretion to add a non-voting job on master in Ussuri. I > happened to be sitting next to Sean when I saw this thread, and after > discussing it with him I think he would OK with having a non-voting job > on every commit, since it's what we have documented. Previous > discussions established that the overhead of adding one Python unit > test > job to every project was pretty inconsequential (we'll offset it by > dropping 2.7 jobs anyway). > > I submitted a draft governance patch defining the Python versions for V > (https://review.opendev.org/693743). Unfortunately we can't merge it > yet > because we don't have a release name for V (Sean is working on that: > https://review.opendev.org/693266). It's gazing in the crystal ball a > > > Thanks very much for getting that going. > > little bit, but even if for some reason Ubuntu 20.04 is not released > before the V cycle starts, it's inevitable that we will be selecting > Python 3.8 because it meets the first criterion ("The latest released > version of Python 3 that is available in any distribution we can > feasibly use for testing") - 3.8 is released and it's available in > Ubuntu 18.04, which is the distro we use for testing anyway. > > So, in my opinion, if you're volunteering to be the goal champion then > there's no need for any further approval by the TC ;) > > > Sure, I can champion that. Thanks! > Just to be clear, would that be Ussuri and V > python3-updates champion, similar to the following? > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html Yes, for V it will be similar to that but s/train/v.../ only simpler because you already did the hard bits :) The goal champion for that is the one who gets to decide on adding the non-voting py38 job in Ussuri. For U the proposed goal is https://review.opendev.org/691178 - so it will both update the Zuul template from train->ussuri and drop the py27 job (the former is a prerequisite for the latter because of reasons - see https://review.opendev.org/688997). That one is a little more complicated because we also should drop Python 2 functional tests before we drop the py27 unit tests, and because things have to happen in a certain order (services before libraries). OTOH we're only dropping stuff in this release and not adding new voting jobs that could break. Currently gmann has listed himself as the champion for that, but I know he's looking for help (we can have multiple champions for a goal). Somebody somewhere already has an action item to ask you about it :) > Granted it's easier now that we mostly just have to switch the job > template to the new release. > > I guess to make that official we should commit the python3 update Goal > for the V cycle now... or at least as soon as we have a release name. > > > How far off do you think we are from having a V name? If just a few > weeks then I'm fine waiting but if over a month I'm more concerned. Sean's patch has the naming poll closing on 2019-12-16, and we have to wait for legal approval from the OSF after that. (Ideally we'd have started sooner, but we were entertaining proposals to change the process and there was kind of an assumption that we wouldn't be using the existing one again.) My take is that we shouldn't get too bureaucratic here. 
The criteria are well-defined so the outcome is not in doubt. There's no reason to delay until the patch is formally merged. We operate by lazy consensus, so if any TC members object they can reply to this thread. I'll flag it in IRC so people know about it. If there's no objections in the next week or say then the openstack-zuul-jobs team would be entitled to take that as approval. cheers, Zane. > This is happening a little earlier than I think we anticipated but, > given that there's no question what is going to happen in V, I don't > think we'd be doing anybody any favours by delaying the process > unnecessarily. > > > I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be > in the picture for Ussuri or V. > > > > For some further background: The next release of Ubuntu, Focal > (20.04) > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > default in the Focal release, so I'm hopeful that non-voting unit > tests > > will help close some of the gap. > > > > I have a review here for the zuul project template enablement for > ussuri: > > https://review.opendev.org/#/c/693401 > > > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > No, I don't think this changes anything for Ussuri. It's preparation > for V. > > > Ok. Appreciate all the input and help. > > Thanks, > Corey From dpeacock at redhat.com Tue Nov 12 17:21:03 2019 From: dpeacock at redhat.com (David Peacock) Date: Tue, 12 Nov 2019 12:21:03 -0500 Subject: [heat] Addressing the large patch backlog. Message-ID: Hi all, For those interested in the workings of the Heat project, I'd like to kick off a call to action. At the time of writing there are approximately 200 open patches against the core Heat project repo alone, not counting the other Heat repos. Recently I started going through and triaging the patches I'd consider "historical" with an arbitrary cut off for this definition of August 1st of this year. There are 148 patches which meet this definition, dating all the way back to 2015. I have gone through them all and placed them into a spreadsheet [0] which I'd invite you all to check. Provided is a link to the patch in question, initial upload date, last meaningful update date, primary author, and a high level summary of the patch. Additionally I've broken the patches down into three recommended states based on a high level first pass. *Abandon* 34 patches are candidates to be abandoned; they usually are of debatable utility, have significant outstanding concerns, or have no followup from the original developer in a very long time. In many cases, all of these conditions. *Without good reason or explanation from the original developer, these patches may ultimately be cleared out.* *Rebase + Merge* 38 patches are with a high level look in reasonably good shape, perform a stated goal, and may be trivial to core review and ultimately rebase and merge. *If you're the original developer or otherwise interested in these patches and wish to see them through the merge process, please rebase the patch.* *Research* 76 patches are sufficiently complex that they'll need a much closer look. Some of these patches are in a seemingly "finished" state, some are a way off. Some have unanswered concerns from core review and have been left dangling. 
*If you're the original developer or otherwise interested in working these patches through to completion, please do get involved.* When I started this little mission I wasn't quite sure what to expect. What I have found is that as much as there was anticipated cruft to clear out, there is a great deal of very good work lurking here, waiting to see the light of day, and it would be so good to see this work realised. :-) If you have anything to say, feel free to write back on list, and if you'd like to coordinate with me any efforts with these patches I can be found by email or on Freenode in the #heat channel; I'm dpeacock. Based on feedback of this idea, and indeed on each individual patch, I hope we can get this backlog under control, and harvest some of this excellent code! Thank you, David Peacock [0] https://ethercalc.openstack.org/b3qtqyhkg9g1 Please be mindful of accidental edits. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moreira.belmiro.email.lists at gmail.com Tue Nov 12 17:35:49 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 12 Nov 2019 18:35:49 +0100 Subject: [nova][ironic][ptg] Resource tracker scaling issues In-Reply-To: References: <1573404293.31166.9@est.tech> <1353234f-53a0-eb17-205b-5fe13b05cec4@gmail.com> <7f98ad40-bcf0-bb85-da58-d6a7a1f71705@gmail.com> Message-ID: Great! Thanks Jim. I will later report our experience with conductor groups. Belmiro CERN On Tue, Nov 12, 2019 at 5:58 PM Jim Rollenhagen wrote: > > > On Tue, Nov 12, 2019 at 11:38 AM Belmiro Moreira < > moreira.belmiro.email.lists at gmail.com> wrote: > >> Dan Smith just point me the conductor groups that were added in Stein. >> >> https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html >> This is an interesting way to partition the deployment much better than >> the multiple nova-computes setup. >> > > Just a note, they aren't mutually exclusive. You can run multiple > nova-computes to manage a single conductor group, whether for HA or because > you're using groups for some other construct (cells, racks, halls, network > zones, etc) which you want to shard further. > > // jim > > >> Thanks, >> Belmiro >> CERN >> >> On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira < >> moreira.belmiro.email.lists at gmail.com> wrote: >> >>> Hi, >>> using several cells for the Ironic deployment would be great however it >>> doesn't work with the current architecture. >>> The nova ironic driver gets all the nodes available in Ironic. This >>> means that if we have several cells all of them will report the same nodes! >>> The other possibility is to have a dedicated Ironic instance per cell, >>> but in this case it will be very hard to manage a large deployment. >>> >>> What we are trying is to shard the ironic nodes between several >>> nova-computes. >>> nova/ironic deployment supports several nova-computes and it will be >>> great if the RT nodes cycle is sharded between them. >>> >>> But anyway, this will also require speeding up the big lock. >>> It would be great if a compute node can handle more than 500 nodes. >>> Considering our use case: 15k/500 = 30 compute nodes. >>> >>> Belmiro >>> CERN >>> >>> >>> >>> On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann >>> wrote: >>> >>>> On 11/11/2019 7:03 AM, Chris Dent wrote: >>>> > Or using >>>> > separate processes? 
For the ironic and vsphere contexts, increased >>>> > CPU usage by the nova-compute process does not impact on the >>>> > workload resources, so parallization is likely a good option. >>>> >>>> I don't know how much it would help - someone would have to actually >>>> test it out and get metrics - but one easy win might just be using a >>>> thread or process executor pool here [1] so that N compute nodes could >>>> be processed through the update_available_resource periodic task >>>> concurrently, maybe $ncpu or some factor thereof. By default make it >>>> serialized for backward compatibility and non-ironic deployments. >>>> Making >>>> that too highly concurrent could have negative impacts on other things >>>> running on that host, like the neutron agent, or potentially storming >>>> conductor/rabbit with a ton of DB requests from that compute. >>>> >>>> That doesn't help with the scenario that the big >>>> COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while >>>> spawning, moving, or deleting an instance that also needs access to the >>>> big lock to update the resource tracker, but baby steps if any steps in >>>> this area of the code would be my recommendation. >>>> >>>> [1] >>>> >>>> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629 >>>> >>>> -- >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Tue Nov 12 17:46:30 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 12 Nov 2019 12:46:30 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: On Tue, Nov 12, 2019 at 11:47 AM Zane Bitter wrote: > On 12/11/19 9:12 am, Corey Bryant wrote: > > > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter > > wrote: > > > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > > Hello TC members, > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I > > understand it's > > > too late to enable voting py38 unit tests for ussuri, I'd like to > at > > > least enable non-voting py38 unit tests. This email is seeking > > approval > > > and direction from the TC to move forward with enabling > > non-voting py38 > > > tests. > > > > I was a bit fuzzy on this myself, so I looked it up and this is what > > the > > TC decided when we passed the resolution: > > > > > If the new Zuul template contains test jobs that were not in the > > previous one, the goal champion(s) may choose to update the previous > > template to add a non-voting check job (or jobs) to match the gating > > jobs in the new template. This means that all repositories that have > > not yet converted to the template for the upcoming release will see > > a non-voting preview of the new job(s) that will be added once they > > update. If this option is chosen, the non-voting job should be > > limited to the master branch so that it does not run on the > > preceding release’s stable branch. > > > > > > Thanks for digging that up and explaining. I recall that wording and it > > makes a lot more sense now that we have a scenario in front of us. > > > > (from > > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > > > ) > > > > So to follow that process we would need to define the python versions > > for V, then appoint a goal champion, and after that it would be at > the > > champion's discretion to add a non-voting job on master in Ussuri. 
I > > happened to be sitting next to Sean when I saw this thread, and after > > discussing it with him I think he would OK with having a non-voting > job > > on every commit, since it's what we have documented. Previous > > discussions established that the overhead of adding one Python unit > > test > > job to every project was pretty inconsequential (we'll offset it by > > dropping 2.7 jobs anyway). > > > > I submitted a draft governance patch defining the Python versions > for V > > (https://review.opendev.org/693743). Unfortunately we can't merge it > > yet > > because we don't have a release name for V (Sean is working on that: > > https://review.opendev.org/693266). It's gazing in the crystal ball > a > > > > > > Thanks very much for getting that going. > > > > little bit, but even if for some reason Ubuntu 20.04 is not released > > before the V cycle starts, it's inevitable that we will be selecting > > Python 3.8 because it meets the first criterion ("The latest released > > version of Python 3 that is available in any distribution we can > > feasibly use for testing") - 3.8 is released and it's available in > > Ubuntu 18.04, which is the distro we use for testing anyway. > > > > So, in my opinion, if you're volunteering to be the goal champion > then > > there's no need for any further approval by the TC ;) > > > > > > Sure, I can champion that. > > Thanks! > > > Just to be clear, would that be Ussuri and V > > python3-updates champion, similar to the following? > > > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > > Yes, for V it will be similar to that but s/train/v.../ only simpler > because you already did the hard bits :) The goal champion for that is > the one who gets to decide on adding the non-voting py38 job in Ussuri. > > Alright I'll definitely sign up to be champion for V. For U the proposed goal is https://review.opendev.org/691178 - so it > will both update the Zuul template from train->ussuri and drop the py27 > job (the former is a prerequisite for the latter because of reasons - > see https://review.opendev.org/688997). That one is a little more > complicated because we also should drop Python 2 functional tests before > we drop the py27 unit tests, and because things have to happen in a > certain order (services before libraries). OTOH we're only dropping > stuff in this release and not adding new voting jobs that could break. > Currently gmann has listed himself as the champion for that, but I know > he's looking for help (we can have multiple champions for a goal). > Somebody somewhere already has an action item to ask you about it :) > > For Ussuri, I'll get in touch with gmann and see where we can help. > > Granted it's easier now that we mostly just have to switch the job > > template to the new release. > > > > I guess to make that official we should commit the python3 update > Goal > > for the V cycle now... or at least as soon as we have a release name. > > > > > > How far off do you think we are from having a V name? If just a few > > weeks then I'm fine waiting but if over a month I'm more concerned. > > Sean's patch has the naming poll closing on 2019-12-16, and we have to > wait for legal approval from the OSF after that. (Ideally we'd have > started sooner, but we were entertaining proposals to change the process > and there was kind of an assumption that we wouldn't be using the > existing one again.) > > My take is that we shouldn't get too bureaucratic here. 
The criteria are > well-defined so the outcome is not in doubt. There's no reason to delay > until the patch is formally merged. We operate by lazy consensus, so if > any TC members object they can reply to this thread. I'll flag it in IRC > so people know about it. If there's no objections in the next week or > say then the openstack-zuul-jobs team would be entitled to take that as > approval. > That works for me. I'll check back in a week if nothing else comes up. Thanks, Corey > cheers, > Zane. > > > This is happening a little earlier than I think we anticipated but, > > given that there's no question what is going to happen in V, I don't > > think we'd be doing anybody any favours by delaying the process > > unnecessarily. > > > > > > I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be > > in the picture for Ussuri or V. > > > > > > > For some further background: The next release of Ubuntu, Focal > > (20.04) > > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > > default in the Focal release, so I'm hopeful that non-voting unit > > tests > > > will help close some of the gap. > > > > > > I have a review here for the zuul project template enablement for > > ussuri: > > > https://review.opendev.org/#/c/693401 > > > > > > Also should this be updated considering py38 would be non-voting? > > > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > > > No, I don't think this changes anything for Ussuri. It's preparation > > for V. > > > > > > Ok. Appreciate all the input and help. > > > > Thanks, > > Corey > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shambhu.mcp at hotmail.com Mon Nov 11 05:37:27 2019 From: shambhu.mcp at hotmail.com (SHAMBHU KUMAR) Date: Mon, 11 Nov 2019 05:37:27 +0000 Subject: deployment failed Message-ID: Dear Sir/mam i'm facing the issue while deployment of triple o no valid host found error code 500 Can you please eloborate in this matter because i'm stuck heere Your support will be highly appriciated.. From zhaowx at jxresearch.com Tue Nov 12 04:00:06 2019 From: zhaowx at jxresearch.com (zhaowx at jxresearch.com) Date: Tue, 12 Nov 2019 12:00:06 +0800 Subject: How to set password Message-ID: <201911121200054491741@jxresearch.com>+E418130A170D0D4F hello: When I use `trovestack build-image`, how to set password for the image? thanks zhaowx at jxresearch.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Tue Nov 12 17:13:39 2019 From: whayutin at redhat.com (Wesley Hayutin) Date: Tue, 12 Nov 2019 10:13:39 -0700 Subject: [tripleo][ptg] Message-ID: Greetings, Thanks for all those Red Hatters who attended the OpenStack Summit and PTG in Shanghai! A special thanks to those who presented their topics and discussed work items with the folks in attendance. As the current PTL for TripleO I will do my best here to summarize those conversations and items others should be made aware of. Over the course of Thursday and Friday roughly 7-10 folks discussed the identified topics [1], at the following times with my raw notes attached [2]. My apologies if I did not accurately represent your topic here, please feel free to correct me. Thursday, Giulio Fidente: Edge ( SDS storage ) Guilio walked us through some background work with regards to support for storage in remote sites / edge deployments. Working through support for Cinder was straight forward enough with no real collaboration required. 
Support for ceph copy-on-write for nova guests was also added, with the glance image added to remote sites. Where Giulio needed input was with regards to having to change the ctrl plane config for glance for each remote site [3]. This ctrl plane update would force operators to put the cloud in maintenance mode for a stack update. It was determined this could not be avoided at this time. It was noted that the TripleO simplification project will rework puppet apply; please help us achieve that by reviewing the following two topics [4][5]. Thanks Giulio!

Thursday, Giulio Fidente: Virtual IP / Storage
Giulio walked us through some challenges with hosting a shared file system on remote/edge sites using manila. The idea was to use Ganesha translation with CephFS. The proposal was that Ganesha and pacemaker would be managed in the ctrl plane, but there was a question with regards to the virtual IP on edge sites. This was an interesting conversation that ended up with a suggestion from Kevin Carter to use a host-only local route on the edge to properly route the IP. This seemed to everyone to be a very clever solution to the problem :) Thanks Giulio, thanks Kevin!

Thursday, Martin Schuppert: Nova CellV2 multicell
Martin walked the group through the current status and plans for the multicell implementation. Background: Nova multicells are used to help scale a cloud and partition it in such a way as to get the messaging queue closer to the compute cell; essentially rabbit, galera, the collector, the vnc proxy and a number of compute nodes. This architecture is already in use, but with only one default cell; Pike was the switch to cellv2. The work started in Stein and continued through Train using a similar approach to DCN. Some of the specifics are that there is one cell per stack, initially created from an export of the central stack, and more Ansible is in place for the deployment as well. Two different architectures were noted: all cells in one heat stack [6], and one that splits the cell controllers and computes into different heat stacks w/ multiple stacks on the edge sites [7]. The development work for updates is complete and upgrades are still a WIP. Plans for the future included integrating TLS everywhere and enabling storage in the cell ( cinder, ceph, glance). Tony Breeds pointed out this architecture should just work in multiarch, but he would like the team's help and advice in designing and creating a test environment. Please review the following patches [13]. Thanks Martin!!

We tried to get more folks to switch their topics to Thursday but were not able to. On to Friday.

Friday, Edge ( DCN ) roadmap: David Paterson
This conversation was informally walked through on Thursday, mainly with Arkady and Giulio, and was followed up on Friday with a joint edge session regarding booting edge nodes. Several questions were raised on Thursday regarding the networking and connectivity for edge sites as it relates to provisioning. Validations were discussed as a way to address the minimum requirements for booting edge nodes. David did not end up presenting here, but was available at the joint session. See the "edge booting" section later in the document for details.

Friday, Backup and Restore: Carlos Camacho
The project started in Newton. Initially the backup consisted of a database dump and files being backed up for a defined set of use cases. In the field it was discovered that customers had many different kinds of deployments and the feature did not work well for all the intended use cases.
An improved plan is to move to full disk image backups utilizing ReaR [8]. Carlos also noted that customers are now trying to use ( or misuse ) this feature to perform baremetal to virt migrations. One of the issues with the current solution is that it's not clear how services behave after backup and restore, e.g. Ceph OSDs / mons. Wes Hayutin noted that we have an opportunity to test the full image backup and restore solution by moving to a more image-based internal CI system currently being designed by Jesse Pretorious and others. Thanks Carlos!!

Friday, Failure Domains: Kevin Carter
Unfortunately Kevin was in high demand across PTG events and was unable to present this topic. This should be discussed in a mid-cycle ( virtual or in person ) and written up as a blueprint. Essentially, Kevin is proposing to allow, in large deployments, some number or percentage of nodes to fail while not failing the entire deployment. If a few non-critical nodes fail in a large-scale deployment, TripleO should be better able to handle that, report back and move on. It was pointed out to me there is a related customer bug as well. Thanks Kevin!!

Friday, Cross project: Edge Booting: Julia Kreger
You can find notes on this session here [9]. I will only summarize the questions raised in the earlier edge ( DCN ) topic. With regards to when TripleO needs to support Redfish, there were no immediate or extremely urgent requests ( please correct me if I do not have the correct information there ). Redfish IMHO did seem to be a nice improvement when compared to IPMI. This was my first introduction to Redfish, and I was of course curious what steps we would have to take in order to CI it. Luckily, after doc diving, I found several helpful links that include steps for setting Redfish up with our own OVB tooling ( hooray \0/ ). Links can be found here [10], and it seems like others have done some hard work to make that possible, so thank you!! Thank you Julia!!

Friday, Further TripleO Ansible Integration: Wes Hayutin
The idea here would be to allow the TripleO project to govern how TripleO is deployed with Ansible as an operator. The TripleO project would ship Ansible modules and roles that directly import python-tripleoclient to support ansible-to-CLI parity [12]. A new repo, perhaps called tripleo-operator-ansible, would be used to host these modules and roles and would include the same requirements and features as tripleo-ansible: linting, molecule tests, and auto documentation. This could tie in well with an initiative from Dan Macpherson to ship Ansible playbooks as part of our OSP documentation. Julia Kreger noted that we should not ignore the Ansible OpenStackSDK for part of the deployment process, which is a very valid point. Most everyone at the PTG agreed this was a good direction moving forward and would help consolidate the public and internal tooling around TripleO's CLI in Ansible. Thanks Dan, Julia!!

Friday, TLS-E standalone gate: Ade Lee
Ade Lee walked us through a proposal to help test and CI TLS upstream, which has been very difficult to date ( I can personally vouch for this ). The setup uses two nodes upstream, with one node as the IPA server and the other a TripleO standalone deployment.
The keystone team is setting the right example for other projects and teams that are finding it difficult to keep outside patches from breaking their code, and that is to find a way to get something voting and gating upstream even if it’s not installed and deployed in the exact same ways customers may use it. Please help by reviewing the keystone / security teams’ patches here [14]. Thanks Ade!!

Friday, Octavia tempest plugin support in TripleO-CI: Chandan Kumar
Chandan was off fighting battles with the infra team and other projects. Here are some of his notes:
* Have an RDO third-party standalone job with the full octavia tempest run triggered against octavia patches from Stein onwards (FS062).
* Look into a multinode job as third party for the Queens and Rocky releases (FS038).
* Add support for the octavia tempest plugin in os_tempest.
We certainly should have a conversation offline regarding these topics. I’ll note the TripleO-CI community meeting or the #tripleo meeting on Tuesdays are a good way to continue collaborating here. Thanks Chandan!!

Friday, Replace clouds.yaml with an openrc ansible module: Chandan Kumar
Open Question: is this module [15] from the openstack-ansible project something we can reuse in TripleO via tripleo-ansible?

Friday, Zuul jobs and ansible roles to handle rpm packaging: Chandan Kumar
The background and context can be found here:
https://pagure.io/zuul-distro-jobs - a collection of ansible roles to deal with rpm packaging
https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/build-test-packages -> creating rpms from the different projects listed as depends-on in the commit message
Proposal: move this to zuul-jobs, making the rpm packaging more generic for centos/fedora/rhel:
* Move mock and rpmbuild related roles to the zuul-jobs repo
* Add a mention of third party zuul jobs to the main zuul-jobs doc
* Build-set_registry: set up the http server and start the job
* Details are here
Thanks Chandan!! This is indeed a very interesting and powerful proposal. We should definitely continue this conversation with the broader community.

Did you make it all the way down here? Well done!
I should add an easter egg :) Links: [1] https://etherpad.openstack.org/p/tripleo-ussuri-topics [2] https://etherpad.openstack.org/p/tripleo-ptg-ussuri [3] https://blueprints.launchpad.net/tripleo/+spec/split-controlplane-glance-cache [4 ] https://review.opendev.org/#/q/topic:disable/paunch+(status:open+OR+status:merged) [5] https://review.opendev.org/#/q/topic:deconstruct/container-puppet+(status:merged) [6] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_basic.html [7] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_advanced.html https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_routed.html [8] https://access.redhat.com/solutions/2115051 [9] https://etherpad.openstack.org/p/PVG-ECG-PTG [10] https://github.com/openstack/sushy https://docs.openstack.org/sushy/latest/contributor/index.html#contributing https://docs.openstack.org/sushy-tools/latest/ https://docs.openstack.org/sushy-tools/latest/user/dynamic-emulator.html#systems-resource-driver-openstack [12] https://hackmd.io/caRlGha7SueZxDRcyq9eGA?both [13] https://review.opendev.org/#/q/topic:cellv2+(status:open+OR+status:merged) [14] https://review.opendev.org/#/q/status:open+project:openstack/tripleo-heat-templates+branch:master+topic:add_standalone_tls [15] https://opendev.org/openstack/openstack-ansible-openstack_openrc [16] https://etherpad.openstack.org/p/PVG-keystone-forum-policy [17 https://datko.pl/zuul.pdf [18] https://github.com/openstack/tripleo-heat-templates/blob/master/README.rst#service-testing-matrix [19] https://github.com/openstack/openstack-virtual-baremetal Thanks all!! Wes Hayutin TripleO-PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From Albert.Braden at synopsys.com Tue Nov 12 19:42:31 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 12 Nov 2019 19:42:31 +0000 Subject: Scheduler sends VM to HV that lacks resources Message-ID: If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested 16 VCPU." https://paste.fedoraproject.org/paste/6N3wcDzlbNQgj6hRApHiDQ I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, and then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable to establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free vcpu 14.00 VCPU < requested 16 VCPU." https://paste.fedoraproject.org/paste/lGlVpfbB9C19mMzrWQcHCQ I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: enabled_filters = RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots of compute hosts that are not full: https://paste.fedoraproject.org/paste/6SX9pQ4V1KnWfQkVnfoHOw This is the command line I used: openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stig.openstack at telfer.org Tue Nov 12 20:05:25 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Tue, 12 Nov 2019 20:05:25 +0000 Subject: [scientific-sig] IRC meeting today - Shanghai roundup and Supercomputing 2019 Message-ID: <94DEBF9B-F1D4-40CC-91FE-8A6207CAC142@telfer.org> Greetings all - We have a Scientific SIG meeting in about an hour’s time (2100 UTC) in channel #openstack-meeting. Everyone is welcome. Agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_November_12th_2019 We are going to cover a trip report from Shanghai and planning for the many activities coming up next week in Supercomputing. Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Nov 12 20:17:33 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 12 Nov 2019 15:17:33 -0500 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint Message-ID: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> I'd like to propose adding Abhishek Kekane to the Glance stable maintenance team. He's been a glance core for a few years now, and we are currently understaffed in glance-stable-maint. Plus, he's the current Glance PTL. cheers, brian From smooney at redhat.com Tue Nov 12 20:21:45 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 20:21:45 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested 16 > VCPU." > > https://paste.fedoraproject.org/paste/6N3wcDzlbNQgj6hRApHiDQ > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, and > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable to > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > vcpu 14.00 VCPU < requested 16 VCPU." > > https://paste.fedoraproject.org/paste/lGlVpfbB9C19mMzrWQcHCQ > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > enabled_filters = > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter, > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots of > compute hosts that are not full: > > https://paste.fedoraproject.org/paste/6SX9pQ4V1KnWfQkVnfoHOw > > This is the command line I used: > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB what version of openstack are you running? if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. 
if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your configuration or in nova. From rosmaita.fossdev at gmail.com Tue Nov 12 20:25:10 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 12 Nov 2019 15:25:10 -0500 Subject: [stable][glance] Proposal to remove Flavio Percoco from glance-stable-maint Message-ID: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com> I just noticed that Flavio is still a member of glance-stable-maint. Nothing against him personally -- he's an excellent dude -- but he hasn't been working on Glance (or OpenStack) for quite a while now and is no longer a member of glance-core, so he probably shouldn't be on the stable-maint team. (Not that he'd do anything bad, it just makes the glance-stable-maint team look larger than it actually is.) On the off chance he'll see this message, I'd like to thank Flavio for all his hard work in the past keeping Glance stable! cheers, brian From Albert.Braden at synopsys.com Tue Nov 12 20:30:00 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 12 Nov 2019 20:30:00 +0000 Subject: Filter costs / filter order Message-ID: I'm running Rocky and trying to figure out filter order. I'm reading this doc: https://docs.openstack.org/nova/rocky/user/filter-scheduler.html It says: Each filter selects hosts in a different way and has different costs. The order of filter_scheduler.enabled_filters affects scheduling performance. The general suggestion is to filter out invalid hosts as soon as possible to avoid unnecessary costs. We can sort filter_scheduler.enabled_filters items by their costs in reverse order. For example, ComputeFilter is better before any resource calculating filters like RamFilter, CoreFilter. Is there a document that specifies filter costs, or ranks filters by cost? Is there a well-known process for determining the optimal filter order? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Nov 12 20:35:18 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 12 Nov 2019 15:35:18 -0500 Subject: [stable][cinder] Proposal to remove John Griffith from cinder-stable-maint Message-ID: <17610e99-5629-31c8-84ac-430ad06b2b62@gmail.com> John Griffith has taken on other commitments and stepped down as a cinder-core recently, so it doesn't make sense for him to continue on the cinder-stable-maint list. I'd like to acknowledge his role as "The Father of Cinder", though, and express my thanks on behalf of the Cinder team for all his past work on the project. cheers, brian From Tim.Bell at cern.ch Tue Nov 12 20:38:57 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Tue, 12 Nov 2019 20:38:57 +0000 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> Message-ID: <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. Tim > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > Hi Neutron team, > > First if all thank to all of You for great and very productive week during the > PTG in Shanghai. > Below is summary of our discussions from whole 3 days. 
> If I forgot about something, please respond to the email and update missing > informations. But if You want to have follow up discussion about one of the > topics from this summary, please start a new thread to keep this one only as > high level summary of the PTG. > > ... > Starting the process of removing ML2/Linuxbridge > ================================================ > > Currently in Neutron tree we have 4 drivers: > * Linuxbridge, > * Openvswitch, > * macvtap, > * sriov. > SR-IOV driver is out of discussion here as this driver is > addressing slightly different use case than other out drivers. > > We started discussion about above topic because we don't want to end up with too > many drivers in-tree and we also had some discussions (and we have spec for that > already) about include networking-ovn as in-tree driver. > So with networking-ovn in-tree we would have already 4 drivers which can be used > on any hardware: linuxbridge, ovs, macvtap and ovn. > Conclusions from the discussion are: > * each driver requires proper testing in the gate, so we need to add many new > jobs to our check/gate queue, > * currently linuxbridge driver don't have a lot of development and feature > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > (e.g. dvr, trunk ports), > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > this one could be also considered as candidate to deprecation, > * we need to have process of deprecating some drivers and time horizon for such > actions should be at least 2 cycles. > * we will not remove any driver completly but rather we will move it to be in > stadium process first so it still can be maintained by people who are > interested in it. > > Actions to do after this discussion: > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > Linuxbridge users currently) to ask about their feedback about this, > * if there are any other companies using LB driver, Nate Johnston is willing to > help conctating them, please reach to him in such case. > * we may ratify marking linuxbridge as deprecated in the team meeting during > Ussuri cycle if nothing surprising pops in. > From Albert.Braden at synopsys.com Tue Nov 12 20:47:38 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 12 Nov 2019 20:47:38 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: We are running placement under apache: https://paste.fedoraproject.org/paste/mZviLVe5xONPsXfLqdxI6A The placement error logs show a lot of GETs but no errors: https://paste.fedoraproject.org/paste/xDVGaXEdoQ5Z3wHv17Lezg We are planning to use NUMA but haven't started yet. It's probably a config error. Where should I be looking? This is our nova config on the controllers: https://paste.fedoraproject.org/paste/kNe1eRimk4ifrAuuN790bg -----Original Message----- From: Sean Mooney Sent: Tuesday, November 12, 2019 12:22 PM To: Albert Braden ; openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested 16 > VCPU." 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6N3wcDzlbNQgj6hRApHiDQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=buklMe5R5iK--nSTPE8_2kdSLjTRHLCbk0XatjhiCnY&e= > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, and > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable to > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > vcpu 14.00 VCPU < requested 16 VCPU." > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_lGlVpfbB9C19mMzrWQcHCQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=PxLwkpEiTHvHxuPTPo0Pt5IHhe79vfnQqLgLLb7JQ8Y&e= > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > enabled_filters = > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter, > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots of > compute hosts that are not full: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6SX9pQ4V1KnWfQkVnfoHOw&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=Yl9s2ZJ47GPXSyPzh6Hf0gyoxbqKGD9J9I2eSE0V8TA&e= > > This is the command line I used: > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB what version of openstack are you running? if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your configuration or in nova. From aschultz at redhat.com Tue Nov 12 20:53:18 2019 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 12 Nov 2019 13:53:18 -0700 Subject: [tripleo] Re: deployment failed In-Reply-To: References: Message-ID: On Tue, Nov 12, 2019 at 11:43 AM SHAMBHU KUMAR wrote: > Dear Sir/mam > > i'm facing the issue while deployment of triple o > > > no valid host found error code 500 > > > Can you please eloborate in this matter because i'm stuck heere > > > Your support will be highly appriciated.. > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/troubleshooting/index.html This error traditionally means that the hardware was unable to be provisioned. Check the nova/ironic logs for additional information. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mriedemos at gmail.com Tue Nov 12 21:13:56 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 12 Nov 2019 15:13:56 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> On 11/12/2019 2:47 PM, Albert Braden wrote: > It's probably a config error. Where should I be looking? This is our nova config on the controllers: > > https://paste.fedoraproject.org/paste/kNe1eRimk4ifrAuuN790bg If your deployment is pike or newer (I'm guessing rocky because your other email says rocky), then you don't need these filters: RetryFilter - alternate hosts bp in queens release makes this moot CoreFilter - placement filters on VCPU RamFilter - placement filters on MEMORY_MB -- Thanks, Matt From mriedemos at gmail.com Tue Nov 12 21:15:13 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 12 Nov 2019 15:15:13 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> Message-ID: On 11/12/2019 3:13 PM, Matt Riedemann wrote: > If your deployment is pike or newer (I'm guessing rocky because your > other email says rocky), then you don't need these filters: > > RetryFilter - alternate hosts bp in queens release makes this moot > CoreFilter - placement filters on VCPU > RamFilter - placement filters on MEMORY_MB Sorry, I should have said: If your deployment is pike or newer then you don't need the CoreFilter or RamFilter. If your deployment is queens or newer then you don't need the RetryFilter. -- Thanks, Matt From smooney at redhat.com Tue Nov 12 21:22:49 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 21:22:49 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: am what version of openstack have you deployed. i did not see that in your email. is it ocata or newer http://lists.openstack.org/pipermail/openstack-dev/2018-January/126283.html i see you have the CoreFilter and RamFilter filters enabled. form octa on they shoudl be disabled as we claim those in placement but it should not break anything on older releases. we have removed them in train after we removed the caching scheduler. On Tue, 2019-11-12 at 20:47 +0000, Albert Braden wrote: > We are running placement under apache: > > https://paste.fedoraproject.org/paste/mZviLVe5xONPsXfLqdxI6A > > The placement error logs show a lot of GETs but no errors: > > https://paste.fedoraproject.org/paste/xDVGaXEdoQ5Z3wHv17Lezg > > We are planning to use NUMA but haven't started yet. It's probably a config error. Where should I be looking? This is > our nova config on the controllers: > > https://paste.fedoraproject.org/paste/kNe1eRimk4ifrAuuN790bg > > -----Original Message----- > From: Sean Mooney > Sent: Tuesday, November 12, 2019 12:22 PM > To: Albert Braden ; openstack-discuss at lists.openstack.org > Subject: Re: Scheduler sends VM to HV that lacks resources > > On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested > > 16 > > VCPU." 
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6N3wcDzlbNQgj6hRApHiDQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=buklMe5R5iK--nSTPE8_2kdSLjTRHLCbk0XatjhiCnY&e= > > > > > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, > > and > > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable > > to > > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > > vcpu 14.00 VCPU < requested 16 VCPU." > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_lGlVpfbB9C19mMzrWQcHCQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=PxLwkpEiTHvHxuPTPo0Pt5IHhe79vfnQqLgLLb7JQ8Y&e= > > > > > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > > > enabled_filters = > > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilte > > r, > > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots > > of > > compute hosts that are not full: > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6SX9pQ4V1KnWfQkVnfoHOw&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=Yl9s2ZJ47GPXSyPzh6Hf0gyoxbqKGD9J9I2eSE0V8TA&e= > > > > > > This is the command line I used: > > > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 > > alberttestB > > what version of openstack are you running? > if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on > the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. > > if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your > enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your > configuration or in nova. > From Albert.Braden at synopsys.com Tue Nov 12 21:25:25 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Tue, 12 Nov 2019 21:25:25 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: Message-ID: We're on Rocky -----Original Message----- From: Sean Mooney Sent: Tuesday, November 12, 2019 1:23 PM To: Albert Braden ; openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources am what version of openstack have you deployed. i did not see that in your email. is it ocata or newer https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipermail_openstack-2Ddev_2018-2DJanuary_126283.html&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=jdZtsoEhWjn-EV3ffMxUc8E5Xum3xXbpR-0gpGp2Y14&e= i see you have the CoreFilter and RamFilter filters enabled. 
form octa on they shoudl be disabled as we claim those in placement but it should not break anything on older releases. we have removed them in train after we removed the caching scheduler. On Tue, 2019-11-12 at 20:47 +0000, Albert Braden wrote: > We are running placement under apache: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_mZviLVe5xONPsXfLqdxI6A&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=-7cWLHHrr0qduVnO6FYrDXp3b3QSIBgC3M3CABtQup8&e= > > The placement error logs show a lot of GETs but no errors: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_xDVGaXEdoQ5Z3wHv17Lezg&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=tQuMny6EiubEruJIJyN1zj2GSUBGBzqD3SW06H8ZIe8&e= > > We are planning to use NUMA but haven't started yet. It's probably a config error. Where should I be looking? This is > our nova config on the controllers: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_kNe1eRimk4ifrAuuN790bg&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=kS78s-S5ZnMEZmKqQgS-OX7Xz_pMjQQrrzv8fC0H5s8&s=r1qv0CcWP5-3CXkQsiNgoe3pxGqKGkqymdjTLsJ9dYI&e= > > -----Original Message----- > From: Sean Mooney > Sent: Tuesday, November 12, 2019 12:22 PM > To: Albert Braden ; openstack-discuss at lists.openstack.org > Subject: Re: Scheduler sends VM to HV that lacks resources > > On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote: > > If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the > > logs I see that the scheduler sent the VM to a host that doesn't have enough CPU "Free vcpu 14.00 VCPU < requested > > 16 > > VCPU." > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6N3wcDzlbNQgj6hRApHiDQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=buklMe5R5iK--nSTPE8_2kdSLjTRHLCbk0XatjhiCnY&e= > > > > > > I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers, > > and > > then created 20 more VMs. Now I see the logs only on controller 3, and some of the failures are now saying "Unable > > to > > establish connection to " but I still see the single scheduler sending VMs to a host that lacks resources "Free > > vcpu 14.00 VCPU < requested 16 VCPU." > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_lGlVpfbB9C19mMzrWQcHCQ&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=PxLwkpEiTHvHxuPTPo0Pt5IHhe79vfnQqLgLLb7JQ8Y&e= > > > > > > I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard: > > > > enabled_filters = > > RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilte > > r, > > ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter > > > > What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? 
We have lots > > of > > compute hosts that are not full: > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_6SX9pQ4V1KnWfQkVnfoHOw&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=1k7MlGWzY6Ctm5Fc1BA3Y107Kj3oETQ0IigI_u8GlX8&s=Yl9s2ZJ47GPXSyPzh6Hf0gyoxbqKGD9J9I2eSE0V8TA&e= > > > > > > This is the command line I used: > > > > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 > > alberttestB > > what version of openstack are you running? > if its not using placement then this behaviour is expected as the resources are not claimed untill the vm is booted on > the node so there is and interval where the scudler is selecting hosts where you can race with other vm boot. > > if you are using placement and you are not using numa or pci pass though, which you do not appear to be based on your > enabled filters, then this should not happen and we should dig deeper as there is likely a bug either in your > configuration or in nova. > From colleen at gazlene.net Tue Nov 12 21:42:18 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 12 Nov 2019 13:42:18 -0800 Subject: [keystone] Shanghai forum/PTG recap Message-ID: <9e8f0a58-b20d-49e2-87a6-5afa7c73423d@www.fastmail.com> While the keystone team didn't itself meet at last week's PTG, I posted a recap of the event from a keystone perspective here: http://www.gazlene.net/shanghai-forum-ptg.html Hope it's a useful summary for those who couldn't attend in-person. Colleen From smooney at redhat.com Tue Nov 12 21:46:28 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 12 Nov 2019 21:46:28 +0000 Subject: Filter costs / filter order In-Reply-To: References: Message-ID: <24b8fe814dd497bb6e39a255fefcea24a44bb518.camel@redhat.com> On Tue, 2019-11-12 at 20:30 +0000, Albert Braden wrote: > I'm running Rocky and trying to figure out filter order. I'm reading this doc: > https://docs.openstack.org/nova/rocky/user/filter-scheduler.html > > It says: > > Each filter selects hosts in a different way and has different costs. The order of filter_scheduler.enabled_filters > affects scheduling performance. The general suggestion is to filter out invalid hosts as soon as possible to avoid > unnecessary costs. We can sort filter_scheduler.enabled_filters items by their costs in reverse order. For example, > ComputeFilter is better before any resource calculating filters like RamFilter, CoreFilter. > > Is there a document that specifies filter costs, or ranks filters by cost? Is there a well-known process for > determining the optimal filter order? im not a aware of a specific document that cover it but this will very based on deployment. as a general guideline you should order your filter by which ones elmiate the most hosts. so the AvailabilityZoneFilter should generally be first. in older release the retry filter shoudl go first. the numa toplogy filter and pci passthough filter are kind fo expensive. so they are better to have near the end. 
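To make that a bit more concrete, here is a rough sketch of how the filter list from your first mail could be reordered on Rocky. This is just an illustration, not a recommendation for every cloud: it assumes placement is already handling VCPU/RAM ( so CoreFilter and RamFilter are dropped ), that RetryFilter is not needed on Queens or newer, and that you are not using host aggregates, NUMA or PCI passthrough yet:

  [filter_scheduler]
  # cheap filters that eliminate the most hosts go first,
  # more expensive capability checks go last
  enabled_filters = AvailabilityZoneFilter,ComputeFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,SameHostFilter,DifferentHostFilter,ImagePropertiesFilter,ComputeCapabilitiesFilter

If/when you start using aggregates or NUMA/PCI passthrough, the Aggregate* filters would go at the front of that list and NUMATopologyFilter / PciPassthroughFilter at the very end.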
So I would start with the Aggregate* filters first, followed by "cheap" filters that don't have any complex boolean logic such as SameHostFilter, DifferentHostFilter, IoOpsFilter and NumInstancesFilter (there are a few others), then the more complex filters like the NUMA topology, PCI passthrough, ComputeCapabilitiesFilter and JsonFilter. Effectively what you want to do is maximise the information gain at each filtering step while minimising the cost (reducing the possible hosts with as few cpu cycles as possible). It's important to only enable the filters that matter to your deployment, but if we had a perfect costing for each filter then you could follow the ID3 algorithm to get an optimal layout. https://en.wikipedia.org/wiki/ID3_algorithm I have wanted to experiment with tracing the boot requests on a large public cloud and model this for some time, but I always end up finding other things to tinker with instead. I think even without that data to work with you could do some interesting things with code complexity metrics as a proxy to try and auto sort them. Perhaps some of the operators can share what they do. I know CERN, pre placement, used to map tenants to cells as their first filtering step, which significantly helped them with scale. If the goal is speed then you need to have each step give you the maximum information gain for the minimum additional cost. That is why the aggregate filters and multi host filters like the affinity filters tend to be better at the start of the list and very detailed filters like the NUMA topology filter tend to be better at the end.
From mriedemos at gmail.com Tue Nov 12 22:09:27 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 12 Nov 2019 16:09:27 -0600 Subject: [ptg][nova][cinder] x-p meeting minutes In-Reply-To: References: Message-ID: On 11/12/2019 8:07 AM, Sylvain Bauza wrote: > We aren't sure that Nova will allow a detach of a boot volume. This was never completed: https://specs.openstack.org/openstack/nova-specs/specs/train/approved/detach-boot-volume.html -- Thanks, Matt
-- Thanks, Matt From openstack at nemebean.com Tue Nov 12 23:03:26 2019 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 12 Nov 2019 17:03:26 -0600 Subject: [qa] required rabbitMQ materials In-Reply-To: References: <7620140da67f4b38bfbf0e88ab212874@inspur.com> Message-ID: On 10/31/19 11:03 AM, Brin Zhang(张百林) wrote: > Hi all > Can anyone provide me with some materials about RabbitMQ? like its implementation mechanisms, scenarios, etc. > Thanks anyway. There's some documentation about rabbitmq in the oslo.messaging docs: https://docs.openstack.org/oslo.messaging/train/admin/rabbit.html I don't know if that's exactly what you're looking for, but hopefully it will get you started and you can ask specific questions as a followup. > > Brin Zhang > From jasonanderson at uchicago.edu Tue Nov 12 23:17:00 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Tue, 12 Nov 2019 23:17:00 +0000 Subject: [blazar] Why/how does Blazar use Keystone trusts? Message-ID: Hi Blazar contributors, We hit an issue today involving trusts in Blazar, where a host couldn't be deleted due to some issue authenticating against the trust associated with the host. We still haven't resolved this issue, but it felt odd to me: why is a trust even involved here? I have often wondered what the reason is for using trusts in Blazar, as I can't think of anything Blazar is doing that could not be done by the Blazar system user (and in fact, many operations are done via this user... via another trust.) There are also issues where a user leaves a project before their leases have ended; in this case Blazar has difficulty cleaning up because it tries to resurrect a trust that is not tied to a valid user/project relationship. Does anybody have context on to how trusts are used in Blazar and if they are still necessary? Does it make sense to remove this functionality? Thank you, -- Jason Anderson Chameleon DevOps Lead Consortium for Advanced Science and Engineering, The University of Chicago Mathematics & Computer Science Division, Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at bakeyournoodle.com Tue Nov 12 23:29:27 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Wed, 13 Nov 2019 10:29:27 +1100 Subject: [nova][ptg] PCI refactoring needs and a strawman proposal inside In-Reply-To: <35cff7ea13a85bde43a4626c84b6bc130eb67110.camel@redhat.com> References: <35cff7ea13a85bde43a4626c84b6bc130eb67110.camel@redhat.com> Message-ID: <20191112232927.GB22972@thor.bakeyournoodle.com> On Tue, Nov 12, 2019 at 01:29:32PM +0000, Sean Mooney wrote: > On Tue, 2019-11-12 at 11:51 +0100, Sylvain Bauza wrote: > i have a list of things i want to enhance related to pci/sriov so i would be interested > in this topic too. it might be worth consiering a SIG on this topic if it will be > cross project. /me too I have a bunch of work to do with device pass-through that may, or may not be on the PCI bus so I'd like to ensure we don't make my use case impossible ;P Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From tony at bakeyournoodle.com Tue Nov 12 23:42:33 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Wed, 13 Nov 2019 10:42:33 +1100 Subject: [nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations? 
In-Reply-To: References: <146f5d1c-4045-0f78-b2c1-8c83965c49cb@gmail.com> Message-ID: <20191112234233.GC22972@thor.bakeyournoodle.com> On Tue, Nov 05, 2019 at 08:51:13AM -0800, Dan Smith wrote: > > The question I'm posing is if people would like to see those options > > backported to stein and if so, would the stable team be OK with it? > > I'd say this falls into a gray area where these are things that are > > optional, not used by default, and are operational tooling so less > > risk to backport, but it's not zero risk. It's also worth noting that > > when I wrote those patches I did so with the intent that people could > > backport them at least internally. > > Backporting features to operator tooling that helps them recover from > bugs or other failures without doing database surgery seems like a good > thing. Hard to argue that the risk outweighs the benefit, IMHO. FWIW, I agree with this. #makeitso Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From soulxu at gmail.com Wed Nov 13 03:42:02 2019 From: soulxu at gmail.com (Alex Xu) Date: Wed, 13 Nov 2019 11:42:02 +0800 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: Sean Mooney 于2019年11月12日周二 下午9:27写道: > On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote: > > Hi Nova experts, > > > > "Not tracking error migrations and orphans in RT." is probably a bug. > This may trigger some problems in > > update_available_resources in RT at the moment. That is some orphans or > error migrations are using cpus/memory/disk > > etc, but we don't take these usage into consideration. And > instance.resources is introduced from Train used to contain > > specific resources, we also track assigned specific resources in RT > based on tracked migrations and instances. So this > > bug will also affect the specific resources tracking. > > > > I draft an doc to clarify this bug and possible solutions: > > https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT > > Looking forward to suggestions from you. Thanks in advance. > > > there are patche up to allow cleaning up orpahn instances > https://review.opendev.org/#/c/627765/ > https://review.opendev.org/#/c/648912/ > if we can get those merged that woudl adress at least some of the proablem > Yes, and we separate the issue to be two parts, one part is tracking, another part is cleanup. Yongli's patch will help on cleanup. > > > Best Regards, > > Luyao > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.denton at rackspace.com Wed Nov 13 04:31:20 2019 From: james.denton at rackspace.com (James Denton) Date: Wed, 13 Nov 2019 04:31:20 +0000 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> Message-ID: <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com> Appreciate the summary as well. For what it's worth, the ML2/LinuxBridge combo has been a very stable setup for us since its inception, and I'd hate to see it deprecated and removed for the sake of removing something. Last I checked, trunk ports were supported with the ML2/LinuxBridge driver. 
And while of course DVR is not a supported feature, a good number of our ML2/LXB environments forgo Neutron routers altogether in favor of putting VMs on the provider network. It has shown to be as performant as vanilla OVS, and a simpler model to implement and support as an operator. Just my two cents. Thanks, James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com On 11/12/19, 3:41 PM, "Tim Bell" wrote: CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. Tim > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > Hi Neutron team, > > First if all thank to all of You for great and very productive week during the > PTG in Shanghai. > Below is summary of our discussions from whole 3 days. > If I forgot about something, please respond to the email and update missing > informations. But if You want to have follow up discussion about one of the > topics from this summary, please start a new thread to keep this one only as > high level summary of the PTG. > > ... > Starting the process of removing ML2/Linuxbridge > ================================================ > > Currently in Neutron tree we have 4 drivers: > * Linuxbridge, > * Openvswitch, > * macvtap, > * sriov. > SR-IOV driver is out of discussion here as this driver is > addressing slightly different use case than other out drivers. > > We started discussion about above topic because we don't want to end up with too > many drivers in-tree and we also had some discussions (and we have spec for that > already) about include networking-ovn as in-tree driver. > So with networking-ovn in-tree we would have already 4 drivers which can be used > on any hardware: linuxbridge, ovs, macvtap and ovn. > Conclusions from the discussion are: > * each driver requires proper testing in the gate, so we need to add many new > jobs to our check/gate queue, > * currently linuxbridge driver don't have a lot of development and feature > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > (e.g. dvr, trunk ports), > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > this one could be also considered as candidate to deprecation, > * we need to have process of deprecating some drivers and time horizon for such > actions should be at least 2 cycles. > * we will not remove any driver completly but rather we will move it to be in > stadium process first so it still can be maintained by people who are > interested in it. > > Actions to do after this discussion: > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > Linuxbridge users currently) to ask about their feedback about this, > * if there are any other companies using LB driver, Nate Johnston is willing to > help conctating them, please reach to him in such case. > * we may ratify marking linuxbridge as deprecated in the team meeting during > Ussuri cycle if nothing surprising pops in. > From akekane at redhat.com Wed Nov 13 05:57:31 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Wed, 13 Nov 2019 11:27:31 +0530 Subject: [ptg][glance] PTG summary Message-ID: Hi All, I attended OpenInfra summit and PTG at Shanghai last week. It was an interesting event with lots of discussion happening around different OpenStack projects. 
I was mostly associated with Glance and cross-projects work related to Glance. There were other topics around Edge, UI and QA. During summit Me and Erno gave a Glance project update where we discussed what we achieved in Train cycle and what we are going to do in Ussuri cycle. As multiple stores feature is stabilized during Train, In Ussuri main focus of glance is on enhancing /v2/import API to import single image into multiple stores and copying existing images to multiple stores to avoid the manual efforts required by operator to copy the image across the stores. New delete API will also be added to delete the image from single store, also cinder driver of glance_store needs refactoring so that it can use multiple backends configured by cinder. Efforts will be continued for cluster awareness of glance API during this cycle as well. Apart from these edge related work, Glance team will also work on removing deprecated registry and related functional tests, removing of sheepdog driver from glance_store, adding s3 driver with multiple stores support in glance_store and some urgent bug fixes. Cross-Project work: In this PTG we had discussion with Nova and Cinder regarding the adoption of multiple store feature of Glance. As per discussion we have finalized the design and Glance team will work together with Nova and Cinder towards adding multiple store support feature in Train cycle. Support for Glance multiple stores in Cinder: As per discussion, volume-type will be used to add which store the image will be uploaded on upload-to-image operation, also cinder will send base image id to glance as a header using which glance will upload the image created from volume to all those stores in which base image is present. Nova snapshots to dedicated store: Agreement is, Nova will send us a base image id to glance as a header using which glance will upload the instance snapshot to all those stores in which base image is present. Talk with QA team: Glance has also talked with QA team for adding new tempest coverage for newly added features in the last couple of cycles, Glance team will work with tempest to add below new tempest tests. 1. New import workflow (image conversion, inject metadata etc.) - Depends on https://review.opendev.org/#/c/545483/ devstack patch 2. Hide old images 3. Multiple stores: https://review.opendev.org/#/c/689104/ in devstack 3.1 Devstack patch + zuul job to setup multiple stores and the job will run on glance and run glance api and scenario tests 4. Delete barbican secrets from glance images 4.1 add the tests the in barbican-tempest-plugin 4.2 run as part of barbican gate using their job 4.3 run that tests with new job (multi stores) on glance gate. do not run barbican job on glance. Below is the Ussuri cycle planning and deadline for Glance. Ussuri milestone planning: Ussuri U1 - December 09-13: 1. Import image in multiple stores (Specs + Implementation) 2. Copy existing image in multiple stores (Specs + Implementation) 3. S3 driver for glance 4. remove sheepdog driver from glance_store 5. Fix subunit parser error 6. Modify existing nova and cinder specs Ussuri U2 - February 10-14 1. Cluster awareness of glance API nodes 2. remove registry code 3. Delete image from single store 4. Nova and Cinder upload snapshot/volume to glance 5. image-import.conf parsing issue with uwsgi Ussuri U3 - April 06-10 1. Multiple cinder store support in glance_store (specs + implementation) 2. Creating image from volume using ceph (slow uploading issue) 3. Image encryption 4. 
Tempest work Glance PTG planning etherpad: https://etherpad.openstack.org/p/Glance-Ussuri-PTG-planning Let me know if you guys need more details on this. Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Nov 13 08:25:31 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 13 Nov 2019 09:25:31 +0100 Subject: [blazar] Why/how does Blazar use Keystone trusts? In-Reply-To: References: Message-ID: Let me tell you a story. Long time ago, in a far far away galaxy, some people wanting to have a way to reserve some compute nodes in OpenStack created a new project that was named "Climate". Those folks weren't really knowing Keystone but they saw some problem : when the reservation was beginning, the token was expired. For that specific reason, they tried to see how to fix it and then saw Keystone trusts. They then said "heh, nice" and they started to use it. After 5 years, nobody really thought whether trusts should still be needed. Maybe the new Blazar team should look at service tokens, rather. Anyway, just my 2cts. -Sylvain On Wed, Nov 13, 2019 at 12:26 AM Jason Anderson wrote: > Hi Blazar contributors, > > We hit an issue today involving trusts in Blazar, where a host couldn't be > deleted due to some issue authenticating against the trust associated with > the host. We still haven't resolved this issue, but it felt odd to me: why > is a trust even involved here? > > I have often wondered what the reason is for using trusts in Blazar, as I > can't think of anything Blazar is doing that could not be done by the > Blazar system user (and in fact, many operations are done via this user... > via another trust.) There are also issues where a user leaves a project > before their leases have ended; in this case Blazar has difficulty cleaning > up because it tries to resurrect a trust that is not tied to a valid > user/project relationship. > > Does anybody have context on to how trusts are used in Blazar and if they > are still necessary? Does it make sense to remove this functionality? > > Thank you, > > -- > Jason Anderson > > Chameleon DevOps Lead > *Consortium for Advanced Science and Engineering, The University of > Chicago* > *Mathematics & Computer Science Division, Argonne National Laboratory* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Nov 13 09:32:12 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 13 Nov 2019 10:32:12 +0100 Subject: [stable][glance] Proposal to remove Flavio Percoco from glance-stable-maint In-Reply-To: <0191ee00-d6fb-80ba-2b74-db76e6219360@gmail.com> References: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com> <0191ee00-d6fb-80ba-2b74-db76e6219360@gmail.com> Message-ID: <22e5ebb5-b077-f503-3062-bf38a8347614@openstack.org> Matt Riedemann wrote: > On 11/12/2019 2:25 PM, Brian Rosmaita wrote: >> I just noticed that Flavio is still a member of glance-stable-maint. >> Nothing against him personally -- he's an excellent dude -- but he >> hasn't been working on Glance (or OpenStack) for quite a while now and >> is no longer a member of glance-core, so he probably shouldn't be on >> the stable-maint team.  (Not that he'd do anything bad, it just makes >> the glance-stable-maint team look larger than it actually is.) > > Done. Might be a good idea to remove him from stable-maint-core as well (https://review.opendev.org/#/admin/groups/530,members) otherwise this is a noop. 
-- Thierry Carrez (ttx) From dharmendra.kushwaha at gmail.com Wed Nov 13 10:15:22 2019 From: dharmendra.kushwaha at gmail.com (Dharmendra Kushwaha) Date: Wed, 13 Nov 2019 15:45:22 +0530 Subject: [tc][horizon][all] Horizon plugins maintenance In-Reply-To: References: Message-ID: Hi, As discussed in PTG, I had added horizon-core into tacker-horizon-core team. Thanks for your support. Thanks & Regards Dharmendra Kushwaha On Wed, Oct 23, 2019 at 6:20 PM Ivan Kolodyazhny wrote: > Hi team, > > As you may know, we've got a pretty big list of Horizon Plugins [1]. > Unfortunately, not all of them are in active development due to the lack of > resources in projects teams. > > As a Horizon team, we understand all the reasons, and we're doing our best > to help other teams to maintain plugins. > > That's why we're proposing our help to maintain horizon plugins. We raised > this topic during the last Horizon weekly meeting [2] and we'll have some > discussion during the PTG [3] too. > > There are a lot of Horizon changes which affect plugins and horizon team > is ready to help: > - new Django versions > - dependencies updates > - Horizon API changes > - etc. > > To get faster fixes in, it would be good to have +2 permissions for the > horizon-core team for each plugin. > > We helped Heat team during the last cycle adding horizon-core to the > heat-dashboard-core team. Also, we've got +2 on other plugins via global > project config [4] and via Gerrit configuration for > (neutron-*aas-dashboard, tuskar-ui). > > Vitrage PTL agreed to do the same for vitrage-dashboard during the last > meeting [5]. > > > Of course, it's up to each project to maintain horizon plugins and it's > responsibilities but I would like to raise this topic to the TC too. I > really sure, that it will speed up some critical fixes for Horizon plugins > and makes users and operators experience better. > > > [1] https://docs.openstack.org/horizon/latest/install/plugin-registry.html > [2] > http://eavesdrop.openstack.org/meetings/horizon/2019/horizon.2019-10-16-15.02.log.html#l-128 > [3] https://etherpad.openstack.org/p/horizon-u-ptg > [4] > http://codesearch.openstack.org/?q=horizon-core&i=nope&files=&repos=openstack/project-config > [5] > http://eavesdrop.openstack.org/meetings/vitrage/2019/vitrage.2019-10-23-08.03.log.html#l-21 > > Regards, > Ivan Kolodyazhny, > http://blog.e0ne.info/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Wed Nov 13 11:10:28 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Wed, 13 Nov 2019 12:10:28 +0100 Subject: [blazar] Why/how does Blazar use Keystone trusts? In-Reply-To: References: Message-ID: Hi Jason, As you point out, reliance on trusts causes problems when users are disabled or deleted from Keystone. In the past it even prevented non-admin users from starting leases, see [1] for context. I believe there are some operations that could still benefit from the use of trusts (or another mechanism to execute actions on behalf of users), such as snapshot in the before_end event. It's possible that with the current code, snapshot end up being owned by the blazar service user. I don't think I've ever used this feature… For management of hosts specifically, I don't see why trusts should be needed. I have a WIP patch to remove their use [2] which should fix your issue. IIRC it just needs unit tests fixes, maybe some from Chameleon could help to finish it? 
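For anyone following the thread who hasn't used trusts before: roughly speaking ( going from memory of the code ), Blazar creates a Keystone trust with the lease owner as trustor and the Blazar service user as trustee, and later authenticates with that trust_id when it needs to act on the user's behalf, long after the user's original token has expired. A minimal sketch of the same mechanism with plain CLI commands ( the names are illustrative, this is not literally what Blazar runs internally ):

  # the user delegates a role on their project to the blazar service user
  openstack trust create --project demo --role member demo-user blazar

  # later, the service authenticates with its own credentials plus the trust id
  openstack --os-username blazar --os-password <password> --os-trust-id <trust-id> server list

This is also why things break when the trustor leaves the project or is deleted: the delegation is only valid while the user still holds the delegated role.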
[1] https://bugs.launchpad.net/blazar/+bug/1663204 [2] https://review.opendev.org/#/c/641103/ On Wed, 13 Nov 2019 at 09:39, Sylvain Bauza wrote: > > Let me tell you a story. > Long time ago, in a far far away galaxy, some people wanting to have a way to reserve some compute nodes in OpenStack created a new project that was named "Climate". > Those folks weren't really knowing Keystone but they saw some problem : when the reservation was beginning, the token was expired. > > For that specific reason, they tried to see how to fix it and then saw Keystone trusts. They then said "heh, nice" and they started to use it. > After 5 years, nobody really thought whether trusts should still be needed. Maybe the new Blazar team should look at service tokens, rather. > > Anyway, just my 2cts. > > -Sylvain > > On Wed, Nov 13, 2019 at 12:26 AM Jason Anderson wrote: >> >> Hi Blazar contributors, >> >> We hit an issue today involving trusts in Blazar, where a host couldn't be deleted due to some issue authenticating against the trust associated with the host. We still haven't resolved this issue, but it felt odd to me: why is a trust even involved here? >> >> I have often wondered what the reason is for using trusts in Blazar, as I can't think of anything Blazar is doing that could not be done by the Blazar system user (and in fact, many operations are done via this user... via another trust.) There are also issues where a user leaves a project before their leases have ended; in this case Blazar has difficulty cleaning up because it tries to resurrect a trust that is not tied to a valid user/project relationship. >> >> Does anybody have context on to how trusts are used in Blazar and if they are still necessary? Does it make sense to remove this functionality? >> >> Thank you, >> >> -- >> Jason Anderson >> >> Chameleon DevOps Lead >> Consortium for Advanced Science and Engineering, The University of Chicago >> Mathematics & Computer Science Division, Argonne National Laboratory From thierry at openstack.org Wed Nov 13 11:18:47 2019 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 13 Nov 2019 12:18:47 +0100 Subject: [sig] Forming a Large scale SIG Message-ID: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Hi everyone, In Shanghai we held a forum session to gauge interest in a new SIG to specifically address cluster scaling issues. In the past we had several groups ("Large deployments", "Performance", LCOO...) but those efforts were arguably a bit too wide and those groups are now abandoned. My main goal here is to get large users directly involved in a domain where their expertise can best translate into improvements in the software. It's easy for such a group to go nowhere while trying to boil the ocean. To maximize its chances of success and make it sustainable, the group should have a narrow focus, and reasonable objectives. My personal idea for the group focus was to specifically address scaling issues within a single cluster: basically identify and address issues that prevent scaling a single cluster (or cell) past a number of nodes. By sharing analysis and experience, the group could identify common pain points that, once solved, would help raising that number. There was a lot of interest in that session[1], and it predictably exploded in lots of different directions, including some that are definitely past a single cluster (like making Neutron better support cells). I think it's fine: my initial proposal was more of a strawman. 
Active members of the group should really define what they collectively want to work on. And the SIG name should be picked to match that. I'd like to help getting that group off the ground and to a place where it can fly by itself, without needing external coordination. The first step would be to identify interested members and discuss group scope and objectives. Given the nature of the group (with interested members in Japan, Europe, Australia and the US) it will be hard to come up with a synchronous meeting time that will work for everyone, so let's try to hold that discussion over email. So to kick this off: if you are interested in that group, please reply to this email, introduce yourself and tell us what you would like the group scope and objectives to be, and what you can contribute to the group. Thanks! [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG -- Thierry Carrez (ttx) From jean-philippe at evrard.me Wed Nov 13 11:19:23 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Wed, 13 Nov 2019 12:19:23 +0100 Subject: [kuryr] [tc] kuryr project mission In-Reply-To: <94950c5e942e22a4ea1599a4c814eb554d4f2a9b.camel@redhat.com> References: <94950c5e942e22a4ea1599a4c814eb554d4f2a9b.camel@redhat.com> Message-ID: <47d278b0338b1ca297aaef190df06a7bbb92831b.camel@evrard.me> On Tue, 2019-10-29 at 09:52 +0100, Michał Dulko wrote: > (snipped) I'd like to propose rephrasing Kuryr mission > statement from: > > > Bridge between container framework networking and storage models > > to OpenStack networking and storage abstractions. > > to > > > Bridge between container framework networking models > > to OpenStack networking abstractions. > > effectively getting storage out of project scope. > > I am looking forward to see the change in governance :) Regards, JP From skaplons at redhat.com Wed Nov 13 11:20:04 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 13 Nov 2019 12:20:04 +0100 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com> Message-ID: <20191113112004.ynsuamxrl7pa7hiq@skaplons-mac> Hi, On Wed, Nov 13, 2019 at 04:31:20AM +0000, James Denton wrote: > Appreciate the summary as well. > > For what it's worth, the ML2/LinuxBridge combo has been a very stable setup for us since its inception, and I'd hate to see it deprecated and removed for the sake of removing something. Last I checked, trunk ports were supported with the ML2/LinuxBridge driver. And while of course DVR is not a supported feature, a good number of our ML2/LXB environments forgo Neutron routers altogether in favor of putting VMs on the provider network. It has shown to be as performant as vanilla OVS, and a simpler model to implement and support as an operator. You're right. Trunk ports are ofcourse supported by LB agent. But e.g. some of QoS rules aren't supported by this backend. > > Just my two cents. > > Thanks, > > James Denton > Network Engineer > Rackspace Private Cloud > james.denton at rackspace.com > > On 11/12/19, 3:41 PM, "Tim Bell" wrote: > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. 
> > CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. > > Tim > > > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > > > Hi Neutron team, > > > > First if all thank to all of You for great and very productive week during the > > PTG in Shanghai. > > Below is summary of our discussions from whole 3 days. > > If I forgot about something, please respond to the email and update missing > > informations. But if You want to have follow up discussion about one of the > > topics from this summary, please start a new thread to keep this one only as > > high level summary of the PTG. > > > > ... > > > Starting the process of removing ML2/Linuxbridge > > ================================================ > > > > Currently in Neutron tree we have 4 drivers: > > * Linuxbridge, > > * Openvswitch, > > * macvtap, > > * sriov. > > SR-IOV driver is out of discussion here as this driver is > > addressing slightly different use case than other out drivers. > > > > We started discussion about above topic because we don't want to end up with too > > many drivers in-tree and we also had some discussions (and we have spec for that > > already) about include networking-ovn as in-tree driver. > > So with networking-ovn in-tree we would have already 4 drivers which can be used > > on any hardware: linuxbridge, ovs, macvtap and ovn. > > Conclusions from the discussion are: > > * each driver requires proper testing in the gate, so we need to add many new > > jobs to our check/gate queue, > > * currently linuxbridge driver don't have a lot of development and feature > > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > > (e.g. dvr, trunk ports), > > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > > this one could be also considered as candidate to deprecation, > > * we need to have process of deprecating some drivers and time horizon for such > > actions should be at least 2 cycles. > > * we will not remove any driver completly but rather we will move it to be in > > stadium process first so it still can be maintained by people who are > > interested in it. > > > > Actions to do after this discussion: > > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > > Linuxbridge users currently) to ask about their feedback about this, > > * if there are any other companies using LB driver, Nate Johnston is willing to > > help conctating them, please reach to him in such case. > > * we may ratify marking linuxbridge as deprecated in the team meeting during > > Ussuri cycle if nothing surprising pops in. > > > > > -- Slawek Kaplonski Senior software engineer Red Hat From amotoki at gmail.com Wed Nov 13 13:16:30 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 13 Nov 2019 22:16:30 +0900 Subject: [ptg][PTL] Auto-generated etherpad links ! In-Reply-To: <86f9ea36-5c38-ef64-aa7c-dd5849143c5d@openstack.org> References: <86f9ea36-5c38-ef64-aa7c-dd5849143c5d@openstack.org> Message-ID: Hi, I created the wiki page for Ussuri PTG etherpad [1] based on the latest snapshot of the auto-generated etherpad links [2]. The auto-generated page to collect etherpad links is really useful, but it will be gone when the next PTG comes, so I believe the wiki page is still useful for memory. PTLs, please update the links of your projects if they are not up-to-date. 
Thanks, Akihiro Motoki (amotoki) [1] https://wiki.openstack.org/wiki/PTG/Ussuri/Etherpads [2] http://ptg.openstack.org/etherpads.html On Fri, Oct 11, 2019 at 12:45 AM Thierry Carrez wrote: > > Hi everyone, > > The PTGbot grew a new feature over the summer. It now dynamically > generates the list of PTG track etherpads. You can find that list at: > > http://ptg.openstack.org/etherpads.html > > If you haven't created your etherpad already, just follow the link there > to create your etherpad. > > If you have created your track etherpad already under a different name, > you can overload the automatically-generated name using the PTGbot. Just > join the #openstack-ptg channel and (as a Freenode authenticated user) > send the following command: > > #TRACKNAME etherpad > > Example: > #keystone etherpad https://etherpad.openstack.org/p/awesome-keystone-pad > > That will update the link on that page automatically. > > Hoping to see you in Shanghai! > > -- > Thierry Carrez (ttx) > From amotoki at gmail.com Wed Nov 13 13:46:23 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 13 Nov 2019 22:46:23 +0900 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <20191113112004.ynsuamxrl7pa7hiq@skaplons-mac> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <98616CAC-FEBD-46C2-A7BF-A1EBAD58F78B@cern.ch> <59D1EFBF-B929-4814-9820-F5A9FAF9DA5C@rackspace.com> <20191113112004.ynsuamxrl7pa7hiq@skaplons-mac> Message-ID: Hi, After the neutron PTG session on the future of ML2/LinuxBridge, I discussed this topic in the ops-meetup PTG room (L.43- in [1]). - A lot of needs for Linux Bridge driver was expressed in the room. - LB is for simple network and many ops need it to keep deployment simple including a provider network without L3 feature. - The stats on Linux Bridge usage were shared as well. LB still has a large user base. 40% use Linux Bridge driver according to a survey in Wed's ops(?) session and the user survey last Oct shows 33% use Linux Bridge driver (63% use OVS based) [2]. This discussion does not mean the deprecation of the linux bridge driver. In my understanding, the main motivation is how the neutron team can keep the reference implementations simple. One example is that the features supported in the linux bridge driver are behind those in the OVS driver and some developers think this is the lack of the interest for the linux bridge driver, but this may show that most/not small number of linux bridge users just want simple features. That's my understanding in the PTG. Hope it helps the discussion :) Thanks, Akihiro [1] https://etherpad.openstack.org/p/shanghai-ptg-ops-meetup [2] Deployment Decisions in https://www.openstack.org/analytics On Wed, Nov 13, 2019 at 8:21 PM Slawek Kaplonski wrote: > > Hi, > > On Wed, Nov 13, 2019 at 04:31:20AM +0000, James Denton wrote: > > Appreciate the summary as well. > > > > For what it's worth, the ML2/LinuxBridge combo has been a very stable setup for us since its inception, and I'd hate to see it deprecated and removed for the sake of removing something. Last I checked, trunk ports were supported with the ML2/LinuxBridge driver. And while of course DVR is not a supported feature, a good number of our ML2/LXB environments forgo Neutron routers altogether in favor of putting VMs on the provider network. It has shown to be as performant as vanilla OVS, and a simpler model to implement and support as an operator. > > You're right. Trunk ports are ofcourse supported by LB agent. But e.g. some of > QoS rules aren't supported by this backend. 
> > > > > Just my two cents. > > > > Thanks, > > > > James Denton > > Network Engineer > > Rackspace Private Cloud > > james.denton at rackspace.com > > > > On 11/12/19, 3:41 PM, "Tim Bell" wrote: > > > > CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! > > > > > > Many thanks for the summaries. It’s really helpful for those who could not be in the discussions. > > > > CERN are also using ML2/Linuxbridge so we’d welcome being involved in any deprecation discussions and migration paths. > > > > Tim > > > > > On 12 Nov 2019, at 14:53, Slawek Kaplonski wrote: > > > > > > Hi Neutron team, > > > > > > First if all thank to all of You for great and very productive week during the > > > PTG in Shanghai. > > > Below is summary of our discussions from whole 3 days. > > > If I forgot about something, please respond to the email and update missing > > > informations. But if You want to have follow up discussion about one of the > > > topics from this summary, please start a new thread to keep this one only as > > > high level summary of the PTG. > > > > > > ... > > > > > Starting the process of removing ML2/Linuxbridge > > > ================================================ > > > > > > Currently in Neutron tree we have 4 drivers: > > > * Linuxbridge, > > > * Openvswitch, > > > * macvtap, > > > * sriov. > > > SR-IOV driver is out of discussion here as this driver is > > > addressing slightly different use case than other out drivers. > > > > > > We started discussion about above topic because we don't want to end up with too > > > many drivers in-tree and we also had some discussions (and we have spec for that > > > already) about include networking-ovn as in-tree driver. > > > So with networking-ovn in-tree we would have already 4 drivers which can be used > > > on any hardware: linuxbridge, ovs, macvtap and ovn. > > > Conclusions from the discussion are: > > > * each driver requires proper testing in the gate, so we need to add many new > > > jobs to our check/gate queue, > > > * currently linuxbridge driver don't have a lot of development and feature > > > parity gaps between linuxbridge and ovs drivers is getting bigger and bigger > > > (e.g. dvr, trunk ports), > > > * also macvtap driver don't have a lot of activity in last few cycles. Maybe > > > this one could be also considered as candidate to deprecation, > > > * we need to have process of deprecating some drivers and time horizon for such > > > actions should be at least 2 cycles. > > > * we will not remove any driver completly but rather we will move it to be in > > > stadium process first so it still can be maintained by people who are > > > interested in it. > > > > > > Actions to do after this discussion: > > > * Miguel Lavalle will contact RAX and Godaddy (we know that those are > > > Linuxbridge users currently) to ask about their feedback about this, > > > * if there are any other companies using LB driver, Nate Johnston is willing to > > > help conctating them, please reach to him in such case. > > > * we may ratify marking linuxbridge as deprecated in the team meeting during > > > Ussuri cycle if nothing surprising pops in. 
> > > > > > > > > > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From mriedemos at gmail.com Wed Nov 13 13:58:25 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 07:58:25 -0600 Subject: [stable][glance] Proposal to remove Flavio Percoco from glance-stable-maint In-Reply-To: <22e5ebb5-b077-f503-3062-bf38a8347614@openstack.org> References: <2fee3345-c064-8428-8376-e8b5f114f422@gmail.com> <0191ee00-d6fb-80ba-2b74-db76e6219360@gmail.com> <22e5ebb5-b077-f503-3062-bf38a8347614@openstack.org> Message-ID: On 11/13/2019 3:32 AM, Thierry Carrez wrote: > Might be a good idea to remove him from stable-maint-core as well > (https://review.opendev.org/#/admin/groups/530,members) otherwise this > is a noop. Good point. Done. Alan and Chuck should probably come off that list as well. -- Thanks, Matt From balazs.gibizer at est.tech Wed Nov 13 14:21:52 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Wed, 13 Nov 2019 14:21:52 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <4ffbcc4c-c043-5d7a-7f7a-d78de9fc75d7@gmail.com> References: <1573200108.23158.4@est.tech> <4ffbcc4c-c043-5d7a-7f7a-d78de9fc75d7@gmail.com> Message-ID: <1573654907.26082.2@est.tech> On Fri, Nov 8, 2019 at 08:03, Matt Riedemann wrote: > On 11/8/2019 2:01 AM, Balázs Gibizer wrote: >> * deployer needs to create the sharing disk RP and report inventory / >> traits on it >> * deployer needs to define the placement aggregate and add the >> sharing >> disk RP into it >> * when compute restarts and sees that 'using_shared_disk_provider' = >> True in the config, it adds the its compute RP to the aggregate >> defined >> in 'sharing_disk_aggregate' Then if it sees that the root RP still >> has >> DISK_GB inventory then trigger a reshape > > Does the compute host also get added to a nova host aggregate which > mirrors the resource provider aggregate in placmeent or do we only > need the placement resource provider sharing DISK_GB aggregate? As far as I see we only need the placement aggregate to make this work. > > -- > > Thanks, > > Matt > From balazs.gibizer at est.tech Wed Nov 13 14:22:17 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Wed, 13 Nov 2019 14:22:17 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> Message-ID: <1573654932.26082.3@est.tech> On Fri, Nov 8, 2019 at 08:05, Matt Riedemann wrote: > On 11/8/2019 2:01 AM, Balázs Gibizer wrote: >> * when compute restarts and sees that 'using_shared_disk_provider' = >> True in the config, it adds the its compute RP to the aggregate >> defined >> in 'sharing_disk_aggregate' Then if it sees that the root RP still >> has >> DISK_GB inventory then trigger a reshape > > Conversely, if the deployer decides to use local disk for the host > again, what are the steps? > > 1. Change using_shared_disk_provider=False > 2. Restart/SIGHUP compute service > 3. Compute removes itself from the aggregate > 4. Compute reshapes to add DISK_GB inventory on the root compute node > resource provider and moves DISK_GB allocations from the sharing > provider back to the root compute node provider. > > Correct? Seems correct to me. 
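To make the deployer-side steps discussed in this thread more concrete (create the sharing disk resource provider, give it DISK_GB inventory, mark it as a sharing provider, and put it in the placement aggregate the compute nodes will join), here is a rough osc-placement sketch. The provider name, size and UUIDs are placeholders, not anything the proposal prescribes:

# Create the sharing provider and give it the disk inventory.
$ openstack resource provider create shared-disk
$ openstack resource provider inventory set <shared_rp_uuid> --resource DISK_GB=4096
# Mark it as a provider whose inventory is shared via aggregate.
$ openstack resource provider trait set <shared_rp_uuid> --trait MISC_SHARES_VIA_AGGREGATE
# Associate it with the aggregate the compute nodes will also join.
$ openstack resource provider aggregate set <shared_rp_uuid> --aggregate <aggregate_uuid>

The aggregate UUID would be whatever the computes are configured to join (the proposed 'sharing_disk_aggregate' option above), and newer osc-placement releases also expect --generation on 'aggregate set' to guard against concurrent updates, so check your client version before copying this.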
gibi > > -- > > Thanks, > > Matt > From sbauza at redhat.com Wed Nov 13 14:34:16 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 13 Nov 2019 15:34:16 +0100 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573654932.26082.3@est.tech> References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> Message-ID: On Wed, Nov 13, 2019 at 3:32 PM Balázs Gibizer wrote: > > > On Fri, Nov 8, 2019 at 08:05, Matt Riedemann > wrote: > > On 11/8/2019 2:01 AM, Balázs Gibizer wrote: > >> * when compute restarts and sees that 'using_shared_disk_provider' = > >> True in the config, it adds the its compute RP to the aggregate > >> defined > >> in 'sharing_disk_aggregate' Then if it sees that the root RP still > >> has > >> DISK_GB inventory then trigger a reshape > > > > Conversely, if the deployer decides to use local disk for the host > > again, what are the steps? > > > > 1. Change using_shared_disk_provider=False > > 2. Restart/SIGHUP compute service > > 3. Compute removes itself from the aggregate > > 4. Compute reshapes to add DISK_GB inventory on the root compute node > > resource provider and moves DISK_GB allocations from the sharing > > provider back to the root compute node provider. > > > > Correct? > > Seems correct to me. > > gibi > > Me too. To be clear, I don't think operators would modify the above but if so, they would need reshapes. > > > > -- > > > > Thanks, > > > > Matt > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Nov 13 14:41:08 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 08:41:08 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> Message-ID: On 11/13/2019 8:34 AM, Sylvain Bauza wrote: > Me too. To be clear, I don't think operators would modify the above but > if so, they would need reshapes. Maybe not, but this is the kind of detail that should be in the spec and functional tests to make sure it's solid since this is a big architectural change in nova. -- Thanks, Matt From rosmaita.fossdev at gmail.com Wed Nov 13 14:54:42 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 09:54:42 -0500 Subject: [cinder] meeting time reminder Message-ID: <3906b712-f327-81d1-ed24-2962026ce69c@gmail.com> For people who were on Daylight Savings Time but now are not, just a reminder that this week's Cinder meeting at 16:00 UTC may be an hour earlier for you. The meeting-time-change-poll link will be in a separate email. If the time is changed, it will be effective for the first meeting in December (4 December 2019). That's because we'll be having the Virtual PTG during the last week of November (that poll will be out shortly). cheers, brian From rosmaita.fossdev at gmail.com Wed Nov 13 14:57:58 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 09:57:58 -0500 Subject: [cinder] meeting time change poll Message-ID: The poll to help decide whether we move the time of the weekly Cinder meeting is now available: https://forms.gle/kA2JGzoxegy2KRDB6 The poll closes on 20 November at 23:59 UTC. If the time is changed, it will be effective for the first meeting in December (4 December 2019). 
cheers, brian From mriedemos at gmail.com Wed Nov 13 15:19:05 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 09:19:05 -0600 Subject: [nova][ironic] nova docs bug for ironic looking for an owner Message-ID: <453e2ccb-ef4f-0e5b-aa15-cacf0ca104e8@gmail.com> While discussing some tribal knowledge about how ironic is the black sheep of nova compute drivers I realized that we (nova) have no docs about the ironic driver like we do for other drivers, so we don't mention anything about the weird cardinality rules around compute service : node : instance and host vs nodename things, how to configure the service for HA mode, how to configure baremetal flavors with custom resource classes, how to partition for conductor groups, how to deal with scaling issues, missing features (migrate), etc. I've opened a bug in case someone wants to get started on some of that information: https://bugs.launchpad.net/nova/+bug/1852446 -- Thanks, Matt From rosmaita.fossdev at gmail.com Wed Nov 13 15:42:25 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 10:42:25 -0500 Subject: [cinder] virtual PTG datetime selection poll Message-ID: At the Shanghai PTG last week, the Cinder team decided to hold a Virtual PTG the last week of November. The format will be 2 two-hour sessions, ideally held on consecutive days. Since most people have the 16:00 UTC Cinder meeting already on their calendars, I suggest that we skip the meeting on 27 November and instead use that time for Virtual PTG. So, basically, I'd like to meet: Wednesday for sure, and either Tuesday or Thursday (with Monday or Friday as possibilities if Tuesday or Thursday are impossible for too many people). Let me know what your preferences are on this poll: https://forms.gle/rKDJpSZvAxbnBESp7 The poll closes at 23:39 on *Tuesday* 19 November 2019. thanks, brian From openstack at nemebean.com Wed Nov 13 16:12:38 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 13 Nov 2019 10:12:38 -0600 Subject: [oslo] Adding Michael Johnson as Taskflow core Message-ID: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Hi, After discussion with the Oslo team, we (and he) have agreed to add Michael as a Taskflow core. He's done more work on the project than anyone else still active in Oslo and also works on a project that consumes it so he likely understands it better than anyone else at this point. Welcome Michael and thanks for your contributions! -Ben From openstack at nemebean.com Wed Nov 13 16:22:44 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 13 Nov 2019 10:22:44 -0600 Subject: [tripleo] Adding Alex Schultz as OVB core Message-ID: <7562aee5-1ea2-2d8f-ebb5-9fa02d9dc354@nemebean.com> Hi, After a discussion with Wes in Shanghai about how to make me less of a SPOF for OVB, one of the outcomes was that we should try to grow the OVB core team. Alex has been reviewing a lot of the patches to OVB lately and obviously has a good handle on how all of this stuff fits together, so I've added him to the OVB core team. Thanks and congratulations(?) Alex! :-) -Ben From jasonanderson at uchicago.edu Wed Nov 13 16:42:25 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 13 Nov 2019 16:42:25 +0000 Subject: [blazar] Why/how does Blazar use Keystone trusts? In-Reply-To: References: Message-ID: <73e756a3-029c-679b-b94a-5bca069c6799@uchicago.edu> Thank you both for the information. Some comments inline. 
On 11/13/19 5:10 AM, Pierre Riteau wrote: > Hi Jason, > > As you point out, reliance on trusts causes problems when users are > disabled or deleted from Keystone. In the past it even prevented > non-admin users from starting leases, see [1] for context. > > I believe there are some operations that could still benefit from the > use of trusts (or another mechanism to execute actions on behalf of > users), such as snapshot in the before_end event. It's possible that > with the current code, snapshot end up being owned by the blazar > service user. I don't think I've ever used this feature… That's a good point, I also haven't used that functionality. Though, I can't think of many cases where the system user couldn't just override the image owner on create. > For management of hosts specifically, I don't see why trusts should be > needed. I have a WIP patch to remove their use [2] which should fix > your issue. IIRC it just needs unit tests fixes, maybe some from > Chameleon could help to finish it? > > [1] https://bugs.launchpad.net/blazar/+bug/1663204 > [2] https://review.opendev.org/#/c/641103/ I did see this original patch. Yes, perhaps we can pick it up and see what to do with it. It does call out that, if trusts are removed, the notification payload would change. This likely is not used in practice, perhaps others on this list can chime in if that is not the case. > > On Wed, 13 Nov 2019 at 09:39, Sylvain Bauza wrote: >> Let me tell you a story. >> Long time ago, in a far far away galaxy, some people wanting to have a way to reserve some compute nodes in OpenStack created a new project that was named "Climate". >> Those folks weren't really knowing Keystone but they saw some problem : when the reservation was beginning, the token was expired. >> >> For that specific reason, they tried to see how to fix it and then saw Keystone trusts. They then said "heh, nice" and they started to use it. >> After 5 years, nobody really thought whether trusts should still be needed. Maybe the new Blazar team should look at service tokens, rather. >> >> Anyway, just my 2cts. >> >> -Sylvain Thanks for the historical context! Good to know that there aren't any technical blockers from considering simplifying this. >> >> On Wed, Nov 13, 2019 at 12:26 AM Jason Anderson wrote: >>> Hi Blazar contributors, >>> >>> We hit an issue today involving trusts in Blazar, where a host couldn't be deleted due to some issue authenticating against the trust associated with the host. We still haven't resolved this issue, but it felt odd to me: why is a trust even involved here? >>> >>> I have often wondered what the reason is for using trusts in Blazar, as I can't think of anything Blazar is doing that could not be done by the Blazar system user (and in fact, many operations are done via this user... via another trust.) There are also issues where a user leaves a project before their leases have ended; in this case Blazar has difficulty cleaning up because it tries to resurrect a trust that is not tied to a valid user/project relationship. >>> >>> Does anybody have context on to how trusts are used in Blazar and if they are still necessary? Does it make sense to remove this functionality? 
>>> >>> Thank you, >>> >>> -- >>> Jason Anderson >>> >>> Chameleon DevOps Lead >>> Consortium for Advanced Science and Engineering, The University of Chicago >>> Mathematics & Computer Science Division, Argonne National Laboratory Cheers, /Jason From moguimar at redhat.com Wed Nov 13 16:42:36 2019 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Wed, 13 Nov 2019 17:42:36 +0100 Subject: [oslo] Adding Michael Johnson as Taskflow core In-Reply-To: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> References: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Message-ID: Welcome Michael! On Wed, Nov 13, 2019 at 5:13 PM Ben Nemec wrote: > Hi, > > After discussion with the Oslo team, we (and he) have agreed to add > Michael as a Taskflow core. He's done more work on the project than > anyone else still active in Oslo and also works on a project that > consumes it so he likely understands it better than anyone else at this > point. > > Welcome Michael and thanks for your contributions! > > -Ben > > -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Nov 13 16:51:59 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 10:51:59 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event Message-ID: tl;dr: What do people think about storing and showing the *type* of exception that is recorded with a failed instance action event (like a fault) to the owner of the server who may not be an admin? Details: As noted here [1] and recreated here [2] the instance action event details that a non-admin owner of a server sees do not contain any useful information about what caused the failure of the action. Here is an example of a failed resize from that paste (this is what the non-admin owner of the server would see): $ openstack --os-compute-api-version 2.51 server event show vm2 req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events { "events": [ { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "event": "cold_migrate", "result": "Error" }, { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "event": "conductor_migrate_server", "result": "Error" } ] } Super useful, right? In this case scheduling failed for the resize so the instance is not in ERROR status which means the user cannot see a fault message with the NoValidHost error either. 
The admin can see the traceback in the failed action event list: $ openstack --os-compute-api-version 2.51 server event show 3ef043ea-e2d7-4565-a401-5c758e149f23 req-11487504-da59-411b-b3b8-267bebe9b0d2 -f json -c events { "events": [ { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "traceback": " File \"/opt/stack/nova/nova/conductor/manager.py\", line 301, in migrate_server\n host_list)\n File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in _cold_migrate\n raise exception.NoValidHost(reason=msg)\n", "event": "cold_migrate", "result": "Error" }, { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "traceback": " File \"/opt/stack/nova/nova/compute/utils.py\", line 1411, in decorated_function\n return function(self, context, *args, **kwargs)\n File \"/opt/stack/nova/nova/conductor/manager.py\", line 301, in migrate_server\n host_list)\n File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in _cold_migrate\n raise exception.NoValidHost(reason=msg)\n", "event": "conductor_migrate_server", "result": "Error" } ] } So when the admin gets the support ticket they can at least tell that scheduling failed and then dig into why. My idea is to store the exception *type* with the action event, similar to the recorded instance fault message for non-NovaExceptions [3] which will show to the non-admin owner of the server if the server status is ERROR or DELETED [4]. We should record the exc_val to get a prettier message like "No valid host was found." but that could leak details in the error message that we don't want non-admins to see [5]. With what I'm thinking, the non-admin owner of the server could see something like this for a failed event: { "finish_time": "2019-11-13T16:18:27.000000", "start_time": "2019-11-13T16:18:26.000000", "event": "cold_migrate", "result": "Error", "details": "NoValidHost" } That's pretty simple, doesn't leak details, and at least indicates to the user that maybe they can retry the resize with another flavor or something. It's just an example. This would require a microversion so before writing a spec I wanted to get general feelings about this in the mailing list. I accept that it might not really be worth the effort so that's good feedback if it's how you feel (I'll only cry a little). [1] https://review.opendev.org/#/c/693937/2/nova/objects/instance_action.py [2] http://paste.openstack.org/show/786054/ [3] https://github.com/openstack/nova/blob/20.0.0/nova/compute/utils.py#L101 [4] https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/views/servers.py#L564 [5] https://bugs.launchpad.net/nova/+bug/1851587 -- Thanks, Matt From mriedemos at gmail.com Wed Nov 13 16:55:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 10:55:20 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: On 11/13/2019 10:51 AM, Matt Riedemann wrote: > We should record the exc_val to get a prettier message like "No valid > host was found." but that could leak details in the error message that > we don't want non-admins to see [5]. Typo above, should have been "We *could* record...". 
-- Thanks, Matt From openstack at fried.cc Wed Nov 13 17:17:25 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 13 Nov 2019 11:17:25 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: Unless it's likely to be something other than NoValidHost a significant percentage of the time, IMO it... On 11/13/19 10:51 AM, Matt Riedemann wrote: > might not really be worth the effort efried . From sylvain.bauza at gmail.com Wed Nov 13 17:35:42 2019 From: sylvain.bauza at gmail.com (Sylvain Bauza) Date: Wed, 13 Nov 2019 18:35:42 +0100 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: Le mer. 13 nov. 2019 à 18:27, Eric Fried a écrit : > Unless it's likely to be something other than NoValidHost a significant > percentage of the time, IMO it... > > On 11/13/19 10:51 AM, Matt Riedemann wrote: > > might not really be worth the effort > > efried > . > > FWIW, os-instance-actions is super useful for some ops, at least my customers :-) Having the exact same answer from this API than a nova show would be very nice honestly. So, yeah, please +1 to the spec and add me for a review :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Nov 13 17:41:43 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 11:41:43 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: Message-ID: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> On 11/13/2019 11:17 AM, Eric Fried wrote: > Unless it's likely to be something other than NoValidHost a significant > percentage of the time, IMO it... Well just taking resize, it could be one of many things: https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L366 - oops you tried resizing which would screw up your group affinity policy https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L4490 - (for an admin, cold migrate) oops you tried cold migrating a vcenter vm or you have allow_resize_to_same_host=True and the scheduler picks the same host (silly scheduler, see bug 1748697) https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L113 - oops you lost a resource claims race, try again https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report.py#L1898 - oops you lost a race with allocation consumer generation conflicts, try again -- Thanks, Matt From juliaashleykreger at gmail.com Wed Nov 13 17:48:53 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 13 Nov 2019 09:48:53 -0800 Subject: [ironic] Anyone using the process metrics collection functionality in ironic? Message-ID: A question from the PTG was raised if anyone was using the existing statsd metrics publishing support in ironic/ironic-python-agent to publish internal performance times? There seems to be some interest in expanding this metric data publishing capability so Prometheus can also be used, but before we really even think of heading down that path, we wanted to understand if there were present users of that feature. 
Thanks, -Julia From stig.openstack at telfer.org Wed Nov 13 17:54:48 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 13 Nov 2019 17:54:48 +0000 Subject: [sig] Forming a Large scale SIG In-Reply-To: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Message-ID: <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Hi Thierry & all - Thanks for your mail. I’m interested in joining this SIG. Among others, I’m interested in participating in discussions around these common problems: - golden signals for scaling bottlenecks (and what to do about them) - using Ansible at scale - strategies for simplifying OpenStack functionality in order to scale Cheers, Stig > On 13 Nov 2019, at 11:18, Thierry Carrez wrote: > > Hi everyone, > > In Shanghai we held a forum session to gauge interest in a new SIG to specifically address cluster scaling issues. In the past we had several groups ("Large deployments", "Performance", LCOO...) but those efforts were arguably a bit too wide and those groups are now abandoned. > > My main goal here is to get large users directly involved in a domain where their expertise can best translate into improvements in the software. It's easy for such a group to go nowhere while trying to boil the ocean. To maximize its chances of success and make it sustainable, the group should have a narrow focus, and reasonable objectives. > > My personal idea for the group focus was to specifically address scaling issues within a single cluster: basically identify and address issues that prevent scaling a single cluster (or cell) past a number of nodes. By sharing analysis and experience, the group could identify common pain points that, once solved, would help raising that number. > > There was a lot of interest in that session[1], and it predictably exploded in lots of different directions, including some that are definitely past a single cluster (like making Neutron better support cells). I think it's fine: my initial proposal was more of a strawman. Active members of the group should really define what they collectively want to work on. And the SIG name should be picked to match that. > > I'd like to help getting that group off the ground and to a place where it can fly by itself, without needing external coordination. The first step would be to identify interested members and discuss group scope and objectives. Given the nature of the group (with interested members in Japan, Europe, Australia and the US) it will be hard to come up with a synchronous meeting time that will work for everyone, so let's try to hold that discussion over email. > > So to kick this off: if you are interested in that group, please reply to this email, introduce yourself and tell us what you would like the group scope and objectives to be, and what you can contribute to the group. > > Thanks! > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > -- > Thierry Carrez (ttx) > From rosmaita.fossdev at gmail.com Wed Nov 13 17:58:07 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 12:58:07 -0500 Subject: [cinder] meeting time change poll In-Reply-To: References: Message-ID: <74017827-bb03-24aa-59b0-8cd812e34a8f@gmail.com> On 11/13/19 9:57 AM, Brian Rosmaita wrote: > The poll to help decide whether we move the time of the weekly Cinder > meeting is now available: > > https://forms.gle/kA2JGzoxegy2KRDB6 The target of that link, a google form, may not be available in China. 
This one is probably not blocked: https://rosmaita.wufoo.com/forms/cinder-ussuri-meeting-time-poll/ If you already voted, please do NOT use the wufoo poll to vote again. I will collate the results from the two polls. > > The poll closes on 20 November at 23:59 UTC. > > If the time is changed, it will be effective for the first meeting in > December (4 December 2019). > > > cheers, > brian > From rosmaita.fossdev at gmail.com Wed Nov 13 17:59:09 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 13 Nov 2019 12:59:09 -0500 Subject: [cinder] virtual PTG datetime selection poll In-Reply-To: References: Message-ID: On 11/13/19 10:42 AM, Brian Rosmaita wrote: > At the Shanghai PTG last week, the Cinder team decided to hold a Virtual > PTG the last week of November. > > The format will be 2 two-hour sessions, ideally held on consecutive days. > > Since most people have the 16:00 UTC Cinder meeting already on their > calendars, I suggest that we skip the meeting on 27 November and instead > use that time for Virtual PTG. > > So, basically, I'd like to meet: Wednesday for sure, and either Tuesday > or Thursday (with Monday or Friday as possibilities if Tuesday or > Thursday are impossible for too many people). > > Let me know what your preferences are on this poll: > > https://forms.gle/rKDJpSZvAxbnBESp7 The target of that link, a google form, may not be available in China. This one is probably not blocked: https://rosmaita.wufoo.com/forms/cinder-ussuri-virtual-ptg/ If you already voted, please do NOT use the wufoo poll to vote again. I will collate the results from the two polls. > > The poll closes at 23:39 on *Tuesday* 19 November 2019. > > > thanks, > brian From openstack at nemebean.com Wed Nov 13 18:08:27 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 13 Nov 2019 12:08:27 -0600 Subject: [oslo] Virtual PTG Planning Message-ID: Hi Osloers, Given that a lot of the team was not in Shanghai and we had a few topics proposed that didn't make sense to discuss as a result, I would like to try doing a virtual PTG the way a number of the other teams are. I've added a section to the PTG etherpad[0] with some proposed details, but in general I'm thinking we meet on Jitsi (it's open source) around the time of the Oslo meeting. It's possible we might be able to get through everything in the regularly scheduled hour, but if possible I'd like to keep the following hour (1600-1700 UTC) open as well. If everyone's available we could do it next week (the 18th) or possibly the following week (the 25th), although that runs into Thanksgiving week in the US so people might be out. I've created a Doodle poll[1] with selections for the next three weeks so please respond there if you can make it any of those days. If none of them work well we can discuss alternative options. Thanks. 
-Ben 0: https://etherpad.openstack.org/p/oslo-shanghai-topics 1: https://doodle.com/poll/8bqiv865ucyt8499 From smooney at redhat.com Wed Nov 13 18:12:51 2019 From: smooney at redhat.com (Sean Mooney) Date: Wed, 13 Nov 2019 18:12:51 +0000 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> Message-ID: <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> On Tue, 2019-11-12 at 14:53 +0100, Slawek Kaplonski wrote: > Stateless security groups > ========================= > > Old RFE [21] was approved for neutron-fwaas project but we all agreed that this > should be now implemented for security groups in core Neutron. > People from Nuage are interested in work on this in upstream. > We should probably also explore how easy/hard it will be to implement it in > networking-ovn backend. for what its worth we implemented this 4 years ago and it was breifly used in production trial deployment in a telco deployment but i dont think it ever went to full production as they went wtih sriov instead https://review.opendev.org/#/c/264131/ as part of this RFE https://bugs.launchpad.net/neutron/+bug/1531205 which was closed as wont fix https://bugs.launchpad.net/neutron/+bug/1531205/comments/14 as it was view that this was not the correct long term direction for the community. this is the summit presentation for austin for anyone that does not rememebr this effort https://www.openstack.org/videos/summits/austin-2016/tired-of-iptables-based-security-groups-heres-how-to-gain-tremendous-speed-with-open-vswitch-instead im not sure how the new proposal differeres form our previous proposal for the same feautre but the main pushback we got was that the securtiy group api is assumed to be stateful and that is why this was rejected. form our mesurments at the time we expected the stateless approch to scale better then contrack driver so it woudl be nice to see a stateless approch avialable. i never got around to deleteing our implemenation form networking-ovs-dpdk https://opendev.org/x/networking-ovs-dpdk/src/branch/master/networking_ovs_dpdk/agent/ovs_dpdk_firewall.py but i has not been tested our updated really for the last 2 years but it could be used as a basis of this effort if nuage does not have a poc already. From mriedemos at gmail.com Wed Nov 13 18:53:27 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 12:53:27 -0600 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint In-Reply-To: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> References: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> Message-ID: <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> On 11/12/2019 2:17 PM, Brian Rosmaita wrote: > we are currently understaffed in glance-stable-maint.  Plus, he's the > current Glance PTL. glance-stable-maint is understaffed yes. I ran a reviewstats report on glance stable branch reviews over the last 180 days: http://paste.openstack.org/show/786058/ Abhishek has only done 3 stable branch reviews in 6 months which is pretty low but to be fair maybe there aren't that many open reviews on stable branches for glance and the other existing glance-stable-maint cores don't have a lot more reviews either, so maybe that's just par for the course. As for being core on master or being PTL, as you probably know, that doesn't really mean much when it comes to stable branch reviews, which is more about the stable branch guidelines. 
Nova has a few stable branch cores that aren't core on master because they adhere to the guidelines and do a lot of stable branch reviews. Anyway, I'm OK trusting Abhishek here and adding him to the glance-stable-maint team. Things are such these days that beggars can't really be choosers. -- Thanks, Matt From gmann at ghanshyammann.com Wed Nov 13 19:01:18 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 14 Nov 2019 03:01:18 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> Message-ID: <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> ---- On Tue, 12 Nov 2019 22:12:29 +0800 Corey Bryant wrote ---- > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > On 7/11/19 2:11 pm, Corey Bryant wrote: > > Hello TC members, > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > least enable non-voting py38 unit tests. This email is seeking approval > > and direction from the TC to move forward with enabling non-voting py38 > > tests. > > I was a bit fuzzy on this myself, so I looked it up and this is what the > TC decided when we passed the resolution: > > > If the new Zuul template contains test jobs that were not in the previous one, the goal champion(s) may choose to update the previous template to add a non-voting check job (or jobs) to match the gating jobs in the new template. This means that all repositories that have not yet converted to the template for the upcoming release will see a non-voting preview of the new job(s) that will be added once they update. If this option is chosen, the non-voting job should be limited to the master branch so that it does not run on the preceding release’s stable branch. > > > Thanks for digging that up and explaining. I recall that wording and it makes a lot more sense now that we have a scenario in front of us. > > (from > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > ) > > So to follow that process we would need to define the python versions > for V, then appoint a goal champion, and after that it would be at the > champion's discretion to add a non-voting job on master in Ussuri. I > happened to be sitting next to Sean when I saw this thread, and after > discussing it with him I think he would OK with having a non-voting job > on every commit, since it's what we have documented. Previous > discussions established that the overhead of adding one Python unit test > job to every project was pretty inconsequential (we'll offset it by > dropping 2.7 jobs anyway). > > I submitted a draft governance patch defining the Python versions for V > (https://review.opendev.org/693743). Unfortunately we can't merge it yet > because we don't have a release name for V (Sean is working on that: > https://review.opendev.org/693266). It's gazing in the crystal ball a > > Thanks very much for getting that going. > little bit, but even if for some reason Ubuntu 20.04 is not released > before the V cycle starts, it's inevitable that we will be selecting > Python 3.8 because it meets the first criterion ("The latest released > version of Python 3 that is available in any distribution we can > feasibly use for testing") - 3.8 is released and it's available in > Ubuntu 18.04, which is the distro we use for testing anyway. 
> > So, in my opinion, if you're volunteering to be the goal champion then > there's no need for any further approval by the TC ;) > > > Sure, I can champion that. Just to be clear, would that be Ussuri and V python3-updates champion, similar to the following? > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > Granted it's easier now that we mostly just have to switch the job template to the new release. > I guess to make that official we should commit the python3 update Goal > for the V cycle now... or at least as soon as we have a release name. > > How far off do you think we are from having a V name? If just a few weeks then I'm fine waiting but if over a month I'm more concerned. > > This is happening a little earlier than I think we anticipated but, > given that there's no question what is going to happen in V, I don't > think we'd be doing anybody any favours by delaying the process > unnecessarily. ++ on not delaying the process. That is the main point of the goal process schedule also. To be clear, are we going to add the py3.8 n-v job as part of v cycle template (openstack-python3-v*-jobs) ? I hope yes, as it will enable us to make the one-time change on the project's side. Once we are in V cycle then template can be updated to make it a voting job. If not as part of the template (adding n-v job explicitly in Ussuri cycle and then add the V template once V cycle starts. ) then it will be two changes per project which I would like to avoid. -gmann > I agree. And Python 3.9 doesn't release until Oct 2020 so that won't be in the picture for Ussuri or V. > > > > For some further background: The next release of Ubuntu, Focal (20.04) > > LTS, is scheduled to release in April 2020. Python 3.8 will be the > > default in the Focal release, so I'm hopeful that non-voting unit tests > > will help close some of the gap. > > > > I have a review here for the zuul project template enablement for ussuri: > > https://review.opendev.org/#/c/693401 > > > > Also should this be updated considering py38 would be non-voting? > > https://governance.openstack.org/tc/reference/runtimes/ussuri.html > > No, I don't think this changes anything for Ussuri. It's preparation for V. > > > Ok. Appreciate all the input and help. > Thanks,Corey > From Albert.Braden at synopsys.com Wed Nov 13 19:30:15 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Wed, 13 Nov 2019 19:30:15 +0000 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> Message-ID: Removing these 3 obsolete filters appears to have fixed the problem. Thank you for your advice! -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 12, 2019 1:14 PM To: openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources On 11/12/2019 2:47 PM, Albert Braden wrote: > It's probably a config error. Where should I be looking? 
This is our nova config on the controllers: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_kNe1eRimk4ifrAuuN790bg&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=TZI4wT8_y-RAnwbbXaWBhdvAhhcbY1qymxKLRVpPt2U&s=3aQNqwtEMfOC7U_QUTqNqXiZv4yJy6ceB4kCuZKuL0o&e= If your deployment is pike or newer (I'm guessing rocky because your other email says rocky), then you don't need these filters: RetryFilter - alternate hosts bp in queens release makes this moot CoreFilter - placement filters on VCPU RamFilter - placement filters on MEMORY_MB -- Thanks, Matt From juliaashleykreger at gmail.com Wed Nov 13 19:35:23 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 13 Nov 2019 11:35:23 -0800 Subject: [ironic][ptg] Summary of discussions/happenings related to ironic Message-ID: Overall, There was quite a bit of interest in Ironic. We had great attendance for the Project Update, Rico Lin’s Heat/Ironic integration presentation, demonstration of dhcp-less virtual media boot, and the forum discussion on snapshot support for bare metal machines, and more! We also learned there are some very large bare metal clouds in China, even larger than the clouds we typically talk about when we discuss scale issues. As such, I think it would behoove the ironic community and OpenStack in general to be mindful of hyper-scale. These are not clouds with 100s of compute nodes, but with baremetal clouds containing thousands to tens of thousands of physical bare metal machines. So in no particular order, below is an overview of the sessions, discussions, and commentary with additional status where applicable. My apologies now since this is over 4,000 words in length. Project Update =========== The project update was fairly quick. I’ll try and record a video of it sometime this week or next and post it online. Essentially Ironic’s code addition/deletion levels are relatively stable cycle to cycle. Our developer and Ironic operator commit contribution levels have increased in Train over Stein, while the overall pool of contributors has continued to decline cycle after cycle, although not dramatically. I think the takeaway from this is that as ironic has become more and more stable, and that the problems being solved in many cases are operator specific needs or wants, or bug fixes in cases that are only raised in particular environment configurations. The only real question that came out of the project update was, if my memory is correct, was “What does Metal^3 mean for Ironic”, and “Who is driving forward Metal^3?” The answers are fairly straight forward, more ironic users and more use cases from Metal^3 driving ironic to deploy machines. As for who is driving it forward, it is largely being driven forward by Red Hat along with interested communities and hardware vendors. Quick, Solid, and Automatic OpenStack Bare-Metal Orchestration ================================================== Rico Lin, the Heat PTL, proposed this talk promoting the possibility of using ironic naively to deploy bare metal nodes. Specifically where configuration pass-through can’t be made generic or somehow articulated through the compute API. Cases where they may be is where someone wishes to utilize something like our “ramdisk” deploy_interface which does not deploy an image to the actual physical disk. 
The only real question that I seem to remember coming up was why someone might want or need to do this, which again becomes more of a question of doing things that are not quite “compute” API-ish. The patches are available in gerrit[10].

Operator Feedback Session
=====================

The operator feedback[0] session was not as well attended, with maybe ~20-25 people present. Overall the feeling of the room was that “everything works”; however, there is a need and desire for information and additional capabilities:

* Detailed driver support matrix
* Reduce the deployment times further
* Disk key rotation is an ask from operators for drives that claim smart erase support but end up doing a drive wipe instead. In essence, to reduce the overall time spent cleaning
* Software RAID is needed at deploy time.
* IPA needs improved error handling. This may be a case where some of the communication flow changes that had been previously discussed could help, in that we could actively try to keep track of the agent a little more. Additional discussion will definitely be required.
* There does still seem to be some interest in graphical console support. A contributor has been revising patches, but I think it would really help for a vendor to become involved here and support accessing their graphical interface through such a method.
* Information and an information sharing location is needed. I’ve reached out to the Foundation staff regarding the Bare Metal Logo Program to see if we can find a common place that we can build/foster moving forward.

In this topic, one major pain point began being stressed: issues with the resource tracker at 3,500 bare metal nodes. Privately another operator reached out with the same issue at the scale of tens of thousands of bare metal nodes. As such, this became a topic during the PTG which gained further discussion. I’ll cover that later.

Ironic – Snapshots?
===============

As a result of some public discussion of adding snapshot capability, I proposed a forum session to discuss the topic[1] such that requirements can be identified and the discussion can continue over the next cycle. I didn’t expect the number of attendees to swell compared to the operator feedback session. The discussion of requirements went back and forth to ultimately define “what is a snapshot” in this case, and “what should Ironic do?”

There was quite a bit of interaction in this session and the consensus seemed to be the following:

* Don’t make it reliant on nova, as standalone users may want/need to use it.
* This could be a very powerful feature, as an operator could ``adopt`` a machine into ironic and then ``snapshot`` it to capture the disk contents.
* Block level only, and we can’t forget about capturing/storing a content checksum.
* Capture the machine’s contents with the same expectation as we would have for a VM, and upload this to someplace.

In order to make this happen in a fashion which will scale, the ironic team will likely need to leverage the application credentials.

Ironically reeling in large bare metal deployment without PXE
==============================================

This was a talk submitted by Ilya Etingof, who unfortunately was unable to make it to the summit. Special thanks go to both Ilya and Richard Pioso for working together to make this demonstration happen.
The idea was to demonstrate where the ironic team sees the future of deployment of machines on the edge using virtual media, and how vendors would likely interact with that, as in some cases slightly different mechanics may be required even if the BMCs all speak Redfish, which is the case for a Dell iDRAC BMC.

The idea[2] is ultimately that the conductor would inject the configuration information into the ISO image that is attached via virtual media, negating the need for DHCP. We have videos posted that allow those interested to see what this functionality looks like with neutron[3] and without neutron[4].

While the large audience was impressed, it seemed to be a general surprise that Ironic already had virtual media support in some of the drivers. This talk spurred quite a bit of conversation and hallway track style discussion after the presentation concluded, which is always an excellent sign.

Project Teams Gathering
===================

The ironic community PTG attendance was nothing short of excellent. Thank you everyone who attended! At one point we had fifteen people and a chair had to be pulled up to our table for a 16th person to join us. At which point, we may have captured another table and created confusion.

We did things a little differently this time around. Given some of the unknowns, we did not create a strict schedule around the topics. We simply went through and prioritized topics and tried to discuss them each as thoroughly as possible until we had reached the conclusion or a consensus on the topic.

Topics, and a few words on each topic we discussed, are in the notes section on the PTG etherpad[5].

On-boarding
-----------------

We had three contributors that attended a fairly brief on-boarding overview of Ironic. Two of them were more developer focused, whereas the third had more of an operator focus, looking to leverage ironic and see how they can contribute back to the community.

BareMetal SIG - Next Steps
-------------------------------------

Arne Wiebalck and I both provided an update including current conversations, where we see the SIG, the Logo Program, and the white paper, and what the SIG should do beyond the whitepaper.

To start with the Logo program, it largely seems that somewhere along the way a message or document got lost, and that largely impacted the Logo Program -> SIG feedback mechanism. I’m working with the OpenStack Foundation to fix that and get communication going again. Largely what spurred that was that some vendors expressed interest in joining, and wanted additional information.

As for the white paper, contributions are welcome and progress is being made again.

From a next steps standpoint, the question was raised of how we build up an improved operator point of contact. There was some consensus that we as a community should try to encourage at least one contributor to attend the operations mid-cycles. This allows for a somewhat shorter feedback loop with a different audience.

We also discussed knowledge sharing, or how to improve it. Included with this is how we share best practices. I’ve got the question out to folks at the foundation as to whether there is a better way as part of the Logo program, or if we should just use the Wiki. I think this will be an open discussion topic in the coming weeks.

The final question that came up as part of the SIG is how to show activity.
I reached out to Amy on the UC regarding this, and it seems the process is largely just to reach out to the current leaders of the SIG, so it is critical that we keep that up to date moving forward.

Sensor Data/Metrics
---------------------------

The barrier between tenant level information and operator level information is difficult with this topic.

The consensus among the group was that the capability to collect some level of OOB sensor data should be present on all drivers, but there is also a recognition that this comes at a cost and possible performance impact. Mainly this performance impact question was raised with Redfish, because this data is scattered around the API, multiple API calls are required, and actively inquiring upon some data points may even cause some interruption.

The middle ground in the discussion came down to adding a capability of somehow saying “collect power status and temp every minute, fan speeds every five minutes, drive/cpu health data maybe every 30 minutes”. I would be remiss if I didn’t note that there was joking about how this would in essence be a re-implementation of Cron. What this would end up looking like, we don’t know, but it would provide operators the data resolution necessary for the failure risk/impact. The analogy used was that “If the temperature sensor has risen to an alarm level, either an AC failure or a thermal hot spot forming based upon load in the data center, checking the sensor too often is just not going to result in a human investigating that on the data center floor any faster.”

Mainly I believe this discussion largely stresses that the information is for the operator of the bare metal and not to provide insight into a tenant monitoring system; those activities should largely be done within the operating system.

One question among the group was whether anyone was already using the metrics framework built into ironic for metrics of ironic itself, to see if we can re-use it. Well, it uses a plugin interface! In any event, I’ve sent a post to the openstack-discuss mailing list seeking usage information.

Node Retirement
-----------------------

This is a returning discussion from the last PTG, and in discussing the topic we figured out where the discussion became derailed previously. In essence, the desire was to mix this with the concept of being able to take a node “out of service”. Except, taking a node out of service is an immediate state related flag, whereas retiring might happen as soon as the current tenant vacates the machine… possibly in three to six months.

In other words, one is “do something or nothing now”, and the other is “do something later when a particular state boundary is crossed”. Trying to make one solution for both doesn’t exactly work.

Unanimous consensus among those present was that in order to provide node retirement functionality, the logic should be similar to maintenance/maintenance reason: a top level field in the node object that would allow API queries for nodes slated for retirement, which helps solve an operator workflow conundrum, “How do I know what is slated for retirement but not yet vacated?”

Going back to the “out of service” discussion, we reached consensus that this was in essence a “user declarable failed state”, and as such that it should be done only in the state machine as it is in the present, not a future action.
Should we implement out of service, we’ll need to check the nova.virt.ironic code and related virt code to properly handle nodes dropping from `ACTIVE` state, which could also be problematic and would need to be API version guarded to prevent machines from accidentally entering `ERROR` state if they are not automatically recovered in nova.

Multi-tenancy
------------------

Lots of interest existed around making the API somewhat of a multi-tenant aware interaction, and the exact interactions and uses involved there are not exactly clear. What IS clear is that providing functionality as such will allow operators to remove complication in their resource classes and tenant specific flavors, which are presently being used to enable tenant specific hardware pools. The added benefit of providing some level of access to the ironic API for normally non-admin users is that it would allow those tenants to have a clear understanding of their used resources and available resources by directly asking ironic, whereas presently they don’t have a good way to collect or understand that short of asking the cloud operator when it comes to bare metal. Initial work for this has been posted to gerrit[6].

In terms of how tenant resources would be shared, there was consensus that the community should stress that new special use tenants should be created for collaborative efforts.

There was some discussion regarding explicitly dropping fields for non-privileged users that can see the nodes, such as driver_info and possibly even driver_internal_info. Definitely a topic that requires more discussion, but that would solve operator reporting and use headaches.

Manual Cleaning Out-Of-Band
----------------------------------------

The point was raised that we unconditionally start the agent ramdisk to perform manual cleaning. Except, we should support a method for out-of-band cleaning operations to be executed on their own, so the bare metal node doesn’t need to be booted to a ramdisk.

The consensus seemed to be that we should consider a decorator, or a change to an existing decorator, that allows the conductor to hold off actually powering the node on for ramdisk boot unless or until a step is reached that is not purely out of band.

In essence, fixing this allows a “fix_bmc” out of band clean step to be executed first without trying to modify BMC settings, which would presently fail.

Scale issues
-----------------

A number of scaling issues exist in how nova and ironic interact, specifically with the resource tracker and how inventory is updated from ironic and loaded into nova. Largely this issue revolves around the concept in nova that each ``nova-compute`` is a hypervisor. And while one can run multiple ``nova-compute`` processes to serve as the connection to ironic, the underlying lock in Nova is at the level of the compute service, not the individual node level. This means that as thousands of records are downloaded, synced, and copied into the resource tracker, the compute process is essentially blocked from other actions while this serialized job runs.

In a typical VM case, you may only have at most a couple hundred VMs on a hypervisor, whereas with bare metal, we’re potentially servicing thousands of physical machines.

It should be noted that there are several large scale operators that indicated during the PTG that this was their pain point. Some of the contributors from CERN sat down with us and the nova team to try and hammer out a solution to this issue. A summary of that cross project session can be found at line 212 in the PTG etherpad[0].
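To make the serialization a bit more concrete, here is a deliberately simplified sketch of the pattern being described. It is illustrative only and not the actual nova code; the lock name just mirrors the idea of a single per-service resource tracker lock, and the helper function is made up for the example.

    import threading

    # One lock per nova-compute service, not per ironic node.
    COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

    def sync_node_inventory(node):
        # Placeholder for the real work: refresh the node from ironic,
        # update the compute node record and placement inventory, etc.
        pass

    def update_available_resource(ironic_nodes):
        # With bare metal, 'ironic_nodes' can be thousands of entries for a
        # single nova-compute service, and every iteration holds the same
        # lock that instance claims and other operations also need.
        for node in ironic_nodes:
            with COMPUTE_RESOURCE_SEMAPHORE:
                sync_node_inventory(node)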
But there is another pain point that contributes to this performance issue, and that is the speed at which records are returned by our API. We’ve had some operators voice some frustration with this before, and we should at least be mindful of this and hopefully see if we can improve record retrieval performance. In addition to this, if we supported some form of bulk “GET” of nodes, it might be able to be leveraged, as opposed to a GET on each node one at a time, which is presently what occurs in the nova-compute process.

Boot Mode Config
------------------------

Previously, when scheduling occurred with flavors and filters appropriately set, if a machine was declared as supporting only one boot mode, requests would only ever land on that node. Now with Traits, this is a bit different and unfortunately optional, without logic to really guard how the setting is applied for an instance.

So in this case, if filters are such that a request for a Legacy boot instance lands on a UEFI only machine, we’ll still try to deploy it. In reality, we really should try and fail fast.

Ideally the solution here is that we consult with the BMC through some sort of get_supported_boot_modes method, and if we determine a mismatch between what the settings are and what the requested instance is, from the data we have, we fail the deploy.

This ultimately may require work in the nova.virt.ironic driver code to identify the cause of the failure as being an invalid configuration and report that back; however, it may not be fatal on another machine.

Security of /heartbeat and /lookup endpoints
-----------------------------------------------------------

We had a discussion of adding some additional layers of security mechanics around the /heartbeat and /lookup endpoints in ironic’s REST API. These limited endpoints are documented as being unauthenticated, so naturally some issues can arise from these, and we want to minimize the vectors by which an attacker that has gained access to a cleaning/provisioning/rescue network could possibly impersonate a running ironic-python-agent. Conversely, the ironic-python-agent runs in a similar fashion, intended to run on secure trusted networks which are only accessible to the ironic-conductor. As such, we also want to add some validation that the API request is from the same Ironic deployment that IPA is heart-beating to.

The solution to this is to introduce a limited lifetime token that is unique per node per deployment. It would be stored in RAM on the agent, and in the node.driver_internal_info so it is available to the conductor. It would be provided only once via out of band OR via the first “lookup” of a node, and then only become accessible again during known reboot steps.

Conceptually the introduction of tokens was well supported in the discussions and there were zero objections to doing so. Some initial patches[7][8] are under development to move this forward.

An additional item is to add IP address filtering capabilities to both endpoints such that we only process the heartbeat/lookup request if we know it came from the correct IP address. An operator has written this feature downstream and consensus was unanimous at the PTG that we should accept this feature upstream. We should expect a patch for this functionality to be posted soon.
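To make the token flow a little more concrete, here is a very rough sketch of the sort of exchange being discussed. It is purely illustrative and assumes nothing about the final implementation; the field and function names are made up for the example.

    import hmac
    import secrets

    def issue_agent_token(driver_internal_info):
        # One token per node per deployment, generated by the conductor and
        # handed to the agent either out of band (e.g. via virtual media) or
        # on the first lookup, then kept so later heartbeats can be checked.
        token = secrets.token_urlsafe(32)
        driver_internal_info['agent_secret_token'] = token
        return token

    def heartbeat_is_valid(driver_internal_info, presented_token):
        expected = driver_internal_info.get('agent_secret_token')
        # Constant-time comparison so the check itself leaks nothing useful.
        return expected is not None and hmac.compare_digest(expected, presented_token)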
Persistent Agents
------------------------

The use case behind persistent agents is “I want to kexec my way to the agent ramdisk, or the next operating system.” and “I want to have up to date inspection data.” We’ve already somewhat solved the latter, but the former is a harder problem requiring the previously mentioned endpoint security enhancements to be in place first. There is some interest from CERN and some other large scale operators.

In other words, we should expect more of this from a bare metal fleet operations point of view for some environments as we move forward.

“Managing hardware the Ironic way”
-------------------------------------------------

The question that spurred this discussion was “How do I provide a way for my hardware manager to know what it might need to do by default?” Except, those defaults may differ between racks that serve different purposes. “Rack 1, node0” may need a port set to Fibre Channel mode, whereas “Rack2, node1” may require it to be Ethernet.

This quickly also reaches the discussion of “What if I need different firmware versions by default?”

This topic quickly evolved from there and the idea that surfaced was that we introduce a new field on the node object for the storage of such data. Something like ``node.default_config``, where it would be a dictionary sort of like what a user provides for cleaning steps or deploy steps, that provides argument values which are iterated through when in automated cleaning mode to allow operators to fill in configuration requirement gaps for hardware managers.

Interestingly enough, even today we just had someone ask a similar question in IRC.

This should ultimately be usable to assert desired/default firmware from an administrative point of view. Adrianc (Mellanox) is going to reach out to bdobb (DMTF) regarding the redfish PLDM firmware update interface to see where this may go from here.

Edge computing working group session
----------------------------------------------------

The edge working group largely became a session to update everyone on where Ironic was going and where we see things going in terms of managing bare metal at the edge/far-edge. This included some in-depth questions about dhcp-less deployment and related mechanics, as well as HTTPBoot’ing machines.

Supporting HTTPBoot does definitely seem to be of interest to a number of people, although, at least after sharing my context, only five or six people in attendance really seemed interested in ironic prioritizing such functionality. The primary blocker, for those that are unaware, is pre-built UEFI images for us to do integration testing for IPv4 HTTPBoot. Functionally ironic already supports IPv6 HTTPBoot via DHCPv6 as part of our IPv6 support with PXE/iPXE; however, we also don’t have an integration test job for this code path for the same reason: pre-built UEFI firmware images lack the built-in support.

More minor PTG topics
-------------------------------

* Smartnics - A desire to attach virtual ports to ironic baremetal nodes with smartnics was raised. It seems that we don’t need to try and create a port entry in ironic; we only need to track/signal and remove the “vif” attachment to the node in general, as there is no physical MAC required for that virtual port in ironic. The constraint that at least one MAC address would be required to identify the machine is understood. If anyone sees an issue with this, please raise this to adrianc.
* Metal^3 - Within the group attending the PTG, there was not much interest in Metal^3 or using CRDs to manage bare metal resources with ironic hidden behind the CRD. One factor related to this is the desire to define more data to be passed through to ironic, which is not presently supported in the CRD definition.

Stable Backports with Ironic's release model
==================================

I was pulled into a discussion with the TC and the Stable team regarding frustrations that have been expressed within the ironic team regarding stable back-porting of fixes, mainly drivers. There is consensus that it is okay for us as the ironic team to backport drivery things when needed to support vendors, as long as they are not breaking API or overall behavior contracts. This quickly leads us to needing to also modify constraints for drivery things as well. Constraints changes will continue to be evaluated on a case by case basis, but the general consensus is that there is full support to “do the right thing” for ironic's users, vendors, and community.

The key is making sure we are on the same page and agreeing to what that right thing is. This is where asynchronous communication can get us into trouble, and I would highly encourage trying to start higher bandwidth discussion when these cases arise in the future. The key takeaway that we should likely keep in mind is that policy is there for good reasons, but policy is not and cannot be a crutch to prevent the right thing from being done.

Additional items worth noting - Q1 Gatherings
===================================

There will be an operations mid-cycle at Bloomberg in London, January 7th-8th, 2020. It would be good if at least one ironic contributor could attend, as the operators group tends to be closer to the physical baremetal, and it is a good chance to build mutual context between developers and operations people actually using our software.

Additionally, we want to gauge the interest in having an ironic mid-cycle in central Europe in Q1 of 2020. We need to identify the number of contributors that would be interested in and able to attend, since the next PTG will be in June. Please email me off-list if you're interested in attending and I'll make a note of it, as we're still having initial discussions.

And now I've reached a buffer under-run on words. If there are any questions, just reply to the list.
-Julia Links: [0]: https://etherpad.openstack.org/p/PVG-ironic-operator-feedback [1]: https://etherpad.openstack.org/p/PVG-ironic-snapshot-support [2]: https://review.opendev.org/#/c/672780/ [3]: https://tinyurl.com/vwts36l [4]: https://tinyurl.com/st6azrw [5]: https://etherpad.openstack.org/p/PVG-Ironic-Planning [6]: https://review.opendev.org/#/c/689551/ [7]: https://review.opendev.org/692609 [8]: https://review.opendev.org/692614 [9]: https://etherpad.openstack.org/p/ops-meetup-1st-2020 [10]: https://review.opendev.org/#/q/topic:story/2006403+(status:open+OR+status:merged) From corey.bryant at canonical.com Wed Nov 13 19:43:29 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Wed, 13 Nov 2019 14:43:29 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> Message-ID: On Wed, Nov 13, 2019 at 2:01 PM Ghanshyam Mann wrote: > ---- On Tue, 12 Nov 2019 22:12:29 +0800 Corey Bryant < > corey.bryant at canonical.com> wrote ---- > > > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > > Hello TC members, > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand > it's > > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > > least enable non-voting py38 unit tests. This email is seeking > approval > > > and direction from the TC to move forward with enabling non-voting > py38 > > > tests. > > > > I was a bit fuzzy on this myself, so I looked it up and this is what > the > > TC decided when we passed the resolution: > > > > > If the new Zuul template contains test jobs that were not in the > previous one, the goal champion(s) may choose to update the previous > template to add a non-voting check job (or jobs) to match the gating jobs > in the new template. This means that all repositories that have not yet > converted to the template for the upcoming release will see a non-voting > preview of the new job(s) that will be added once they update. If this > option is chosen, the non-voting job should be limited to the master branch > so that it does not run on the preceding release’s stable branch. > > > > > > Thanks for digging that up and explaining. I recall that wording and it > makes a lot more sense now that we have a scenario in front of us. > > > > (from > > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > ) > > > > So to follow that process we would need to define the python versions > > for V, then appoint a goal champion, and after that it would be at the > > champion's discretion to add a non-voting job on master in Ussuri. I > > happened to be sitting next to Sean when I saw this thread, and after > > discussing it with him I think he would OK with having a non-voting job > > on every commit, since it's what we have documented. Previous > > discussions established that the overhead of adding one Python unit > test > > job to every project was pretty inconsequential (we'll offset it by > > dropping 2.7 jobs anyway). > > > > I submitted a draft governance patch defining the Python versions for V > > (https://review.opendev.org/693743). Unfortunately we can't merge it > yet > > because we don't have a release name for V (Sean is working on that: > > https://review.opendev.org/693266). 
It's gazing in the crystal ball a > > > > Thanks very much for getting that going. > > little bit, but even if for some reason Ubuntu 20.04 is not released > > before the V cycle starts, it's inevitable that we will be selecting > > Python 3.8 because it meets the first criterion ("The latest released > > version of Python 3 that is available in any distribution we can > > feasibly use for testing") - 3.8 is released and it's available in > > Ubuntu 18.04, which is the distro we use for testing anyway. > > > > So, in my opinion, if you're volunteering to be the goal champion then > > there's no need for any further approval by the TC ;) > > > > > > Sure, I can champion that. Just to be clear, would that be Ussuri and V > python3-updates champion, similar to the following? > > > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > > Granted it's easier now that we mostly just have to switch the job > template to the new release. > > I guess to make that official we should commit the python3 update Goal > > for the V cycle now... or at least as soon as we have a release name. > > > > How far off do you think we are from having a V name? If just a few > weeks then I'm fine waiting but if over a month I'm more concerned. > > > > This is happening a little earlier than I think we anticipated but, > > given that there's no question what is going to happen in V, I don't > > think we'd be doing anybody any favours by delaying the process > > unnecessarily. > > ++ on not delaying the process. That is the main point of the goal process > schedule also. > To be clear, are we going to add the py3.8 n-v job as part of v cycle > template (openstack-python3-v*-jobs) ? I hope yes, as > it will enable us to make the one-time change on the project's side. Once > we are in V cycle then template can be updated to make it a voting job. > > If not as part of the template (adding n-v job explicitly in Ussuri cycle > and then add the V template once V cycle starts. ) then it will be two > changes per project which I would like to avoid. > > -gmann > > My plan is to create V templates soon which will include voting py38. And ussuri templates will have non-voting py38: https://review.opendev.org/#/c/693401/ I was thinking we couldn't add V templates to projects until after their stable/ussuri branches are created, which would mean one update per project per release. Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Nov 13 19:43:56 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 13 Nov 2019 11:43:56 -0800 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: On 11/12/19 05:18, Sean Mooney wrote: > On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote: >> Hi Nova experts, >> >> "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in >> update_available_resources in RT at the moment. That is some orphans or error migrations are using cpus/memory/disk >> etc, but we don't take these usage into consideration. And instance.resources is introduced from Train used to contain >> specific resources, we also track assigned specific resources in RT based on tracked migrations and instances. So this >> bug will also affect the specific resources tracking. 
>> >> I draft an doc to clarify this bug and possible solutions: >> https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT >> Looking forward to suggestions from you. Thanks in advance. >> > there are patche up to allow cleaning up orpahn instances > https://review.opendev.org/#/c/627765/ > https://review.opendev.org/#/c/648912/ > if we can get those merged that woudl adress at least some of the proablem I just wanted to mention: I have reviewed the cleanup patches ^ multiple times and I'm having a hard time getting past the fact that any way you slice it (AFAICT), the cleanup code will have a window where a valid guest could be destroyed erroneously (not an orphan). This is because the "get instance list by host" can miss instances that are mid-migration, because of how/where we update the instance.host field. Maybe this ^ could be acceptable (?) if we put a big fat warning on the config option help for 'reap_unknown'. But I was unsure of the answers about what recovery looks like in case a guest is erroneously destroyed for an instance that is in the middle of migrating. In the case of resize or cold migrate, a hard reboot would fix it AFAIK. What about for a live migration? If recovery is possible in every case, those would also need to be documented in the config option help for 'reap_unknown'. The patch has lots of complexities to think about and I'm left wondering if the pitfalls are better or worse than the current state. It would help if others joined in the review with their thoughts about it. -melanie From mriedemos at gmail.com Wed Nov 13 19:51:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 13:51:33 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> Message-ID: On 11/13/2019 1:30 PM, Albert Braden wrote: > Removing these 3 obsolete filters appears to have fixed the problem. Thank you for your advice! Awesome, I'm glad it worked. -- Thanks, Matt From mriedemos at gmail.com Wed Nov 13 19:53:27 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 13:53:27 -0600 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: On 11/13/2019 1:43 PM, melanie witt wrote: > This is because the "get instance list by host" can miss instances that > are mid-migration, because of how/where we update the instance.host field. Why not just filter out any instances that have a non-None task_state? Or barring that, filter out any instances that have an in-progress migration (there is a method that the ResourceTracker uses to get those kinds of migrations occurring either as incoming to or outgoing from the host). 
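To sketch the combination of those two checks (the names here are illustrative stand-ins, not the exact resource tracker internals):

    def looks_like_orphan(guest_uuid, instances_by_uuid, migration_instance_uuids):
        # 'instances_by_uuid' would be instance records looked up by uuid
        # (not filtered by host), and 'migration_instance_uuids' the uuids
        # tied to in-progress migrations incoming to or outgoing from this
        # host.
        inst = instances_by_uuid.get(guest_uuid)
        if inst is not None and inst.task_state is not None:
            # Mid-operation (e.g. migrating/resizing); never treat as orphan.
            return False
        if guest_uuid in migration_instance_uuids:
            # Covered by an in-progress migration on this host; leave it be.
            return False
        # Only guests with no matching instance record remain candidates.
        return inst is None

Something along those lines might at least shrink the window for guests that are mid-migration.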
-- Thanks, Matt From gmann at ghanshyammann.com Wed Nov 13 20:02:35 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 14 Nov 2019 04:02:35 +0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: <02efdb60-a257-aa0b-fc0d-dbcadd754783@redhat.com> <16e6624331a.e64752ca206466.2746510287828263922@ghanshyammann.com> Message-ID: <16e665c4c72.12a62ab24207568.4557683352710362215@ghanshyammann.com> ---- On Thu, 14 Nov 2019 03:43:29 +0800 Corey Bryant wrote ---- > > > On Wed, Nov 13, 2019 at 2:01 PM Ghanshyam Mann wrote: > ---- On Tue, 12 Nov 2019 22:12:29 +0800 Corey Bryant wrote ---- > > > > On Mon, Nov 11, 2019 at 2:33 PM Zane Bitter wrote: > > On 7/11/19 2:11 pm, Corey Bryant wrote: > > > Hello TC members, > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's > > > too late to enable voting py38 unit tests for ussuri, I'd like to at > > > least enable non-voting py38 unit tests. This email is seeking approval > > > and direction from the TC to move forward with enabling non-voting py38 > > > tests. > > > > I was a bit fuzzy on this myself, so I looked it up and this is what the > > TC decided when we passed the resolution: > > > > > If the new Zuul template contains test jobs that were not in the previous one, the goal champion(s) may choose to update the previous template to add a non-voting check job (or jobs) to match the gating jobs in the new template. This means that all repositories that have not yet converted to the template for the upcoming release will see a non-voting preview of the new job(s) that will be added once they update. If this option is chosen, the non-voting job should be limited to the master branch so that it does not run on the preceding release’s stable branch. > > > > > > Thanks for digging that up and explaining. I recall that wording and it makes a lot more sense now that we have a scenario in front of us. > > > > (from > > https://governance.openstack.org/tc/resolutions/20181024-python-update-process.html#unit-tests > > ) > > > > So to follow that process we would need to define the python versions > > for V, then appoint a goal champion, and after that it would be at the > > champion's discretion to add a non-voting job on master in Ussuri. I > > happened to be sitting next to Sean when I saw this thread, and after > > discussing it with him I think he would OK with having a non-voting job > > on every commit, since it's what we have documented. Previous > > discussions established that the overhead of adding one Python unit test > > job to every project was pretty inconsequential (we'll offset it by > > dropping 2.7 jobs anyway). > > > > I submitted a draft governance patch defining the Python versions for V > > (https://review.opendev.org/693743). Unfortunately we can't merge it yet > > because we don't have a release name for V (Sean is working on that: > > https://review.opendev.org/693266). It's gazing in the crystal ball a > > > > Thanks very much for getting that going. > > little bit, but even if for some reason Ubuntu 20.04 is not released > > before the V cycle starts, it's inevitable that we will be selecting > > Python 3.8 because it meets the first criterion ("The latest released > > version of Python 3 that is available in any distribution we can > > feasibly use for testing") - 3.8 is released and it's available in > > Ubuntu 18.04, which is the distro we use for testing anyway. 
> > > > So, in my opinion, if you're volunteering to be the goal champion then > > there's no need for any further approval by the TC ;) > > > > > > Sure, I can champion that. Just to be clear, would that be Ussuri and V python3-updates champion, similar to the following? > > https://governance.openstack.org/tc/goals/selected/train/python3-updates.html > > Granted it's easier now that we mostly just have to switch the job template to the new release. > > I guess to make that official we should commit the python3 update Goal > > for the V cycle now... or at least as soon as we have a release name. > > > > How far off do you think we are from having a V name? If just a few weeks then I'm fine waiting but if over a month I'm more concerned. > > > > This is happening a little earlier than I think we anticipated but, > > given that there's no question what is going to happen in V, I don't > > think we'd be doing anybody any favours by delaying the process > > unnecessarily. > > ++ on not delaying the process. That is the main point of the goal process schedule also. > To be clear, are we going to add the py3.8 n-v job as part of v cycle template (openstack-python3-v*-jobs) ? I hope yes, as > it will enable us to make the one-time change on the project's side. Once we are in V cycle then template can be updated to make it a voting job. > > If not as part of the template (adding n-v job explicitly in Ussuri cycle and then add the V template once V cycle starts. ) then it will be two > changes per project which I would like to avoid. I saw the review now and that too works well and matches the TC resolution. Once we have V cycle testing runtime (zane patch) reflecting the 3.8 as required version then we are good to merge that. -gmann > > -gmann > > > My plan is to create V templates soon which will include voting py38. And ussuri templates will have non-voting py38: https://review.opendev.org/#/c/693401/ > I was thinking we couldn't add V templates to projects until after their stable/ussuri branches are created, which would mean one update per project per release. > > Thanks, > Corey > From cboylan at sapwetik.org Wed Nov 13 20:34:49 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 13 Nov 2019 12:34:49 -0800 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On Fri, Nov 8, 2019, at 6:09 AM, Corey Bryant wrote: > > > On Thu, Nov 7, 2019 at 5:56 PM Sean McGinnis wrote: > > My non-TC take on this... > > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand it's too late to enable voting py38 unit tests for ussuri, I'd like to at least enable non-voting py38 unit tests. This email is seeking approval and direction from the TC to move forward with enabling non-voting py38 tests. > > > > I think it would be great to start testing 3.8 so there are no surprises once we need to officially move there. But I would actually not want to see that run on every since patch in every single repo. > > Just to be clear I'm only talking about unit tests right now which are > generally light on resource requirements. However it would be great to > also have py38 function test enablement and periodic would make sense > for function tests at this point. For unit tests though it seems the > benefit of knowing whether your patch regresses unit tests for the > latest python version far outweighs the resources required, so I don't > see much benefit in adding periodic unit test jobs. 
> Wanted to point out that we've begun to expose resource consumption in nodepool to graphite. You can find per project and per tenant resource usage under stats.zuul.nodepool.resources at https://graphite.opendev.org. Unfortunately, I don't think we have per job resource tracking there yet, but previous measurements from log files do agree that unittest consumption is relatively low. It is large multinode integration jobs that run for extended periods of time that have the greatest impact on our resource utilization. Clark From openstack at fried.cc Wed Nov 13 20:38:37 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 13 Nov 2019 14:38:37 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> Message-ID: <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Okay, are we going to have a document that maps exception classes to these explanations and recovery actions? Which we then have to maintain as the code changes? Or are they expected to look through code (without a stack trace)? I'm not against the idea, just playing devil's advocate. Sylvain seems to have a use case, so great. As an alternative, have we considered a mechanism whereby we could, in appropriate code paths, provide some text that's expressly intended for the end user to see? Maybe it's a new user_message field on NovaException which, if present, gets percolated up to a new field similar to the one you suggested. efried On 11/13/19 11:41 AM, Matt Riedemann wrote: > On 11/13/2019 11:17 AM, Eric Fried wrote: >> Unless it's likely to be something other than NoValidHost a significant >> percentage of the time, IMO it... > > Well just taking resize, it could be one of many things: > > https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L366 > - oops you tried resizing which would screw up your group affinity policy > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L4490 > - (for an admin, cold migrate) oops you tried cold migrating a vcenter > vm or you have allow_resize_to_same_host=True and the scheduler picks > the same host (silly scheduler, see bug 1748697) > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L113 - > oops you lost a resource claims race, try again > > https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report.py#L1898 > - oops you lost a race with allocation consumer generation conflicts, > try again > From juliaashleykreger at gmail.com Wed Nov 13 20:40:41 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 13 Nov 2019 12:40:41 -0800 Subject: [ironic][ptg] Summary of discussions/happenings related to ironic In-Reply-To: References: Message-ID: A minor revision, we have new links for the videos as it seems there was an access permission issue. Links replaced below. On Wed, Nov 13, 2019 at 11:35 AM Julia Kreger wrote: > > Overall, There was quite a bit of interest in Ironic. We had great > attendance for the Project Update, Rico Lin’s Heat/Ironic integration > presentation, demonstration of dhcp-less virtual media boot, and the > forum discussion on snapshot support for bare metal machines, and > more! We also learned there are some very large bare metal clouds in > China, even larger than the clouds we typically talk about when we > discuss scale issues. 
As such, I think it would behoove the ironic > community and OpenStack in general to be mindful of hyper-scale. These > are not clouds with 100s of compute nodes, but with baremetal clouds > containing thousands to tens of thousands of physical bare metal > machines. > > So in no particular order, below is an overview of the sessions, > discussions, and commentary with additional status where applicable. > > My apologies now since this is over 4,000 words in length. > > Project Update > =========== > > The project update was fairly quick. I’ll try and record a video of it > sometime this week or next and post it online. Essentially Ironic’s > code addition/deletion levels are relatively stable cycle to cycle. > Our developer and Ironic operator commit contribution levels have > increased in Train over Stein, while the overall pool of contributors > has continued to decline cycle after cycle, although not dramatically. > I think the takeaway from this is that as ironic has become more and > more stable, and that the problems being solved in many cases are > operator specific needs or wants, or bug fixes in cases that are only > raised in particular environment configurations. > > The only real question that came out of the project update was, if my > memory is correct, was “What does Metal^3 mean for Ironic”, and “Who > is driving forward Metal^3?” The answers are fairly straight forward, > more ironic users and more use cases from Metal^3 driving ironic to > deploy machines. As for who is driving it forward, it is largely being > driven forward by Red Hat along with interested communities and > hardware vendors. > > Quick, Solid, and Automatic OpenStack Bare-Metal Orchestration > ================================================== > > Rico Lin, the Heat PTL, proposed this talk promoting the possibility > of using ironic naively to deploy bare metal nodes. Specifically where > configuration pass-through can’t be made generic or somehow > articulated through the compute API. Cases where they may be is where > someone wishes to utilize something like our “ramdisk” > deploy_interface which does not deploy an image to the actual physical > disk. The only real question that I seem to remember coming up was the > question why might someone want or need to do this, which again > becomes more of a question of doing things that are not quite > “compute” API-ish. The patches are available in gerrit[10]. > > Operator Feedback Session > ===================== > > The operator feedback[0] session was not as well populated with maybe > ~20-25 people present. Overall the feeling of the room was that > “everything works”, however there is a need and desire for information > and additional capabilities. > > * Detailed driver support matrix > * Reduce the deployment times further > * Disk key rotation is an ask from operators for drives that claim > smart erase support but end up doing a drive wipe instead. In essence, > to reduce the overall time spent cleaning > * Software RAID is needed at deploy time. > * IPA needs improved error handling. - This may be a case where > something of the communication flow changes that had been previously > discussed could help in that we could actively try and keep track of > the agent a little more. Additional discussion will definitely be > required. > * There does still seem to be some interest in graphical console > support. 
A contributor has been revising patches, but I think it would > really help for a vendor to become involved here and support accessing > their graphical interface through such a method. > * Information and an information sharing location is needed. I’ve > reached out to the Foundation staff regarding the Bare Metal Logo > Program to see if we can find a common place that we can build/foster > moving forward. In this topic, the one major pain point began being > stressed, issues with the resource tracker at 3,500 bare metal nodes. > Privately another operator reached out with the same issue in the > scale of tens of thousands of bare metal nodes. As such, this became a > topic during the PTG which gained further discussion. I’ll cover that > later. > > Ironic – Snapshots? > =============== > > As a result of some public discussion of adding snapshot capability, I > proposed a forum session to discuss the topic[1] such that > requirements can be identified and the discussion can continue over > the next cycle. > I didn't expect the number of attendees present to swell from the > operator's feedback session. The discussion of requirements went back > and forth to ultimately define "what is a snapshot" in this case, and > "what should Ironic do?" > > There was quite a bit of interaction in this session and the consensus > seemed to be the following: > * Don’t make it reliant on nova, for standalone users may want/need to use it. > * This could be a very powerful feature as an operator could ``adopt`` > a machine into ironic and then ``snapshot`` it to capture the disk > contents. > * Block level only and we can’t forget about capturing/storing content checksum > * Capture the machine’s contents with the same expectation as we would > have for a VM, and upload this to someplace. > > In order to make this happen in a fashion which will scale, the ironic > team will likely need to leverage the application credentials. > > Ironically reeling in large bare metal deployment without PXE > ============================================== > > This was a talk submitted by Ilya Etingof, who unfortunately was > unable to make it to the summit. Special thanks goes to Both Ilya and > Richard Pioso for working together to make this demonstration happen. > The idea was to demonstrate where the ironic team sees the future of > deployment of machines on the edge using virtual media and how vendors > would likely interact with that in some cases as slightly different > mechanics may be required even if the BMCs all speak Redfish, which is > the case for a Dell iDRAC BMC. > > The idea[2] ultimately being is that the conductor would inject the > configuration information into the virtual media ISO image that is > attached via virtual media negating the need for DHCP. We have videos > posted that allow those interested to see what this functionality > looks like with neutron[3] and without neutron[4]. > > While the large audience was impressed, it seemed to be a general > surprise that Ironic had virtual media support in some of the drivers > previously. This talk spurred quite a bit of conversation and hallway > track style discussion after the presentation concluded which is > always an excellent sign. > > Project Teams Gathering > =================== > > The ironic community PTG attendance was nothing short of excellent. > Thank you everyone who attended! At one point we had fifteen people > and a chair had to be pulled up to our table for a 16th person to join > us. 
At which point, we may have captured another table and created > confusion. > > We did things a little differently this time around. Given some of the > unknowns, we did not create a strict schedule around the topics. We > simply went through and prioritized topics and tried to discuss them > each as thoroughly as possible until we had reached the conclusion or > a consensus on the topic. > > Topics and a few words on each topic we discussed in the notes section > on the PTG etherpad[5]. > > On-boarding > ----------------- > > We had three contributors that attended a fairly brief on-boarding > overview of Ironic. Two of them were more developer focused where as > the third was more of an operator focus looking to leverage ironic and > see how they can contribute back to the community. > > BareMetal SIG - Next Steps > ------------------------------------- > > Arne Wiebalck and I both provided an update including current > conversations where we saw the SIG, the Logo Program, the white paper, > and what should the SIG do beyond the whitepaper. > > To start with the Logo program, it largely seems there that somewhere > along the way a message or document got lost and that largely impacted > the Logo Program -> SIG feedback mechanism. I’m working with the > OpenStack Foundation to fix that and get communication going again. > Largely what spurred that was that some vendors expressed interest in > joining, and wanted additional information. > > As for the white paper, contributions are welcome and progress is > being made again. > > From a next steps standpoint, the question was raised how do we build > up an improved Operator point of contact. There was some consensus > that we as a community should try to encourage at least one > contributor to attend the operations mid-cycles. This allows for a > somewhat shorter feedback look with a different audience. > > We also discussed knowledge sharing, or how to improve it. Included > with this is how do we share best practices. I’ve got the question out > to folks at the foundation if there is a better way as part of the > Logo program, or if we should just use the Wiki. I think this will be > an open discussion topic in the coming weeks. > > The final question that came up as part of the SIG is how to show > activity. I reached out to Amy on the UC regarding this, and it seems > the process is largely just reach out to the current leaders of the > SIG, so it is critical that we keep that up to date moving forward. > > Sensor Data/Metrics > --------------------------- > > The barrier between Tenant level information and Operator level > information is difficult with this topic. > > The consensus among the group was that the capability to collect some > level of OOB sensor data should be present on all drivers, but there > is also a recognition that this comes at a cost and possible > performance impact. Mainly this performance impact question was raised > with Redfish because this data is scattered around the API where > multiple API calls are required, and may even cause some interruption > to actively inquire upon some data points. > > The middle ground in the discussion came to adding a capability of > somehow saying “collect power status, temp every minute, fan speeds > every five minutes, drive/cpu health data maybe every 30 minutes”. I > would be remiss if I didn't note that there was joking about how this > would in essence be re-implementation of Cron. 
What this would end up > looking like, we don’t know, but it would provide operators the data > resolution necessary for the failure risk/impact. The analogy used was > that “If the temperature sensor has risen to an alarm level, either a > AC failure or a thermal hot spot forming based upon load in the data > center, checking the sensor too often is just not going to result in a > human investigating that on the data center floor any faster.” > > Mainly I believe this discussion largely stresses that the information > is for the operator of the bare metal and not to provide insight into > a tenant monitoring system, that those activities should largely be > done with-in the operating system. > > One question among the group was if anyone was using the metrics > framework built into ironic already for metrics of ironic itself, to > see if we can re-use it. Well, it uses a plugin interface! In any > event, I've sent a post to the openstack-discuss mailing list seeking > usage information. > > > Node Retirement > ----------------------- > > This is a returning discussion from the last PTG, and in discussing > the topic we figured out where the discussion became derailed at > previously. In essence, the desire was to mix this with the concept > of being able to take a node “out of service”. Except, taking a node > out of service is an immediate state related flag, where as retiring > might be as soon as the current tenant vacates the machine… possibly > in three to six months. > > In other words, one is “do something or nothing now”, and the other is > “do something later when a particular state boundary is crossed”. > Trying to make one solution for both, doesn’t exactly work. > > Unanimous consensus among those present was that in order to provide > node retirement functionality, that the logic should be similar to > maintenance/maintenance reason. A top level field in the node object > that would allow API queries for nodes slated for retirement, which > helps solve an operator workflow conundrum “How do I know what is > slated for retirement but not yet vacated?” > > Going back to the “out of service” discussion, we reached consensus > that this was in essence a “user declarable failed state”, and as such > that it should be done only in the state machine as it is in the > present, not a future action. Should we implement out of service, > we’ll need to check the nova.virt.ironic code and related virt code to > properly handle nodes dropping from `ACTIVE` state, which could also > be problematic and need to be API version guarded to prevent machines > from accidentally entering `ERROR` state if they are not automatically > recovered in nova. > > Multi-tenancy > ------------------ > > Lots of interest existed around making the API somewhat of a > mutli-tenant aware interaction, and the exact interactions and uses > involved there are not exactly clear. What IS clear is that providing > functionality as such will allow operators to remove complication in > their resource classes and tenant specific flavors which is presently > being used to enable tenant specific hardware pools. The added benefit > of providing some level for normally non-admin users to access the > ironic API is that it would allow those tenants to have a clear > understanding of their used resources and available resources by > directly asking ironic, where as presently, they don’t have a good way > to collect nor understand that short of asking the cloud operator when > it comes to bare metal. 
Initial work has been posted for this to > gerrit[6]. > > In terms of how tenants resources would be shared, there was consensus > that the community should stress that new special use tenants should > be created for collaborative efforts. > > There was some discussion regarding explicitly dropping fields for > non-privileged users that can see the nodes, such as driver_info and > possibly even driver_internal_info. Definitely a topic that requires > more discussion, but that would solve operator reporting and use > headaches. > > Manual Cleaning Out-Of-Band > ---------------------------------------- > > The point was raised that we unconditionally start the agent ramdisk > to perform manual cleaning. Except, we should support a method of out > of band cleaning operators to only be executed so the bare metal node > doesn’t need to be booted to a ramdisk. > > The consensus seemed to be that we should consider a decorator or > existing decorator change that allows the conductor to hold off > actually powering the node on for ramdisk boot unless or until a step > is reached that is not purely out of band. > > In essence, fixing this allows a “fix_bmc” out of band clean step to > be executed first without trying to modify BMC settings, which would > presently fail. > > Scale issues > ----------------- > > A number of scaling issues between how nova and ironic interact, > specifically with the resource tracker and how inventory is updated > from ironic and loaded into nova. Largely this issue revolves around > the concept in nova that each ``nova-compute`` is a hypervisor. And > while one can run multiple ``nova-compute`` processes to serve as the > connection to ironic, the underlying lock in Nova is at the level of > the compute node, not the node level. This means as thousands of > records are downloaded, synced, copied into the resource tracker, the > compute process is essentially blocked from other actions while this > serialized job runs. > > In a typical VM case, you may only have at most a couple hundred VMs > on a hypervisor, where as with bare metal, we’re potentially servicing > thousands of physical machines. > > It should be noted that there are several large scale operators that > indicated during the PTG that this was their pain point. Some of the > contributors from CERN sat down with us and the nova team to try and > hammer out a solution to this issue. A summary of that cross project > session can be found at line 212 in the PTG etherpad[0]. > > But there is another pain point that contributes to this performance > issue and that is the speed at which records are returned by our API. > We’ve had some operators voice some frustration with this before, and > we should at least be mindful of this and hopefully see if we can > improve record retrieval performance. In addition to this, if we > supported some form of bulk “GET” of nodes, it might be able to be > leveraged as opposed to a get on each node one at a time which is > presently what occurs in the nova-compute process. > > Boot Mode Config > ------------------------ > > Previously, when scheduling occurred with flavors and filters were > appropriately set, if a machine was declared as supporting only one > boot mode, requests would only ever land on that node. Now with > Traits, this is a bit different and unfortunately optional without > logic to really guard the setting application for an instance. 
> So in this case, if the filters are such that a request for a Legacy boot instance lands on a UEFI-only machine, we'll still try to deploy it. In reality, we really should fail fast.

> Ideally the solution here is to consult the BMC through some sort of get_supported_boot_modes method, and if we determine a mismatch between the node's settings and what the requested instance needs, based on the data we have, we fail the deploy.

> This ultimately may require work in the nova.virt.ironic driver code to identify that the cause of the failure is an invalid configuration and to report that back, since the same request may not be fatal on another machine.

> Security of /heartbeat and /lookup endpoints
> -----------------------------------------------------------

> We had a discussion about adding some additional layers of security mechanics around the /heartbeat and /lookup endpoints in ironic's REST API. These limited endpoints are documented as being unauthenticated, so naturally some issues can arise from them, and we want to minimize the vectors by which an attacker that has gained access to a cleaning/provisioning/rescue network could impersonate a running ironic-python-agent. Conversely, the ironic-python-agent runs in a similar fashion, intended to run on secure, trusted networks which are only accessible to the ironic-conductor. As such, we also want to add some validation that the API request comes from the same ironic deployment that IPA is heart-beating to.

> The solution to this is to introduce a limited-lifetime token that is unique per node per deployment. It would be stored in RAM on the agent, and in node.driver_internal_info so it is available to the conductor. It would be provided only once, either out of band OR via the first "lookup" of a node, and would then only become accessible again during known reboot steps.

> Conceptually the introduction of tokens was well supported in the discussions and there were zero objections to doing so. Some initial patches[7][8] are under development to move this forward.

> An additional item is to add IP address filtering capabilities to both endpoints such that we only process the heartbeat/lookup request if we know it came from the correct IP address. An operator has written this feature downstream, and consensus was unanimous at the PTG that we should accept it upstream. We should expect a patch for this functionality to be posted soon.

> Persistent Agents
> ------------------------

> The use cases behind persistent agents are "I want to kexec my way to the agent ramdisk, or the next operating system" and "I want to have up to date inspection data." We've already somewhat solved the latter, but the former is a harder problem requiring the previously mentioned endpoint security enhancements to be in place first. There is some interest from CERN and some other large scale operators.

> In other words, we should expect more of this from a bare metal fleet operations point of view for some environments as we move forward.

> "Managing hardware the Ironic way"
> -------------------------------------------------

> The question that spurred this discussion was "How do I provide a way for my hardware manager to know what it might need to do by default?" Except, those defaults may differ between racks that serve different purposes. "Rack 1, node0" may need a port set to FiberChannel mode, whereas "Rack2, node1" may require it to be Ethernet.
> This quickly also reaches the discussion of "What if I need different firmware versions by default?"

> This topic quickly evolved from there, and the idea that surfaced was to introduce a new field on the node object for the storage of such data. Something like ``node.default_config``: a dictionary, sort of like what a user provides for cleaning steps or deploy steps, that provides argument values which are iterated through when in automated cleaning mode, to allow operators to fill in configuration requirement gaps for hardware managers.

> Interestingly enough, even today we just had someone ask a similar question in IRC.

> This should ultimately be usable to assert desired/default firmware from an administrative point of view. Adrianc (Mellanox) is going to reach out to bdobb (DMTF) regarding the redfish PLDM firmware update interface to see where this may go from here.

> Edge computing working group session
> ----------------------------------------------------

> The edge working group largely became a session to update everyone on where ironic is going and where we see things going in terms of managing bare metal at the edge/far-edge. This included some in-depth questions about dhcp-less deployment and related mechanics, as well as HTTPBoot'ing machines.

> Supporting HTTPBoot does definitely seem to be of interest to a number of people, although at least after sharing my context only five or six people in attendance really seemed interested in ironic prioritizing such functionality. The primary blocker, for those that are unaware, is the lack of pre-built UEFI images we could use to do integration testing of IPv4 HTTPBoot. Functionally ironic already supports IPv6 HTTPBoot via DHCPv6 as part of our IPv6 support with PXE/iPXE, however we also don't have an integration test job for that code path, for the same reason: pre-built UEFI firmware images lack the built-in support.

> More minor PTG topics
> -------------------------------

> * Smartnics - A desire to attach virtual ports to ironic baremetal nodes with smartnics was raised. It seems that we don't need to try and create a port entry in ironic; we only need to track/signal and remove the "vif" attachment to the node in general, as there is no physical MAC required for that virtual port in ironic. The constraint that at least one MAC address would be required to identify the machine is understood. If anyone sees an issue with this, please raise it with adrianc.
> * Metal^3 - Within the group attending the PTG, there was not much interest in Metal^3 or in using CRDs to manage bare metal resources with ironic hidden behind the CRD. One factor related to this is the desire to define more data to be passed through to ironic, which is not presently supported in the CRD definition.

> Stable Backports with Ironic's release model
> ==================================

> I was pulled into a discussion with the TC and the Stable team regarding frustrations that have been expressed within the ironic team regarding stable back-porting of fixes, mainly for drivers. There is consensus that it is okay for us as the ironic team to backport driver-y things when needed to support vendors, as long as they do not break overall behavior contracts. This quickly leads us to needing to also modify constraints for driver-y things as well.
> Constraints changes will continue to be evaluated on a case by case basis, but the general consensus is that there is full support to "do the right thing" for ironic's users, vendors, and community. The key is making sure we are on the same page and agree on what that right thing is. This is where asynchronous communication can get us into trouble, and I would highly encourage trying to start a higher bandwidth discussion when these cases arise in the future. The key takeaway we should likely keep in mind is that policy is there for good reasons, but policy is not and cannot be a crutch that prevents the right thing from being done.

> Additional items worth noting - Q1 Gatherings
> ===================================

> There will be an operations mid-cycle at Bloomberg in London, January 7th-8th, 2020. It would be good if at least one ironic contributor could attend, as the operators group tends to be closer to the physical baremetal, and it is a good chance to build mutual context between developers and the operations people actually using our software.

> Additionally, we want to gauge interest in having an ironic mid-cycle in central Europe in Q1 of 2020. We need to identify the number of contributors that would be interested in and able to attend, since the next PTG will be in June. Please email me off-list if you're interested in attending and I'll make a note of it, as we're still having initial discussions.

> And now I've reached a buffer under-run on words. If there are any questions, just reply to the list.

> -Julia

> Links:

> [0]: https://etherpad.openstack.org/p/PVG-ironic-operator-feedback
> [1]: https://etherpad.openstack.org/p/PVG-ironic-snapshot-support
> [2]: https://review.opendev.org/#/c/672780/
> [3]: https://drive.google.com/file/d/1_PaPM5FvCyM6jkACADwQtDeoJkfuZcAs/view?usp=sharing
> [4]: https://drive.google.com/file/d/1YUFmwblLbJ9uJgW6Rkf6pkW8ouU-PYFK/view?usp=sharing
> [5]: https://etherpad.openstack.org/p/PVG-Ironic-Planning
> [6]: https://review.opendev.org/#/c/689551/
> [7]: https://review.opendev.org/692609
> [8]: https://review.opendev.org/692614
> [9]: https://etherpad.openstack.org/p/ops-meetup-1st-2020
> [10]: https://review.opendev.org/#/q/topic:story/2006403+(status:open+OR+status:merged)

From melwittt at gmail.com Wed Nov 13 21:21:08 2019
From: melwittt at gmail.com (melanie witt)
Date: Wed, 13 Nov 2019 13:21:08 -0800
Subject: [nova] track error migrations and orphans in Resource Tracker
In-Reply-To:
References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com>
Message-ID: <5e082c92-b05c-e0d9-a418-ec7120331fe3@gmail.com>

On 11/13/19 11:53, Matt Riedemann wrote:
> On 11/13/2019 1:43 PM, melanie witt wrote:
>> This is because the "get instance list by host" can miss instances that are mid-migration, because of how/where we update the instance.host field.
>
> Why not just filter out any instances that have a non-None task_state? Or barring that, filter out any instances that have an in-progress migration (there is a method that the ResourceTracker uses to get those kinds of migrations occurring either as incoming to or outgoing from the host).

Yeah, an earlier version of the patch was trying to do that: https://review.opendev.org/#/c/627765/36/nova/compute/manager.py@8455 but it was not a complete list of all the possible migrating intermediate states. We didn't know about the method the resource tracker is already using for the same purpose, that we could re-use.
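To make that concrete, the kind of guard being suggested is roughly the following (a sketch only, written from memory and not the actual patch; 'candidates' stands in for whatever list of unknown guests the cleanup path ends up iterating over, and 'objects' is the usual nova.objects module):

    # Exclude anything that has a task_state set or that belongs to an
    # in-progress migration to/from this host, so a guest that is merely
    # mid-migration is never treated as an orphan.
    migrations = objects.MigrationList.get_in_progress_by_host_and_node(
        context, host, nodename)
    migrating_uuids = {m.instance_uuid for m in migrations}
    safe_to_consider = [inst for inst in candidates
                        if inst.task_state is None
                        and inst.uuid not in migrating_uuids]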
After some confusion on my part, we removed the task_state checks and now I see we need to put them back. I'll find the RT method and comment on the review. Thanks for mentioning that.

-melanie

From Albert.Braden at synopsys.com Wed Nov 13 21:23:05 2019
From: Albert.Braden at synopsys.com (Albert Braden)
Date: Wed, 13 Nov 2019 21:23:05 +0000
Subject: Filter costs / filter order
In-Reply-To: <24b8fe814dd497bb6e39a255fefcea24a44bb518.camel@redhat.com>
References: <24b8fe814dd497bb6e39a255fefcea24a44bb518.camel@redhat.com>
Message-ID:

This is very helpful, thank you! Does anyone have a "filter order" document that they are willing to share, or documentation on how you decide filter order?

-----Original Message-----
From: Sean Mooney
Sent: Tuesday, November 12, 2019 1:46 PM
To: Albert Braden ; openstack-discuss at lists.openstack.org
Subject: Re: Filter costs / filter order

On Tue, 2019-11-12 at 20:30 +0000, Albert Braden wrote:
> I'm running Rocky and trying to figure out filter order. I'm reading this doc: https://docs.openstack.org/nova/rocky/user/filter-scheduler.html
>
> It says:
>
> Each filter selects hosts in a different way and has different costs. The order of filter_scheduler.enabled_filters affects scheduling performance. The general suggestion is to filter out invalid hosts as soon as possible to avoid unnecessary costs. We can sort filter_scheduler.enabled_filters items by their costs in reverse order. For example, ComputeFilter is better before any resource calculating filters like RamFilter, CoreFilter.
>
> Is there a document that specifies filter costs, or ranks filters by cost? Is there a well-known process for determining the optimal filter order?

i'm not aware of a specific document that covers it, and this will vary based on deployment. as a general guideline you should order your filters by which ones eliminate the most hosts, so the AvailabilityZoneFilter should generally be first. in older releases the retry filter should go first. the numa topology filter and pci passthrough filter are kind of expensive, so they are better to have near the end.

so i would start with the Aggregate* filters, followed by "cheap" filters that don't have any complex boolean logic such as SameHostFilter, DifferentHostFilter, IoOpsFilter, NumInstancesFilter (there are a few others), then the more complex filters like numa topology, pci passthrough, ComputeCapabilitiesFilter, JsonFilter.

effectively what you want to do is maximise the information gain at each filtering step while minimising the cost (reducing the possible hosts with as few cpu cycles as possible). it's also important to only enable the filters that matter to your deployment. but if we had a perfect costing for each filter then you could follow the ID3 algorithm to get an optimal layout.
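to make that concrete, the shape of the ordering i am describing looks something like the below for a generic deployment. treat it as a sketch rather than a recommendation, since the right list and order depend entirely on which filters you actually need and what your workloads look like:

    [filter_scheduler]
    # cheap filters with high information gain first, expensive/detailed filters last
    enabled_filters = AvailabilityZoneFilter,ComputeFilter,AggregateInstanceExtraSpecsFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,SameHostFilter,DifferentHostFilter,IoOpsFilter,NumInstancesFilter,ComputeCapabilitiesFilter,NUMATopologyFilter,PciPassthroughFilter

the link below is the ID3 reference i mentioned.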
https://urldefense.proofpoint.com/v2/url?u=https-3A__en.wikipedia.org_wiki_ID3-5Falgorithm&d=DwIFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=m0wl70cY9eaA6LFc5XBoTcth0vOUW424EfBg5nqVuOQ&s=0dTE5QPOFLn9yT2vwDZNqaF5RXbLMjtSTM90MjI2fZc&e= i have wanted to experiment with tracing the boot requests on large public clould and model this for some time but i always endup finding other things to thinker with instead but i think even with out that data to work with you could do some intersting things with code complexity metricts as a proxy to try and auto sort them. perhaps some of the operator can share what they do i know cern pre placement used to map tenant to cells as there first filtering step which signifcatly helped them with scale but if the goal is speed then you need to have each step give you the maxium infomation gain for the minium addtional cost. that is why the aggreate filters and multi host filters like affintiy filters tend to be better at the start of the list and very detailed filters like the numa topolgy filter then to be better at the end. From mriedemos at gmail.com Wed Nov 13 22:54:10 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 13 Nov 2019 16:54:10 -0600 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: <5e082c92-b05c-e0d9-a418-ec7120331fe3@gmail.com> References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> <5e082c92-b05c-e0d9-a418-ec7120331fe3@gmail.com> Message-ID: <870543fc-782a-2249-a7a1-37329388d6a7@gmail.com> On 11/13/2019 3:21 PM, melanie witt wrote: > I'll find the RT method and comment on the review. https://github.com/openstack/nova/blob/1c7a3d59080e5de50615bd2408b10d372ec30861/nova/compute/resource_tracker.py#L935 -- Thanks, Matt From openstack at nemebean.com Wed Nov 13 23:56:07 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 13 Nov 2019 17:56:07 -0600 Subject: [oslo] Adoption of microversion-parse In-Reply-To: <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org> References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org> Message-ID: On 10/21/19 9:14 AM, Thierry Carrez wrote: > Thierry Carrez wrote: >> [...] >> I'll propose the project addition so you can all vote directly on it :) > > https://review.opendev.org/#/c/689754/ > This has merged, but I still don't have access to the core group for the library. Is this the point where we need to get infra involved or are there other steps needed to make this official first? From cboylan at sapwetik.org Thu Nov 14 00:20:03 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 13 Nov 2019 16:20:03 -0800 Subject: [oslo] Adoption of microversion-parse In-Reply-To: References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org> Message-ID: <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com> On Wed, Nov 13, 2019, at 3:56 PM, Ben Nemec wrote: > > > On 10/21/19 9:14 AM, Thierry Carrez wrote: > > Thierry Carrez wrote: > >> [...] > >> I'll propose the project addition so you can all vote directly on it :) > > > > https://review.opendev.org/#/c/689754/ > > > > This has merged, but I still don't have access to the core group for the > library. Is this the point where we need to get infra involved or are > there other steps needed to make this official first? 
> > Ideally the existing cores would simply add you as the method of checks and balances here. Any current member can manage the member list as well as a Gerrit admin. Once you've been added by the existing core group you'll be able to add any others (like oslo-core). You can find the existing group members here: https://review.opendev.org/#/admin/groups/1345,members If for some reason this voluntary hand over doesn't work then the infra team's gerrit admins can get involved, but the ideal is that existing core members would do it themselves to ack the handover. Clark From cdent+os at anticdent.org Thu Nov 14 00:29:49 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 14 Nov 2019 00:29:49 +0000 (GMT) Subject: [oslo] Adoption of microversion-parse In-Reply-To: <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com> References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org> <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com> Message-ID: On Wed, 13 Nov 2019, Clark Boylan wrote: > On Wed, Nov 13, 2019, at 3:56 PM, Ben Nemec wrote: >> >> >> On 10/21/19 9:14 AM, Thierry Carrez wrote: >>> Thierry Carrez wrote: >>>> [...] >>>> I'll propose the project addition so you can all vote directly on it :) >>> >>> https://review.opendev.org/#/c/689754/ >>> >> >> This has merged, but I still don't have access to the core group for the >> library. Is this the point where we need to get infra involved or are >> there other steps needed to make this official first? >> >> > > Ideally the existing cores would simply add you as the method of checks and balances here. Any current member can manage the member list as well as a Gerrit admin. Once you've been added by the existing core group you'll be able to add any others (like oslo-core). I've added oslo-core. I've been somewhat out of touch, so forgot about this step. (Note, it appears that oslo-core is way out of date...) -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From johnsomor at gmail.com Thu Nov 14 01:20:48 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 13 Nov 2019 17:20:48 -0800 Subject: [oslo] Adding Michael Johnson as Taskflow core In-Reply-To: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> References: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Message-ID: Thank you Ben, happy to help! Michael On Wed, Nov 13, 2019 at 8:18 AM Ben Nemec wrote: > > Hi, > > After discussion with the Oslo team, we (and he) have agreed to add > Michael as a Taskflow core. He's done more work on the project than > anyone else still active in Oslo and also works on a project that > consumes it so he likely understands it better than anyone else at this > point. > > Welcome Michael and thanks for your contributions! > > -Ben > From cp769u at att.com Wed Nov 13 23:07:17 2019 From: cp769u at att.com (PARSONS, CLIFF) Date: Wed, 13 Nov 2019 23:07:17 +0000 Subject: Keystone user ID case sensitivity issue Message-ID: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com> Hello everyone! My organization has a need to make the user name/ID retrieval from Heat template to be case insensitive. For example: suppose we already have a user in keystone, "xyz123". Then we have a client that creates a heat stack containing a UserRoleAssignment resource, in which the user was specified as "XYZ123". 
The user would not be found in the Keystone database (due to Keystone user IDs being case sensitive) and the role assignment would not occur.

Either Keystone could be changed so that its users are treated case insensitive, or we could make the change to heat (Heat KeystoneClientPlugin class) like in https://review.opendev.org/#/c/694117/ so that it converts to lower case before querying keystone.

Can I get some thoughts on this? Would something like this be acceptable at all? Would we need to make it configurable, and if we did, would that be acceptable?

Thanks in advance for your thoughts/concerns/suggestions.

Thank you,
Cliff Parsons

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From smooney at redhat.com Thu Nov 14 01:57:01 2019
From: smooney at redhat.com (Sean Mooney)
Date: Thu, 14 Nov 2019 01:57:01 +0000
Subject: Keystone user ID case sensitivity issue
In-Reply-To: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com>
References: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com>
Message-ID:

On Wed, 2019-11-13 at 23:07 +0000, PARSONS, CLIFF wrote:
> Hello everyone!
>
> My organization has a need to make the user name/ID retrieval from Heat template to be case insensitive. For example: suppose we already have a user in keystone, "xyz123". Then we have a client that creates a heat stack containing a UserRoleAssignment resource, in which the user was specified as "XYZ123". The user would not be found in the Keystone database (due to Keystone user IDs being case sensitive) and the role assignment would not occur.
>
> Either Keystone could be changed so that its users are treated case insensitive, or we could make the change to heat (Heat KeystoneClientPlugin class) like in https://review.opendev.org/#/c/694117/ so that it converts to lower case before querying keystone.

i honestly don't think we should force everyone to use case-insensitive user names, so i don't think converting to lower case is valid. however, it might be worth exploring whether you could change the collation of the database so that it is case insensitive, by using the utf8_general_ci collation, so that all db operations on the user table are case insensitive.

> Can I get some thoughts on this? Would something like this be acceptable at all? Would we need to make it configurable, and if we did, would that be acceptable?

i think changing api behavior based on a config option is an interoperability problem.

keystone has to interact with external identity systems, and assuming all of those will be case insensitive would probably break someone else who has the opposite requirement.

i honestly think that people should just use the correct case in the heat template. if heat is not currently erroring out when the role assignment fails, that feels like a heat bug. and i would personally think it's an error if i could type my user name with the wrong case and my correct password and still get a keystone token.

> Thanks in advance for your thoughts/concerns/suggestions.
> > Thank you, > Cliff Parsons From luyao.zhong at intel.com Thu Nov 14 02:33:18 2019 From: luyao.zhong at intel.com (Luyao Zhong) Date: Thu, 14 Nov 2019 10:33:18 +0800 Subject: [nova] track error migrations and orphans in Resource Tracker In-Reply-To: References: <183EFA13E8A23E4AA7057ED9BCC1102E3D0B49B3@shsmsx102.ccr.corp.intel.com> Message-ID: <05502840-fc3f-4bca-89bd-a18db3a5ad80@intel.com> On 2019/11/14 上午3:43, melanie witt wrote: > On 11/12/19 05:18, Sean Mooney wrote: >> On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote: >>> Hi Nova experts, >>> >>> "Not tracking error migrations and orphans in RT." is probably a bug. >>> This may trigger some problems in >>> update_available_resources in RT at the moment. That is some orphans >>> or error migrations are using cpus/memory/disk >>> etc, but we don't take these usage into consideration. And >>> instance.resources is introduced from Train used to contain >>> specific resources, we also track assigned specific resources in RT >>> based on tracked migrations and instances. So this >>> bug will also affect the specific resources tracking. >>> >>> I draft an doc to clarify this bug and possible solutions: >>> https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT >>> Looking forward to suggestions from you. Thanks in advance. >>> >> there are patche up to allow cleaning up orpahn instances >> https://review.opendev.org/#/c/627765/ >> https://review.opendev.org/#/c/648912/ >> if we can get those merged that woudl adress at least some of the >> proablem > > I just wanted to mention: > > I have reviewed the cleanup patches ^ multiple times and I'm having a > hard time getting past the fact that any way you slice it (AFAICT), the > cleanup code will have a window where a valid guest could be destroyed > erroneously (not an orphan). This is because the "get instance list by > host" can miss instances that are mid-migration, because of how/where we > update the instance.host field. > > Maybe this ^ could be acceptable (?) if we put a big fat warning on the > config option help for 'reap_unknown'. But I was unsure of the answers > about what recovery looks like in case a guest is erroneously destroyed > for an instance that is in the middle of migrating. In the case of > resize or cold migrate, a hard reboot would fix it AFAIK. What about for > a live migration? If recovery is possible in every case, those would > also need to be documented in the config option help for 'reap_unknown'. > > The patch has lots of complexities to think about and I'm left wondering > if the pitfalls are better or worse than the current state. It would > help if others joined in the review with their thoughts about it. > > -melanie Hi Sean Mooney and melanir, thanks for mentioning. This ^ is for cleanup orphans. For imcomplete migations, you prefer not destroying them, right? I'm not sure about it either. But I gave a possible solution on the etherpad (set instance.host and apply/revert migration context and then invoke cleanup_running_deleted_instances to cleanup the instance). And before cleanup done, we need track these instances/migrations in RT, need more people join our discussion. Welcome put your suggestion on the etherpad. https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT. Thanks in advance. 
BR, Luyao From Tushar.Patil at nttdata.com Thu Nov 14 02:58:49 2019 From: Tushar.Patil at nttdata.com (Patil, Tushar) Date: Thu, 14 Nov 2019 02:58:49 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> , Message-ID: On 11/13/2019 8:34 AM, Sylvain Bauza wrote: >> Me too. To be clear, I don't think operators would modify the above but >> if so, they would need reshapes. > Maybe not, but this is the kind of detail that should be in the spec and > functional tests to make sure it's solid since this is a big > architectural change in nova. It depends on how the aggregates are created on the nova and placement side. A) From placement point of view, operator can create a new aggregate and add shared storage RP to it (tag MISC_SHARES_VIA_AGGREGATE trait to this RP). The newly created valid UUID would then be set in the config option ``sharing_disk_aggregate`` on the compute node side. This aggregate UUID wouldn't be present in the nova aggregate. so it's not possible to add host to the nova aggregate unless a new aggregate is created on nova side. B) If nova aggregates are synced to the placement service and say below is the picture: Nova: Agg1 - metadata (pinned=True) - host1 - host2 Now, operator adds a new shared storage RP to Agg1 on placement side and then set Agg1 UUID in ``sharing_disk_aggregate`` on compute nodes along with ``using_shared_disk_provider`=True``, then it would add compute node RP to the Agg1 on the placement without any issues but when you want to reverse the configuration, using_shared_disk_provider=False, then it not that straight to remove the host from the placement/nova aggregate because there would be other traits set to compute RPs which could cause those functions stop working. We had same kind of discussion [1] when implementing forbidden aggregates where we want to sync traits set to the aggregates but later it was concluded that operator will do it manually. I will include the details Matt has pointed out in this email in my next patchset. [1] : http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006950.html Regards, tpatil ________________________________________ From: Matt Riedemann Sent: Wednesday, November 13, 2019 11:41 PM To: openstack-discuss at lists.openstack.org Subject: Re: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship On 11/13/2019 8:34 AM, Sylvain Bauza wrote: > Me too. To be clear, I don't think operators would modify the above but > if so, they would need reshapes. Maybe not, but this is the kind of detail that should be in the spec and functional tests to make sure it's solid since this is a big architectural change in nova. -- Thanks, Matt Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. 
From ramishra at redhat.com Thu Nov 14 03:24:05 2019 From: ramishra at redhat.com (Rabi Mishra) Date: Thu, 14 Nov 2019 08:54:05 +0530 Subject: Keystone user ID case sensitivity issue In-Reply-To: References: <52A47B23EC4EC94C8EE6BFE332D37E224EBFCB3F@MOSTLS1MSGUSRFB.ITServices.sbc.com> Message-ID: On Thu, Nov 14, 2019 at 7:31 AM Sean Mooney wrote: > On Wed, 2019-11-13 at 23:07 +0000, PARSONS, CLIFF wrote: > > Hello everyone! > > > > My organization has a need to make the user name/ID retrieval from Heat > template to be case insensitive. For example: > > suppose we already have a user in keystone, "xyz123". Then we have a > client that creates a heat stack containing a > > UserRoleAssignment resource, in which the user was specified as > "XYZ123". The user would not be found in the Keystone > > database (due to Keystone user IDs being case sensitive) and the role > assignment would not occur. > > > > Either Keystone could be changed so that its users are treated case > insensitive, or we could make the change to heat > > (Heat KeystoneClientPlugin class) like in > https://review.opendev.org/#/c/694117/ so that it converts to lower case > > before querying keystone. > i honestly dont think we shoudl force everyone to use case insensitive > user names so i dont think converting to lower > case is valid. however it might we worth exploring if you could change the > encoding of the database so that it uses the > case insensitive by using the utf8_general_ci encodeing so that all db > opertion are case insensitive on the user tabel. > > Can I get some thoughts on this? Would something like this be > acceptable at all? Would we need to make it > > configurable, and if we did, would that be acceptable? > i think chaing api behavior based on a config option is an interoperablity > probelm > > keystone has to interact with external identity systesm and so assuming > all of those will be case inseitive would > proably break someone else who has the opisite requirement. > > i honestly think that people should just use the correct case in the heat > template. > if heat is not currently erroring out when the role assignment failts that > feels like a heat bug There is no heat bug. Heat would fail if the user does not exist and it does not override any service behaviour in the default client plugins. However, heat allows to write your own custom client plugin for keystone (if that's what you want), which overrides the behavior and use it in place of the default plugin. > but i would > personlly think its an error if i type my user name with the wrong case > and my correct passwourd and was able > to get a keystone token. > > > > Thanks in advance for your thoughts/concerns/suggestions. > > > > Thank you, > > Cliff Parsons > > > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Thu Nov 14 05:39:25 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 14 Nov 2019 14:39:25 +0900 Subject: [horizon] next weekly meeting cancelled Message-ID: Hi, The weekly team meeting next week (Nov 20) is cancelled. I will be on a business trip to join a conference in US and cannot run it. We agreed to cancel it in the team meeting yesterday. Akihiro Motoki (amotoki) From amotoki at gmail.com Thu Nov 14 05:59:34 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 14 Nov 2019 14:59:34 +0900 Subject: [neutron][docs] networking-onos EOL? 
In-Reply-To: References: Message-ID: Hi, networking-onos project was under the neutron team governance, but it was retired in Oct 2016 [4][5]. Regarding the 'latest' documentation, there is no clear guideline on cleaning up "docs.o.o/latest/foo" when a repository is retried. I think that is the only reason we can still see docs.o.o/latest/networking-onos. Only projects under TC governance can publish documentation under docs.o.o, so I thnk we need a cleanup when a repository retirement. Thanks, Akihiro Motoki (amotoki) [4] https://review.opendev.org/#/c/383911/ (neutron team decision) [5] https://review.opendev.org/#/c/392010/ (governance change) On Mon, Nov 4, 2019 at 7:12 PM Mark Goddard wrote: > > Hi, > > We (kolla) had a bug report [1] from someone trying to use the neutron > onos_ml2 ML2 driver for the ONOS SDN controller. As far as I can tell > [2], this project hasn't been released since 2015. However, the > 'latest' documentation is still accessible [3], and does not mention > that the project is dead. What can we do to help steer people away > from projects like this? > > Cheers, > Mark > > [1] https://bugs.launchpad.net/bugs/1850763 > [2] https://pypi.org/project/networking-onos/#history > [3] https://docs.openstack.org/networking-onos/latest/ > From arnaud.morin at gmail.com Thu Nov 14 07:10:51 2019 From: arnaud.morin at gmail.com (Arnaud MORIN) Date: Thu, 14 Nov 2019 08:10:51 +0100 Subject: [sig] Forming a Large scale SIG In-Reply-To: <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: Hi all, +1 for me and my employer (OVH). We are mostly interested in sharing good practices when deploying a region at scale, and operating it. For the deployment part, my main pain point is about the configuration parameters I should use on different software (e.g. nova behind wsgi). The current doc is designed to deploy a small pod, but when we are going large, usually some of those params needs tuning. I'd like to identify them and eventually tag them to help other being aware that they are useful at large scale. About operating, I am pretty sure we can share some good advices as well. E.g., avoid restarting neutron agents in a single shot. So definitely interested in that group. Thanks for bringing that up. Cheers. Le mer. 13 nov. 2019 à 19:00, Stig Telfer a écrit : > Hi Thierry & all - > > Thanks for your mail. I’m interested in joining this SIG. Among others, > I’m interested in participating in discussions around these common problems: > > - golden signals for scaling bottlenecks (and what to do about them) > - using Ansible at scale > - strategies for simplifying OpenStack functionality in order to scale > > Cheers, > Stig > > > > On 13 Nov 2019, at 11:18, Thierry Carrez wrote: > > > > Hi everyone, > > > > In Shanghai we held a forum session to gauge interest in a new SIG to > specifically address cluster scaling issues. In the past we had several > groups ("Large deployments", "Performance", LCOO...) but those efforts were > arguably a bit too wide and those groups are now abandoned. > > > > My main goal here is to get large users directly involved in a domain > where their expertise can best translate into improvements in the software. > It's easy for such a group to go nowhere while trying to boil the ocean. To > maximize its chances of success and make it sustainable, the group should > have a narrow focus, and reasonable objectives. 
> > > > My personal idea for the group focus was to specifically address scaling > issues within a single cluster: basically identify and address issues that > prevent scaling a single cluster (or cell) past a number of nodes. By > sharing analysis and experience, the group could identify common pain > points that, once solved, would help raising that number. > > > > There was a lot of interest in that session[1], and it predictably > exploded in lots of different directions, including some that are > definitely past a single cluster (like making Neutron better support > cells). I think it's fine: my initial proposal was more of a strawman. > Active members of the group should really define what they collectively > want to work on. And the SIG name should be picked to match that. > > > > I'd like to help getting that group off the ground and to a place where > it can fly by itself, without needing external coordination. The first step > would be to identify interested members and discuss group scope and > objectives. Given the nature of the group (with interested members in > Japan, Europe, Australia and the US) it will be hard to come up with a > synchronous meeting time that will work for everyone, so let's try to hold > that discussion over email. > > > > So to kick this off: if you are interested in that group, please reply > to this email, introduce yourself and tell us what you would like the group > scope and objectives to be, and what you can contribute to the group. > > > > Thanks! > > > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > > > -- > > Thierry Carrez (ttx) > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Thu Nov 14 07:45:01 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Thu, 14 Nov 2019 07:45:01 +0000 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> Message-ID: <1573717497.26082.4@est.tech> On Thu, Nov 14, 2019 at 02:58, "Patil, Tushar" wrote: > On 11/13/2019 8:34 AM, Sylvain Bauza wrote: >>> Me too. To be clear, I don't think operators would modify the >>> above but >>> if so, they would need reshapes. > >> Maybe not, but this is the kind of detail that should be in the >> spec and >> functional tests to make sure it's solid since this is a big >> architectural change in nova. > > It depends on how the aggregates are created on the nova and > placement side. > > A) From placement point of view, operator can create a new aggregate > and add shared storage RP to it (tag MISC_SHARES_VIA_AGGREGATE trait > to this RP). The newly created valid UUID would then be set in the > config option ``sharing_disk_aggregate`` on the compute node side. > This aggregate UUID wouldn't be present in the nova aggregate. so > it's not possible to add host to the nova aggregate unless a new > aggregate is created on nova side. 
> > B) If nova aggregates are synced to the placement service and say > below is the picture: > > Nova: > > Agg1 - metadata (pinned=True) > - host1 > - host2 > > Now, operator adds a new shared storage RP to Agg1 on placement side > and then set Agg1 UUID in ``sharing_disk_aggregate`` on compute nodes > along with ``using_shared_disk_provider`=True``, then it would add > compute node RP to the Agg1 on the placement without any issues but > when you want to reverse the configuration, > using_shared_disk_provider=False, then it not that straight to remove > the host from the placement/nova aggregate because there would be > other traits set to compute RPs which could cause those functions > stop working. For me from the sharing disk provider feature perspective the placement aggregate that is needed for the sharing to work, and any kind of nova host aggregate (either synced to placement or not) is independent. The placement aggregate is a must for the feature. On top of that if the operator wants to create a nova host aggregate as well and sync it to placement then at the end there will be two, independent placement aggregates. One to express the sharing relationship and one to express a host aggregate from nova. These two aggregate will not be the same as the first one will have the sharing provider in it while the second one doesn't. gibi > > We had same kind of discussion [1] when implementing forbidden > aggregates where we want to sync traits set to the aggregates but > later it was concluded that operator will do it manually. > > I will include the details Matt has pointed out in this email in my > next patchset. > > [1] : > http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006950.html > > Regards, > tpatil > > > > ________________________________________ > From: Matt Riedemann > Sent: Wednesday, November 13, 2019 11:41 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova][ptg] Allow compute nodes to use DISK_GB from > shared storage RP by using aggregate relationship > > On 11/13/2019 8:34 AM, Sylvain Bauza wrote: >> Me too. To be clear, I don't think operators would modify the above >> but >> if so, they would need reshapes. > > Maybe not, but this is the kind of detail that should be in the spec > and > functional tests to make sure it's solid since this is a big > architectural change in nova. > > -- > > Thanks, > > Matt > > Disclaimer: This email and any attachments are sent in strictest > confidence for the sole use of the addressee and may contain legally > privileged, confidential, and proprietary data. If you are not the > intended recipient, please advise the sender by replying promptly to > this email and then delete and destroy this email and any attachments > without any further use, copying or forwarding. > From fsbiz at yahoo.com Thu Nov 14 08:03:45 2019 From: fsbiz at yahoo.com (fsbiz at yahoo.com) Date: Thu, 14 Nov 2019 08:03:45 +0000 (UTC) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> Message-ID: <78766172.92122.1573718625984@mail.yahoo.com> I am running stable Queens with hundreds of ironic baremetal nodes. Things are mostly stable but occasionally some baremetal node provisions are failing.  These failures have been tracked to nova placement failure leading to 409 errors.My nova and baremetal filters do NOT have the 3 filters you mention. 
[root at sc-control03 objects]# grep filter /etc/nova/nova.conf | grep filters # * enabled_filters #enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter#use_baremetal_filters=false#baremetal_enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ExactRamFilter,ExactDiskFilter,ExactCoreFilter The baremetal nodes are all using resource class.  My image does NOT  have the changes for https://review.opendev.org/#/c/565841 Ultimately, nova-conductor is reported "NoValidHost: No valid host was found. There are not enough hosts available"This has been traced to nova-placement-api "Allocation for CUSTOM_RRR430 on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1" Any pointers on what next steps I should be looking at ? thanks,Fred. Relevant logs:  nova-conductor.log2019-11-12 10:26:02.593 1666486 ERROR nova.conductor.manager [req-fa1bfb2e-c765-432d-aa66-e16db8329312 - - - - -] Failed to schedule instances: NoValidHost_Remote: No valid host was found. There are not enough hosts available.Traceback (most recent call last):   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner    return func(*args, **kwargs)   File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 154, in select_destinations    allocation_request_version, return_alternates)   File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 91, in select_destinations    allocation_request_version, return_alternates)   File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 243, in _schedule    claimed_instance_uuids)   File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 280, in _ensure_sufficient_hosts    raise exception.NoValidHost(reason=reason) NoValidHost: No valid host was found. There are not enough hosts available. nova-placement-api.log  3cacac3f-9af0-4e39-9bc8-d1f362bdb730 = resource ID of baremetal node 84ea2b90-06b2-489e-92ea-24b859b3c997 = instance ID 2019-11-12 10:26:02.427 4161131 INFO nova.api.openstack.placement.requestlog [req-66a6dc45-8326-4e24-9216-fc77099303ba 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] 10.33.24.13 "GET /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997" status: 200 len: 111 microversion: 1.0 2019-11-12 10:26:02.461 4161129 WARNING nova.objects.resource_provider [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] Allocation for CUSTOM_Z370_A on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. 
Requested: 2, min_unit: 1, max_unit: 1, step_size: 1 2019-11-12 10:26:02.568 4161129 INFO nova.api.openstack.placement.requestlog [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] 10.33.24.13 "PUT /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997" status: 409 len: 383 microversion: 1.17 http_access_log10.33.24.13 - - [12/Nov/2019:10:26:02 -0800] "GET /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997 HTTP/1.1" 200 111 "-" "nova-scheduler keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5"10.33.24.13 - - [12/Nov/2019:10:26:02 -0800] "PUT /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997 HTTP/1.1" 409 383 "-" "nova-scheduler keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5" On Wednesday, November 13, 2019, 11:36:35 AM PST, Albert Braden wrote: Removing these 3 obsolete filters appears to have fixed the problem. Thank you for your advice! -----Original Message----- From: Matt Riedemann Sent: Tuesday, November 12, 2019 1:14 PM To: openstack-discuss at lists.openstack.org Subject: Re: Scheduler sends VM to HV that lacks resources On 11/12/2019 2:47 PM, Albert Braden wrote: > It's probably a config error. Where should I be looking? This is our nova config on the controllers: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.fedoraproject.org_paste_kNe1eRimk4ifrAuuN790bg&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=TZI4wT8_y-RAnwbbXaWBhdvAhhcbY1qymxKLRVpPt2U&s=3aQNqwtEMfOC7U_QUTqNqXiZv4yJy6ceB4kCuZKuL0o&e= If your deployment is pike or newer (I'm guessing rocky because your other email says rocky), then you don't need these filters: RetryFilter - alternate hosts bp in queens release makes this moot CoreFilter - placement filters on VCPU RamFilter - placement filters on MEMORY_MB -- Thanks, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Thu Nov 14 08:37:14 2019 From: balazs.gibizer at est.tech (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Thu, 14 Nov 2019 08:37:14 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: References: Message-ID: <1573720630.26082.5@est.tech> On Sun, Nov 10, 2019 at 16:09, Brin Zhang(张百林) wrote: > Hi all, > Based on the discussion on the Train PTG, and reference to the > records on the etherpad and ML, I was updated that SPEC, and I think > there are some details need to be discussed, and I have listed some > details, > if there are any other things that I have not considered, or if some > place that I thoughtless, please post a discussion. > > List some details as follows, and you can review that spec in > https://review.opendev.org/#/c/663563. > > Listed details: > - Don't change the model of the flavor in nova code and in the db. > > - No change for operators who choose not to request the flavor > extra specs group. > > - Requested more than one flavor extra specs groups, if there are > different values for the same spec will be raised a 409. > > - Flavor in request body of server create that has the same spec in > the request ``flavor_extra_specs_group``, it will be raised a 409. > > - When resize an instance, you need to compare the > ``flavor_extra_specs_group`` with the spec request spec, otherwise > raise a 400. > Thanks Brin for updating the spec, I did a review round on it and left comments. 
gibi From balazs.gibizer at est.tech Thu Nov 14 08:38:56 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Thu, 14 Nov 2019 08:38:56 +0000 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> References: <1573402509.31166.3@est.tech> <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> Message-ID: <1573720733.26082.6@est.tech> On Mon, Nov 11, 2019 at 15:45, wang.ya wrote: > Hi: > > Here is the spec [1]_ > Because the exist spec [2]_ has gap with the agreement, so I rewrote > a new spec. > > .. [1]: https://review.opendev.org/#/c/693655/ > .. [2]: https://review.opendev.org/#/c/687199/ Could you please abandon one of the specs this is no confuses me which solution you want to push forward. Cheers, gibi > > Best Regards > From zhangbailin at inspur.com Thu Nov 14 08:47:48 2019 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Thu, 14 Nov 2019 08:47:48 +0000 Subject: =?utf-8?B?W2xpc3RzLm9wZW5zdGFjay5vcmfku6Plj5FdUmU6IFtub3ZhXSBUaG91Z2h0?= =?utf-8?B?cyBvbiBleHBvc2luZyBleGNlcHRpb24gdHlwZSB0byBub24tYWRtaW5zIGlu?= =?utf-8?Q?_instance_action_event?= In-Reply-To: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> Message-ID: <03bfc8edb0fe4b23955ae8007a11e8c1@inspur.com> I would like to see this feature, our customers have mentioned the same problem, I think this is useful. I think that should consider of the all instance action operations, such as actions in nova/compute/instance_actions.py. brinzhang > 主题: [lists.openstack.org代发]Re: [nova] Thoughts on exposing exception > type to non-admins in instance action event > > On 11/13/2019 11:17 AM, Eric Fried wrote: > > Unless it's likely to be something other than NoValidHost a > > significant percentage of the time, IMO it... > > Well just taking resize, it could be one of many things: > > https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py# > L366 > - oops you tried resizing which would screw up your group affinity policy > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L > 4490 > - (for an admin, cold migrate) oops you tried cold migrating a vcenter vm or you > have allow_resize_to_same_host=True and the scheduler picks the same host > (silly scheduler, see bug 1748697) > > https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L11 > 3 > - oops you lost a resource claims race, try again > > https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report. > py#L1898 > - oops you lost a race with allocation consumer generation conflicts, try again > > -- > > Thanks, > > Matt From zhangbailin at inspur.com Thu Nov 14 08:55:48 2019 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Thu, 14 Nov 2019 08:55:48 +0000 Subject: [nova][ptg] Flavor explosion Message-ID: Thanks Gibizer, I will update this specs to the ussuri directory, and update & reply your comments. 
brinzhang > 发件人: Balázs Gibizer [mailto:balazs.gibizer at est.tech] > > On Sun, Nov 10, 2019 at 16:09, Brin Zhang(张百林) > wrote: > > Hi all, > > Based on the discussion on the Train PTG, and reference to the > > records on the etherpad and ML, I was updated that SPEC, and I think > > there are some details need to be discussed, and I have listed some > > details, if there are any other things that I have not considered, or > > if some place that I thoughtless, please post a discussion. > > > > List some details as follows, and you can review that spec in > > https://review.opendev.org/#/c/663563. > > > > Listed details: > > - Don't change the model of the flavor in nova code and in the db. > > > > - No change for operators who choose not to request the flavor extra > > specs group. > > > > - Requested more than one flavor extra specs groups, if there are > > different values for the same spec will be raised a 409. > > > > - Flavor in request body of server create that has the same spec in > > the request ``flavor_extra_specs_group``, it will be raised a 409. > > > > - When resize an instance, you need to compare the > > ``flavor_extra_specs_group`` with the spec request spec, otherwise > > raise a 400. > > > > Thanks Brin for updating the spec, I did a review round on it and left comments. > > gibi > From sfinucan at redhat.com Thu Nov 14 09:06:34 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Thu, 14 Nov 2019 09:06:34 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: References: <1573196961.23158.1@est.tech> Message-ID: <1eafabf9c807438461a96afd2af9aa6d7992765e.camel@redhat.com> On Fri, 2019-11-08 at 12:20 +0000, Sean Mooney wrote: > > Naming: use the 'shared' and 'dedicated' terminology > didn't we want to have a hw:cpu_policy=mixed specificaly for this case? It wasn't clear, but gibi was referring to how we'd distinguish the "types" of CPU and instances using those CPUs. The alternative was pinned and unpinned. Stephen From sfinucan at redhat.com Thu Nov 14 09:08:46 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Thu, 14 Nov 2019 09:08:46 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> References: <1573196961.23158.1@est.tech> <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> Message-ID: <4a0ab4e36683efefb5289c0ab2a8861569dd691a.camel@redhat.com> On Mon, 2019-11-11 at 11:58 +0000, Wang, Huaqiang wrote: > > -----Original Message----- > > From: Balázs Gibizer > > Sent: Friday, November 8, 2019 3:10 PM > > To: openstack-discuss > > Subject: [nova][ptg] pinned and unpinned CPUs in one instance > > > > spec: https://review.opendev.org/668656 > > > > Agreements from the PTG: > > > > How we will test it: > > * do functional test with libvirt driver, like the pinned cpu tests we have > > today > > * donyd's CI supports nested virt so we can do pinned cpu testing but not > > realtime. As this CI is still work in progress we should not block on this. > > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > > have > > > > Naming: use the 'shared' and 'dedicated' terminology > > > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will > > have less expression power until nova models NUMA in placement. So nova > > will try to evenly distribute PCPUs between numa nodes. 
If it not possible we > > reject the request and ask the user to use the > > hw:pinvcpus=3 syntax. > > > > Realtime mask is an exclusion mask, any vcpus not listed there has to be in > > the dedicated set of the instance. > > > > TODOInvestigate whether we want to enable NUMA by default > > * Pros: Simpler, everything is NUMA by default > > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > > NUMA mapping else we won't be able to boot e.g. a 40 core shared instance > > on a 40 core, 2 NUMA node host > > For the case of 'booting a 40 core shared instance on 40 core 2NUMA node' that will > not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no > assumption about instance NUMA topology. Correct. However, this investigation refers to *all* instances, not just those using the 'mixed' policy. For the 'mixed' policy, I assume we'll need to apply a virtual NUMA topology since we currently apply one for instances using the 'dedicated' policy. > By the way if you want a 'shared' instance, with 40 cores, to be scheduled on a host > of 40cores, 2 NUMA nodes, you also need to register all host cores as 'shared' cpus > through 'conf.compute.cpu_shared_set'. > > For instance with 'mixed' policy, what I want to propose is the instance should > demand at least one 'dedicated'(or PCPU) core. Thus, any 'mixed' instance or 'dedicated' > instance will not be scheduled one this host due to no PCPU available on this host. > > And also, a 'mixed' instance should also demand at least one 'shared' (or VCPU) core. > a 'mixed' instance demanding all cores from PCPU resource should be considered as > an invalid one. And an instance demanding all cores from PCPU resource is just a > legacy 'dedicated' instance, which CPU allocation policy is 'dedicated'. > > In conclusion, a instance with the policy of 'mixed' > -. demands at least one 'dedicated' cpu and at least one 'shared' cpu. > -. with NUMA topology by default due to requesting pinned cpu > > In my understanding the cons does not exist by making above rules. > > Br > Huaqiang > > > > > Cheers, > > gibi From cdent+os at anticdent.org Thu Nov 14 09:12:51 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 14 Nov 2019 09:12:51 +0000 (GMT) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <78766172.92122.1573718625984@mail.yahoo.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> Message-ID: On Thu, 14 Nov 2019, fsbiz at yahoo.com wrote: > Ultimately, nova-conductor is reported "NoValidHost: No valid host was found. There are not enough hosts available"This has been traced to nova-placement-api "Allocation for CUSTOM_RRR430 on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1" > Any pointers on what next steps I should be looking at ? Your request, is asking for CUSTOM_RRR430 will a value of 2, but it is only available as 1. Have a look at your server create request, there's something, probably your flavor, which is unexpected. Placement and nova scheduler are working correctly with the data they have, the problem is with how inventory is being reported or requested. This could either be with how your ironic nodes are being reported, or with flavors. 
> 2019-11-12 10:26:02.461 4161129 WARNING nova.objects.resource_provider [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] Allocation for CUSTOM_Z370_A on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1 This is the same issue, but with a different class of inventory -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From wang.ya at 99cloud.net Thu Nov 14 09:23:35 2019 From: wang.ya at 99cloud.net (wangya) Date: Thu, 14 Nov 2019 17:23:35 +0800 Subject: [nova][ptg] Expose auto converge and post copy In-Reply-To: <1573720733.26082.6@est.tech> References: <1573402509.31166.3@est.tech> <5717FBD3-8373-4EAA-9EF4-5F3EFDE7B53F@99cloud.net> <75036C56-67F9-4C14-9ECD-BFF1DEAD006B@99cloud.net> <1573720733.26082.6@est.tech> Message-ID: <4118925c-15c8-3f91-2fea-7ece720d5dd9@99cloud.net> > On Mon, Nov 11, 2019 at 15:45, wang.ya wrote: >> Hi: >> >> Here is the spec [1]_ >> Because the exist spec [2]_ has gap with the agreement, so I rewrote >> a new spec. >> >> .. [1]: https://review.opendev.org/#/c/693655/ >> .. [2]: https://review.opendev.org/#/c/687199/ > Could you please abandon one of the specs this is no confuses me which > solution you want to push forward. https://review.opendev.org/#/c/687199/ has been abandoned Please discuss in this spec: https://review.opendev.org/#/c/693655/;-) From mark at stackhpc.com Thu Nov 14 09:24:11 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 14 Nov 2019 09:24:11 +0000 Subject: [neutron][docs][infra] networking-onos EOL? In-Reply-To: References: Message-ID: Added [infra]. On Thu, 14 Nov 2019 at 05:59, Akihiro Motoki wrote: > > Hi, > > networking-onos project was under the neutron team governance, but it > was retired in Oct 2016 [4][5]. > > Regarding the 'latest' documentation, there is no clear guideline on > cleaning up "docs.o.o/latest/foo" > when a repository is retried. I think that is the only reason we can > still see docs.o.o/latest/networking-onos. > Only projects under TC governance can publish documentation under > docs.o.o, so I thnk we need a cleanup > when a repository retirement. That sounds like a fair argument to me. > > Thanks, > Akihiro Motoki (amotoki) > > [4] https://review.opendev.org/#/c/383911/ (neutron team decision) > [5] https://review.opendev.org/#/c/392010/ (governance change) > > On Mon, Nov 4, 2019 at 7:12 PM Mark Goddard wrote: > > > > Hi, > > > > We (kolla) had a bug report [1] from someone trying to use the neutron > > onos_ml2 ML2 driver for the ONOS SDN controller. As far as I can tell > > [2], this project hasn't been released since 2015. However, the > > 'latest' documentation is still accessible [3], and does not mention > > that the project is dead. What can we do to help steer people away > > from projects like this? > > > > Cheers, > > Mark > > > > [1] https://bugs.launchpad.net/bugs/1850763 > > [2] https://pypi.org/project/networking-onos/#history > > [3] https://docs.openstack.org/networking-onos/latest/ > > From moreira.belmiro.email.lists at gmail.com Thu Nov 14 09:31:10 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Thu, 14 Nov 2019 10:31:10 +0100 Subject: [sig] Forming a Large scale SIG In-Reply-To: References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: Hi, first of all thanks to Thierry for driving this SIG creation. 
Having a SIG to discuss how to deploy/operate a large deployment will be incredibly useful. In my opinion we shouldn't restrict ourselves to a specific project or deployment size (or number of cells) but discuss the limits of each project architecture, the projects dependencies, limitations at scale (functionality vs simplicity), operational difficulties... Sharing experiences and understand the different challenges and actions that we are using to mitigate them will be extremely valuable. I think that we already have a lot of examples of companies/organizations that are deploying OpenStack at large scale. Compiling all this information (Summit presentations, blogs, superuser articles, ...) will be a good starting point for all operators and discussions. Every deployment is different. I also would like this SIG to be the bridge between the operators of large deployments and developers. Bringing specific pain points to discussion with developers. cheers, Belmiro CERN On Thu, Nov 14, 2019 at 8:25 AM Arnaud MORIN wrote: > Hi all, > > +1 for me and my employer (OVH). > We are mostly interested in sharing good practices when deploying a region > at scale, and operating it. > > For the deployment part, my main pain point is about the configuration > parameters I should use on different software (e.g. nova behind wsgi). > The current doc is designed to deploy a small pod, but when we are going > large, usually some of those params needs tuning. I'd like to identify them > and eventually tag them to help other being aware that they are useful at > large scale. > > About operating, I am pretty sure we can share some good advices as well. > E.g., avoid restarting neutron agents in a single shot. > > So definitely interested in that group. Thanks for bringing that up. > > Cheers. > > Le mer. 13 nov. 2019 à 19:00, Stig Telfer a > écrit : > >> Hi Thierry & all - >> >> Thanks for your mail. I’m interested in joining this SIG. Among others, >> I’m interested in participating in discussions around these common problems: >> >> - golden signals for scaling bottlenecks (and what to do about them) >> - using Ansible at scale >> - strategies for simplifying OpenStack functionality in order to scale >> >> Cheers, >> Stig >> >> >> > On 13 Nov 2019, at 11:18, Thierry Carrez wrote: >> > >> > Hi everyone, >> > >> > In Shanghai we held a forum session to gauge interest in a new SIG to >> specifically address cluster scaling issues. In the past we had several >> groups ("Large deployments", "Performance", LCOO...) but those efforts were >> arguably a bit too wide and those groups are now abandoned. >> > >> > My main goal here is to get large users directly involved in a domain >> where their expertise can best translate into improvements in the software. >> It's easy for such a group to go nowhere while trying to boil the ocean. To >> maximize its chances of success and make it sustainable, the group should >> have a narrow focus, and reasonable objectives. >> > >> > My personal idea for the group focus was to specifically address >> scaling issues within a single cluster: basically identify and address >> issues that prevent scaling a single cluster (or cell) past a number of >> nodes. By sharing analysis and experience, the group could identify common >> pain points that, once solved, would help raising that number. 
>> > >> > There was a lot of interest in that session[1], and it predictably >> exploded in lots of different directions, including some that are >> definitely past a single cluster (like making Neutron better support >> cells). I think it's fine: my initial proposal was more of a strawman. >> Active members of the group should really define what they collectively >> want to work on. And the SIG name should be picked to match that. >> > >> > I'd like to help getting that group off the ground and to a place where >> it can fly by itself, without needing external coordination. The first step >> would be to identify interested members and discuss group scope and >> objectives. Given the nature of the group (with interested members in >> Japan, Europe, Australia and the US) it will be hard to come up with a >> synchronous meeting time that will work for everyone, so let's try to hold >> that discussion over email. >> > >> > So to kick this off: if you are interested in that group, please reply >> to this email, introduce yourself and tell us what you would like the group >> scope and objectives to be, and what you can contribute to the group. >> > >> > Thanks! >> > >> > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG >> > >> > -- >> > Thierry Carrez (ttx) >> > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Nov 14 09:38:49 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 14 Nov 2019 10:38:49 +0100 Subject: [neutron] Review priorities Message-ID: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> Hi neutrinos, According to our discussion during Train retrospective in Shanghai, I added "review-priority" label for neutron projects. It can be set by every core team member to values like: -1 - Branch Freeze +1 - Important Change +2 - Gate Blocker Fix / Urgent Change You can use dashboard like [1] to track such high priority patches and review them. I will also add some note about this to our docs this week to make it clear and visible for everyone. [1] https://tinyurl.com/vezk6n6 -- Slawek Kaplonski Senior software engineer Red Hat From dh3 at sanger.ac.uk Thu Nov 14 09:44:33 2019 From: dh3 at sanger.ac.uk (Dave Holland) Date: Thu, 14 Nov 2019 09:44:33 +0000 Subject: [sig] Forming a Large scale SIG [EXT] In-Reply-To: References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: <20191114094433.GN3793@sanger.ac.uk> Hi, Belmiro's point of linking operators and developers is hugely important because the developers have the tough job of catering for both large and small deployments. What can look like a safety net for small systems (e.g. per-container file descriptor limits) turns into a huge pitfall when deploying at scale. I'm really interested to be involved in this SIG. Cheers, Dave -- ** Dave Holland ** Systems Support -- Informatics Systems Group ** ** 01223 496923 ** Wellcome Sanger Institute, Hinxton, UK ** On Thu, Nov 14, 2019 at 10:31:10AM +0100, Belmiro Moreira wrote: > Hi, > first of all thanks to Thierry for driving this SIG creation. > Having a SIG to discuss how to deploy/operate a large deployment will > be incredibly useful. > In my opinion we shouldn't restrict ourselves to a specific project > or deployment size (or number of cells) but discuss the limits of > each project architecture, the projects dependencies, limitations at > scale (functionality vs simplicity), operational difficulties... 
> Sharing experiences and understand the different challenges and > actions that we are using to mitigate them will be extremely > valuable. > I think that we already have a lot of examples of > companies/organizations that are deploying OpenStack at large scale. > Compiling all this information (Summit presentations, blogs, > superuser articles, ...) will be a good starting point for all > operators and discussions. Every deployment is different. > I also would like this SIG to be the bridge between the operators of > large deployments and developers. Bringing specific pain points to > discussion with developers. > cheers, > Belmiro > CERN > > On Thu, Nov 14, 2019 at 8:25 AM Arnaud MORIN > <[1]arnaud.morin at gmail.com> wrote: > > Hi all, > +1 for me and my employer (OVH). > We are mostly interested in sharing good practices when deploying a > region at scale, and operating it. > For the deployment part, my main pain point is about the > configuration parameters I should use on different software (e.g. > nova behind wsgi). > The current doc is designed to deploy a small pod, but when we are > going large, usually some of those params needs tuning. I'd like to > identify them and eventually tag them to help other being aware that > they are useful at large scale. > About operating, I am pretty sure we can share some good advices as > well. E.g., avoid restarting neutron agents in a single shot. > So definitely interested in that group. Thanks for bringing that up. > Cheers. > > Le mer. 13 nov. 2019 à 19:00, Stig Telfer > <[2]stig.openstack at telfer.org> a écrit : > > Hi Thierry & all - > Thanks for your mail. I’m interested in joining this SIG. Among > others, I’m interested in participating in discussions around > these common problems: > - golden signals for scaling bottlenecks (and what to do about > them) > - using Ansible at scale > - strategies for simplifying OpenStack functionality in order to > scale > Cheers, > Stig > > On 13 Nov 2019, at 11:18, Thierry Carrez > <[3]thierry at openstack.org> wrote: > > > > Hi everyone, > > > > In Shanghai we held a forum session to gauge interest in a new > SIG to specifically address cluster scaling issues. In the past we > had several groups ("Large deployments", "Performance", LCOO...) > but those efforts were arguably a bit too wide and those groups > are now abandoned. > > > > My main goal here is to get large users directly involved in a > domain where their expertise can best translate into improvements > in the software. It's easy for such a group to go nowhere while > trying to boil the ocean. To maximize its chances of success and > make it sustainable, the group should have a narrow focus, and > reasonable objectives. > > > > My personal idea for the group focus was to specifically address > scaling issues within a single cluster: basically identify and > address issues that prevent scaling a single cluster (or cell) > past a number of nodes. By sharing analysis and experience, the > group could identify common pain points that, once solved, would > help raising that number. > > > > There was a lot of interest in that session[1], and it > predictably exploded in lots of different directions, including > some that are definitely past a single cluster (like making > Neutron better support cells). I think it's fine: my initial > proposal was more of a strawman. Active members of the group > should really define what they collectively want to work on. And > the SIG name should be picked to match that. 
> > > > I'd like to help getting that group off the ground and to a > place where it can fly by itself, without needing external > coordination. The first step would be to identify interested > members and discuss group scope and objectives. Given the nature > of the group (with interested members in Japan, Europe, Australia > and the US) it will be hard to come up with a synchronous meeting > time that will work for everyone, so let's try to hold that > discussion over email. > > > > So to kick this off: if you are interested in that group, please > reply to this email, introduce yourself and tell us what you would > like the group scope and objectives to be, and what you can > contribute to the group. > > > > Thanks! > > > > [1] [4]https://etherpad.openstack.org/p/PVG-large-scale-SIG > [etherpad.openstack.org] > > > > -- > > Thierry Carrez (ttx) > > > > References > > 1. mailto:arnaud.morin at gmail.com > 2. mailto:stig.openstack at telfer.org > 3. mailto:thierry at openstack.org > 4. https://urldefense.proofpoint.com/v2/url?u=https-3A__etherpad.openstack.org_p_PVG-2Dlarge-2Dscale-2DSIG&d=DwMFaQ&c=D7ByGjS34AllFgecYw0iC6Zq7qlm8uclZFI0SqQnqBo&r=64bKjxgut4Pa0xs5b84yPg&m=DdEhOLy_myry74y3z2LhDWbl3ztokcSVufGIqfDSCaM&s=L7GyQqoSsD_56ROhOkKxfMtbER6jrPjcNSZrjNsQrMg&e= -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From moreira.belmiro.email.lists at gmail.com Thu Nov 14 09:58:59 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Thu, 14 Nov 2019 10:58:59 +0100 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> Message-ID: Hi, Akihiro, thanks for you summary. We use the linuxbridge driver because its simplicity and the match with the old nova-network schema (yes, are we still migrating). The functionality gap between ovs driver and linuxbridge is a good think in my view. It allows operators to chose the best solution considering their deployment use case and scale. Slawek, Miguel please keep us in the discussions. Belmiro CERN On Wed, Nov 13, 2019 at 7:22 PM Sean Mooney wrote: > On Tue, 2019-11-12 at 14:53 +0100, Slawek Kaplonski wrote: > > Stateless security groups > > ========================= > > > > Old RFE [21] was approved for neutron-fwaas project but we all agreed > that this > > should be now implemented for security groups in core Neutron. > > People from Nuage are interested in work on this in upstream. > > We should probably also explore how easy/hard it will be to implement it > in > > networking-ovn backend. > > for what its worth we implemented this 4 years ago and it was breifly used > in production trial deployment > in a telco deployment but i dont think it ever went to full production as > they went wtih sriov instead > https://review.opendev.org/#/c/264131/ as part of this RFE > https://bugs.launchpad.net/neutron/+bug/1531205 which was > closed as wont fix > https://bugs.launchpad.net/neutron/+bug/1531205/comments/14 > as it was view that this was not the correct long term direction for the > community. 
> this is the summit presentation for austin for anyone that does not > rememebr this effort > > > https://www.openstack.org/videos/summits/austin-2016/tired-of-iptables-based-security-groups-heres-how-to-gain-tremendous-speed-with-open-vswitch-instead > > im not sure how the new proposal differeres form our previous proposal for > the same > feautre but the main pushback we got was that the securtiy group api is > assumed to be stateful > and that is why this was rejected. form our mesurments at the time we > expected the stateless approch > to scale better then contrack driver so it woudl be nice to see a > stateless approch avialable. > i never got around to deleteing our implemenation form networking-ovs-dpdk > > https://opendev.org/x/networking-ovs-dpdk/src/branch/master/networking_ovs_dpdk/agent/ovs_dpdk_firewall.py > but i has not been tested our updated really for the last 2 years but it > could be used as a basis of this effort > if nuage does not have a poc already. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Nov 14 11:14:21 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 14 Nov 2019 11:14:21 +0000 Subject: [nova][ptg] pinned and unpinned CPUs in one instance In-Reply-To: <4a0ab4e36683efefb5289c0ab2a8861569dd691a.camel@redhat.com> References: <1573196961.23158.1@est.tech> <77E9D723B6A15C4CB27F7C3F130DE8624776582B@shsmsx102.ccr.corp.intel.com> <4a0ab4e36683efefb5289c0ab2a8861569dd691a.camel@redhat.com> Message-ID: <5ac1120f9d872879d9cfaf19d2f61fa02e63887b.camel@redhat.com> On Thu, 2019-11-14 at 09:08 +0000, Stephen Finucane wrote: > On Mon, 2019-11-11 at 11:58 +0000, Wang, Huaqiang wrote: > > > -----Original Message----- > > > From: Balázs Gibizer > > > Sent: Friday, November 8, 2019 3:10 PM > > > To: openstack-discuss > > > Subject: [nova][ptg] pinned and unpinned CPUs in one instance > > > > > > spec: https://review.opendev.org/668656 > > > > > > Agreements from the PTG: > > > > > > How we will test it: > > > * do functional test with libvirt driver, like the pinned cpu tests we have > > > today > > > * donyd's CI supports nested virt so we can do pinned cpu testing but not > > > realtime. As this CI is still work in progress we should not block on this. > > > * coverage inhttps://opendev.org/x/whitebox-tempest-pluginis a nice to > > > have > > > > > > Naming: use the 'shared' and 'dedicated' terminology > > > > > > Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra > > > specs syntaxtbut not in the same flavor. The resources:PCPU=2 syntax will > > > have less expression power until nova models NUMA in placement. So nova > > > will try to evenly distribute PCPUs between numa nodes. If it not possible we > > > reject the request and ask the user to use the > > > hw:pinvcpus=3 syntax. > > > > > > Realtime mask is an exclusion mask, any vcpus not listed there has to be in > > > the dedicated set of the instance. > > > > > > TODOInvestigate whether we want to enable NUMA by default > > > * Pros: Simpler, everything is NUMA by default > > > * Cons: We'll either have to break/make configurablethe 1:1 guest:host > > > NUMA mapping else we won't be able to boot e.g. a 40 core shared instance > > > on a 40 core, 2 NUMA node host > > > > For the case of 'booting a 40 core shared instance on 40 core 2NUMA node' that will > > not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no > > assumption about instance NUMA topology. > > Correct. 
However, this investigation refers to *all* instances, not > just those using the 'mixed' policy. For the 'mixed' policy, I assume > we'll need to apply a virtual NUMA topology since we currently apply > one for instances using the 'dedicated' policy. yes for consitency i think that would be the correct approch too. > > > By the way if you want a 'shared' instance, with 40 cores, to be scheduled on a host > > of 40cores, 2 NUMA nodes, you also need to register all host cores as 'shared' cpus > > through 'conf.compute.cpu_shared_set'. > > > > For instance with 'mixed' policy, what I want to propose is the instance should > > demand at least one 'dedicated'(or PCPU) core. Thus, any 'mixed' instance or 'dedicated' > > instance will not be scheduled one this host due to no PCPU available on this host. > > > > And also, a 'mixed' instance should also demand at least one 'shared' (or VCPU) core. > > a 'mixed' instance demanding all cores from PCPU resource should be considered as > > an invalid one. And an instance demanding all cores from PCPU resource is just a > > legacy 'dedicated' instance, which CPU allocation policy is 'dedicated'. > > > > In conclusion, a instance with the policy of 'mixed' > > -. demands at least one 'dedicated' cpu and at least one 'shared' cpu. > > -. with NUMA topology by default due to requesting pinned cpu > > > > In my understanding the cons does not exist by making above rules. > > > > Br > > Huaqiang > > > > > > > > Cheers, > > > gibi > > From fungi at yuggoth.org Thu Nov 14 11:54:45 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 14 Nov 2019 11:54:45 +0000 Subject: [neutron][docs][infra] networking-onos EOL? In-Reply-To: References: Message-ID: <20191114115445.mfuin7xuqfkss42r@yuggoth.org> On 2019-11-14 09:24:11 +0000 (+0000), Mark Goddard wrote: > Added [infra]. [...] Can you clarify why? Reading back through the thread this sounds like you either want a change to the way the main documentation redirects in the openstack-manuals repo are designed, or you want some change merged to networking-onos to replace its documentation with some indication it's retired, or you want a change to the openstackdocstheme Sphinx theme to add a banner/admonishment on docs for retired repos... or are you simply relying on the Infra team to remind people on other teams how their projects are interrelated? A more specific request/question would really help. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From corey.bryant at canonical.com Thu Nov 14 13:47:35 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Thu, 14 Nov 2019 08:47:35 -0500 Subject: [tc] Add non-voting py38 for ussuri In-Reply-To: References: Message-ID: On Wed, Nov 13, 2019 at 3:36 PM Clark Boylan wrote: > On Fri, Nov 8, 2019, at 6:09 AM, Corey Bryant wrote: > > > > > > On Thu, Nov 7, 2019 at 5:56 PM Sean McGinnis > wrote: > > > My non-TC take on this... > > > > > > > > > > Python 3.8 is available in Ubuntu Bionic now and while I understand > it's too late to enable voting py38 unit tests for ussuri, I'd like to at > least enable non-voting py38 unit tests. This email is seeking approval and > direction from the TC to move forward with enabling non-voting py38 tests. > > > > > > I think it would be great to start testing 3.8 so there are no > surprises once we need to officially move there. 
But I would actually not > want to see that run on every since patch in every single repo. > > > > Just to be clear I'm only talking about unit tests right now which are > > generally light on resource requirements. However it would be great to > > also have py38 function test enablement and periodic would make sense > > for function tests at this point. For unit tests though it seems the > > benefit of knowing whether your patch regresses unit tests for the > > latest python version far outweighs the resources required, so I don't > > see much benefit in adding periodic unit test jobs. > > > > Wanted to point out that we've begun to expose resource consumption in > nodepool to graphite. You can find per project and per tenant resource > usage under stats.zuul.nodepool.resources at https://graphite.opendev.org. > Unfortunately, I don't think we have per job resource tracking there yet, > but previous measurements from log files do agree that unittest consumption > is relatively low. > > It is large multinode integration jobs that run for extended periods of > time that have the greatest impact on our resource utilization. > > Clark > > That's great, thanks for sharing. Per job would be a super nice addition. Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Nov 14 13:58:17 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 07:58:17 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Message-ID: On 11/13/2019 2:38 PM, Eric Fried wrote: > Okay, are we going to have a document that maps exception classes to > these explanations and recovery actions? Which we then have to maintain > as the code changes? Or are they expected to look through code (without > a stack trace)? Nope. > > I'm not against the idea, just playing devil's advocate. Sylvain seems > to have a use case, so great. Yeah I know. Like I said in the original email, just having the exception type might not be very useful to an end user. That's almost like just showing an error code that is then used by support staff. If we do expose the details as the formatted exception message, like we do for faults, then I think it would be more useful to end users, but then you also run into the same issues as we have for fault messages that maybe leak too much detail [1]. However, with the way I was thinking about doing this, the instance action code would use the same utility method that generates the fault message so if we fix [1] for faults it's also fixed for instance actions automatically. If I get the time this week I'll WIP something together that does what I'm thinking as a proof of concept, likely without the microversion stuff just since that's unnecessary overhead for a PoC. > > As an alternative, have we considered a mechanism whereby we could, in > appropriate code paths, provide some text that's expressly intended for > the end user to see? Maybe it's a new user_message field on > NovaException which, if present, gets percolated up to a new field > similar to the one you suggested. I think that likely becomes as whack-a-mole to contain as documenting all of the different types of errors. 
[1] https://bugs.launchpad.net/nova/+bug/1851587 -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 13:59:30 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 07:59:30 -0600 Subject: =?UTF-8?B?UmU6IFtsaXN0cy5vcGVuc3RhY2sub3Jn5Luj5Y+RXVJlOiBbbm92YV0g?= =?UTF-8?Q?Thoughts_on_exposing_exception_type_to_non-admins_in_instance_act?= =?UTF-8?Q?ion_event?= In-Reply-To: <03bfc8edb0fe4b23955ae8007a11e8c1@inspur.com> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <03bfc8edb0fe4b23955ae8007a11e8c1@inspur.com> Message-ID: <1c9e6663-4c27-6865-7a3f-4cd15664e581@gmail.com> On 11/14/2019 2:47 AM, Brin Zhang(张百林) wrote: > I think that should consider of the all instance action operations, such as actions in nova/compute/instance_actions.py. The resize examples in my email are just examples. The code that generates the action events is centralized in the InstanceActionEvent object so it would be used for all actions that fail with some exception. -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 14:02:28 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:02:28 -0600 Subject: [nova][ptg] Allow compute nodes to use DISK_GB from shared storage RP by using aggregate relationship In-Reply-To: <1573717497.26082.4@est.tech> References: <1573200108.23158.4@est.tech> <54cc7ca8-eee9-6115-5c7b-ecfa8e39bf54@gmail.com> <1573654932.26082.3@est.tech> <1573717497.26082.4@est.tech> Message-ID: On 11/14/2019 1:45 AM, Balázs Gibizer wrote: > For me from the sharing disk provider feature perspective the placement > aggregate that is needed for the sharing to work, and any kind of nova > host aggregate (either synced to placement or not) is independent. The > placement aggregate is a must for the feature. On top of that if the > operator wants to create a nova host aggregate as well and sync it to > placement then at the end there will be two, independent placement > aggregates. One to express the sharing relationship and one to express > a host aggregate from nova. These two aggregate will not be the same as > the first one will have the sharing provider in it while the second one > doesn't. I tend to agree with the simplicity of this as well. -- Thanks, Matt From witold.bedyk at suse.com Thu Nov 14 14:03:55 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Thu, 14 Nov 2019 15:03:55 +0100 Subject: [monasca] New team meeting time poll Message-ID: Hello everyone, We would like to find the new time slot for the Monasca Team Meeting which suites you best. Please fill in the times which work for you in that poll [1] until next Wednesday. Thanks Witek [1] https://doodle.com/poll/ey6brvmbsubkxpp9 From mriedemos at gmail.com Thu Nov 14 14:10:00 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:10:00 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> Message-ID: On 11/14/2019 3:12 AM, Chris Dent wrote: > Your request, is asking for CUSTOM_RRR430 will a value of 2, but it > is only available as 1. Have a look at your server create request, > there's something, probably your flavor, which is unexpected. https://review.opendev.org/#/c/620111/ comes to mind, I'm not sure if that helps you workaround the problem or not. 
Be sure to go through this doc as well: https://docs.openstack.org/ironic/queens/install/configure-nova-flavors.html#scheduling-based-on-resource-classes Specifically the part about overriding the VCPU/MEMORY_MB/DISK_GB values in the baremetal flavors. My guess is maybe you haven't done that and the scheduler is selecting a node based on vcpu/ram/disk that is already fully consumed by another node with the same resource class? Failing all that, it might be an issue due to https://review.opendev.org/#/c/637217/ which I abandoned because I just didn't have the time or will to push on it any further. If nothing else the bugs linked to those patches might be helpful with workarounds that CERN did when they were doing their baremetal flavor migration to custom resource classes. There were definitely bumps along the way. -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 14:13:05 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:13:05 -0600 Subject: [sig] Forming a Large scale SIG In-Reply-To: References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> <47730534-09AF-4018-A2AE-670D4725E569@telfer.org> Message-ID: <473b01fd-217d-3739-c8a2-ab26944bbb6a@gmail.com> On 11/14/2019 1:10 AM, Arnaud MORIN wrote: > The current doc is designed to deploy a small pod, but when we are going > large, usually some of those params needs tuning. I'd like to identify > them and eventually tag them to help other being aware that they are > useful at large scale. For anything nova specific you could dump it into [1] with a comment. That's a bug tracking stuff like this that should eventually be documented in nova for large scale performance considerations. [1] https://bugs.launchpad.net/nova/+bug/1838819 -- Thanks, Matt From tpb at dyncloud.net Thu Nov 14 14:30:38 2019 From: tpb at dyncloud.net (Tom Barron) Date: Thu, 14 Nov 2019 09:30:38 -0500 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Message-ID: <20191114143038.r3afg4ai6rq65qsr@barron.net> On 13/11/19 14:38 -0600, Eric Fried wrote: >Okay, are we going to have a document that maps exception classes to >these explanations and recovery actions? Which we then have to maintain >as the code changes? Or are they expected to look through code (without >a stack trace)? > >I'm not against the idea, just playing devil's advocate. Sylvain seems >to have a use case, so great. > >As an alternative, have we considered a mechanism whereby we could, in >appropriate code paths, provide some text that's expressly intended for >the end user to see? Maybe it's a new user_message field on >NovaException which, if present, gets percolated up to a new field >similar to the one you suggested. Would this be like the "user messages" provided by block [1] and file [2] storage components? [1] https://docs.openstack.org/cinder/latest/contributor/user_messages.html [2] https://docs.openstack.org/manila/latest/contributor/user_messages.html -- Tom >efried > >On 11/13/19 11:41 AM, Matt Riedemann wrote: >> On 11/13/2019 11:17 AM, Eric Fried wrote: >>> Unless it's likely to be something other than NoValidHost a significant >>> percentage of the time, IMO it... 
>> >> Well just taking resize, it could be one of many things: >> >> https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L366 >> - oops you tried resizing which would screw up your group affinity policy >> >> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L4490 >> - (for an admin, cold migrate) oops you tried cold migrating a vcenter >> vm or you have allow_resize_to_same_host=True and the scheduler picks >> the same host (silly scheduler, see bug 1748697) >> >> https://github.com/openstack/nova/blob/20.0.0/nova/compute/claims.py#L113 - >> oops you lost a resource claims race, try again >> >> https://github.com/openstack/nova/blob/20.0.0/nova/scheduler/client/report.py#L1898 >> - oops you lost a race with allocation consumer generation conflicts, >> try again >> > From skaplons at redhat.com Thu Nov 14 14:35:18 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 14 Nov 2019 15:35:18 +0100 Subject: [neutron][ci] Jobs cleaning Message-ID: <20191114143518.cxbjuiismoj5v5af@skaplons-mac> Hi, As we discussed during the PTG, I'm now checking what multinode and singlenode jobs we are exactly running in Neutron CI and what jobs can be potentially removed maybe. Here is what I found +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | Singlenode job | Multinode job | Comments | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | neutron-tempest-dvr | neutron-tempest-plugin-dvr-multinode-scenario | Singlenode job runs tempest tests, | | | (non-voting) | multinode job runs tests from neutron-tempest-plugin repo | | | | multinode job isn't stable currently | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | tempest-integrated-networking | tempest-multinode-full-py3 (non-voting) | Singlenode job runs tempest tests related to neutron/nova,| | | | Multinode job runs all tempest tests | | | | multinode job is stable enough to make it voting IMO | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ | grenade-py3 | neutron-grenade-multinode | Both jobs runs the same tests and are voting already | +-------------------------------+-----------------------------------------------+-----------------------------------------------------------+ I also found that we have few jobs which we ver similar but the only difference is that one runs tempest tests and other runs tests from neutron tempest plugin. Such jobs are: neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid and neutron-tempest-iptables_hybrid neutron-tempest-plugin-scenario-linuxbridge and neutron-tempest-linuxbridge Do we need all those jobs? Maybe we can simply stay only with neutron-tempest-plugins jobs for those configurations? Or maybe we should "merge" them and run tests from both tempest and neutron-tempest-plugin in one job? -- Slawek Kaplonski Senior software engineer Red Hat From sriram.ec at gmail.com Thu Nov 14 14:36:06 2019 From: sriram.ec at gmail.com (Sriram) Date: Thu, 14 Nov 2019 20:06:06 +0530 Subject: [Neutron] VPNaaS using certs Message-ID: Hi, I would like to know if VPNaaS in openstack provides ipsec with cert based authentication mechanism using certs. 
Documentation says only psk based authentication is supported. Please advise. Regards, Sriram -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Thu Nov 14 14:46:31 2019 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 14 Nov 2019 20:16:31 +0530 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint In-Reply-To: <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> References: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> Message-ID: Hi Matt, I agree, your concern is valid. I have been working in glance since Icehouse and aware about stable branch guidelines. Given the opportunity, I will try my best to justify my selection. Thanks & Best Regards, Abhishek Kekane On Thu, Nov 14, 2019 at 12:29 AM Matt Riedemann wrote: > On 11/12/2019 2:17 PM, Brian Rosmaita wrote: > > we are currently understaffed in glance-stable-maint. Plus, he's the > > current Glance PTL. > > glance-stable-maint is understaffed yes. I ran a reviewstats report on > glance stable branch reviews over the last 180 days: > > http://paste.openstack.org/show/786058/ > > Abhishek has only done 3 stable branch reviews in 6 months which is > pretty low but to be fair maybe there aren't that many open reviews on > stable branches for glance and the other existing glance-stable-maint > cores don't have a lot more reviews either, so maybe that's just par for > the course. > > As for being core on master or being PTL, as you probably know, that > doesn't really mean much when it comes to stable branch reviews, which > is more about the stable branch guidelines. Nova has a few stable branch > cores that aren't core on master because they adhere to the guidelines > and do a lot of stable branch reviews. > > Anyway, I'm OK trusting Abhishek here and adding him to the > glance-stable-maint team. Things are such these days that beggars can't > really be choosers. > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Nov 14 14:48:26 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 08:48:26 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: <20191114143038.r3afg4ai6rq65qsr@barron.net> References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> <20191114143038.r3afg4ai6rq65qsr@barron.net> Message-ID: On 11/14/2019 8:30 AM, Tom Barron wrote: > Would this be like the "user messages" provided by block [1] and file > [2] storage components? > > [1] https://docs.openstack.org/cinder/latest/contributor/user_messages.html > [2] https://docs.openstack.org/manila/latest/contributor/user_messages.html The instance actions API in nova is very similar. Rather than build a new "user messages" API in nova I'm just talking about providing more detail on the actual error that occurred per failed event per action, basically the same as the user would see in a fault message on the server when it's in ERROR status. Because right now the instance action and events either say "Success" or "Error" for the message/result which is not useful in the Error case. 
-- Thanks, Matt From openstack at fried.cc Thu Nov 14 14:56:48 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 14 Nov 2019 08:56:48 -0600 Subject: [nova] Today's meeting Message-ID: <6713ed8b-0f8f-46cb-96a9-a52f0ec2e4a6@fried.cc> Attendance at today's nova meeting was sparse, to say the least. Predictably, some forgot about DST [1], some had conflicts, some are jetlagged, some probably all three. Most hot topics are on ML threads anyway. I and others have updated the meeting agenda [2] with links to those threads. Please be sure to chime in on topics of interest. Thanks, efried [1] DST is stupid and should be abolished [2] https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting From mriedemos at gmail.com Thu Nov 14 15:35:46 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 09:35:46 -0600 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <1573373353.31166.0@est.tech> References: <1573373353.31166.0@est.tech> Message-ID: <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> On 11/10/2019 2:09 AM, Balázs Gibizer wrote: > * Check ongoing migration and reject the delete if migration with this > compute having the source node exists. Let operator confirm the > migrations To be clear, the suggestion here is call [1] from the API like around [2]? That's a behavior change but so was blocking the delete when the compute was hosting instances [3] and we added a release note for that. Anyway, that's a pretty simple change and not really something I thought about in earlier threads on this problem. Regarding evacuate migration records that should also work since the final states for an evacuate migration are done, failed or error for which [1] accounts. > * Cascade delete providers and allocations in placement. > * in case of evacuated instances this is the right thing to do OK this seems to confirm my TODO here [4]. > * in any other dangling allocation case nova has the final thrut so > nova > has the authority to delete them. So this would build on the first idea above about blocking the service delete if there are in-progress migrations involving the node (either incoming or outgoing) right? So if we get to the point of deleting the provider we know (1) there are no in-progress migrations and (2) there are no instances on the host (outside of evacuated instances which we can cleanup automatically per [4]). Given that, I'm not sure there is really anything else to do here. > * Document possible ways to reconcile Placement with Nova using > heal_allocations and eventually the audit command once it's merged. Done (merged yesterday) [5]. 
[1] https://github.com/openstack/nova/blob/20.0.0/nova/objects/migration.py#L240 [2] https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/services.py#L254 [3] https://review.opendev.org/#/c/560674/ [4] https://review.opendev.org/#/c/678100/2/nova/scheduler/client/report.py at 2165 [5] https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html -- Thanks, Matt From openstack at nemebean.com Thu Nov 14 15:41:29 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 14 Nov 2019 09:41:29 -0600 Subject: [oslo] Adoption of microversion-parse In-Reply-To: References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> <30c14806-6c43-a2be-9612-0a54de6c4323@openstack.org> <16d88443-8d02-4c90-b3af-b0b143fb6348@www.fastmail.com> Message-ID: On 11/13/19 6:29 PM, Chris Dent wrote: > On Wed, 13 Nov 2019, Clark Boylan wrote: > >> On Wed, Nov 13, 2019, at 3:56 PM, Ben Nemec wrote: >>> >>> >>> On 10/21/19 9:14 AM, Thierry Carrez wrote: >>>> Thierry Carrez wrote: >>>>> [...] >>>>> I'll propose the project addition so you can all vote directly on >>>>> it :) >>>> >>>> https://review.opendev.org/#/c/689754/ >>>> >>> >>> This has merged, but I still don't have access to the core group for the >>> library. Is this the point where we need to get infra involved or are >>> there other steps needed to make this official first? >>> >>> >> >> Ideally the existing cores would simply add you as the method of >> checks and balances here. Any current member can manage the member >> list as well as a Gerrit admin. Once you've been added by the existing >> core group you'll be able to add any others (like oslo-core). > > I've added oslo-core. I've been somewhat out of touch, so forgot > about this step. Great, thanks! > > (Note, it appears that oslo-core is way out of date...) We've never really removed cores from Oslo. Maybe we should, but I've never run into a compelling reason to. From openstack at nemebean.com Thu Nov 14 15:43:40 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 14 Nov 2019 09:43:40 -0600 Subject: [oslo] Adoption of microversion-parse In-Reply-To: References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> Message-ID: On 10/21/19 9:08 AM, Eric Fried wrote: >> Makes sense. We probably want to have an independent core team for it in >> addition to oslo-core so we can add people like Chris to it. > > I volunteer to help maintain it, if you'll have me. Works for me. Any objections from the existing core team? From sean.mcginnis at gmx.com Thu Nov 14 15:45:31 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 14 Nov 2019 09:45:31 -0600 Subject: [all] requirements-check failures Message-ID: <20191114154531.GA10859@sm-workstation> Hey everyone, You may have noticed some odd failures with the requirements-check job on your patches lately. The requirements team is aware of this issue and are working to get it resolved ASAP. I believe things should be good again once https://review.opendev.org/694248 lands. So for the time being, please hold off on doing rechecks on these patches. This job should only be running for patches that touch any of the requirements files. As a workaround for now, if your patch can make a change without modifying requirements, that should bypass the need to run this job. 
Another alternative would be to add: Depends-on: https://review.opendev.org/694248 to your commit message, but hopefully that will not be necessary once this patch makes it through and the test is fixed. Sorry for the inconvenience this has caused. Sean From cdent+os at anticdent.org Thu Nov 14 15:48:31 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 14 Nov 2019 15:48:31 +0000 (GMT) Subject: [oslo] Adoption of microversion-parse In-Reply-To: References: <7d239dce-77c5-0e23-e1cb-57785e241b07@openstack.org> <72c50e25-0a26-b3d8-5f3a-64c48570824d@nemebean.com> Message-ID: On Thu, 14 Nov 2019, Ben Nemec wrote: > On 10/21/19 9:08 AM, Eric Fried wrote: >>> Makes sense. We probably want to have an independent core team for it in >>> addition to oslo-core so we can add people like Chris to it. >> >> I volunteer to help maintain it, if you'll have me. > > Works for me. Any objections from the existing core team? Works for me too. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From dangtrinhnt at gmail.com Thu Nov 14 15:56:17 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Fri, 15 Nov 2019 00:56:17 +0900 Subject: [all] requirements-check failures In-Reply-To: <20191114154531.GA10859@sm-workstation> References: <20191114154531.GA10859@sm-workstation> Message-ID: Thank Sean for the notification. On Fri, Nov 15, 2019 at 12:49 AM Sean McGinnis wrote: > Hey everyone, > > You may have noticed some odd failures with the requirements-check job on > your > patches lately. The requirements team is aware of this issue and are > working to > get it resolved ASAP. I believe things should be good again once > https://review.opendev.org/694248 lands. > > So for the time being, please hold off on doing rechecks on these patches. > > This job should only be running for patches that touch any of the > requirements > files. As a workaround for now, if your patch can make a change without > modifying requirements, that should bypass the need to run this job. > > Another alternative would be to add: > > Depends-on: https://review.opendev.org/694248 > > to your commit message, but hopefully that will not be necessary once this > patch makes it through and the test is fixed. > > Sorry for the inconvenience this has caused. > > Sean > > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Nov 14 15:58:59 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 09:58:59 -0600 Subject: [stable][glance] Proposal to add Abhishek Kekane to glance-stable-maint In-Reply-To: References: <86e16a59-8d38-6951-e779-26d97f4d2eaa@gmail.com> <23ee160b-ce08-4418-cdc3-756659452ea2@gmail.com> Message-ID: <205717d9-7e51-8157-a944-58a9d4c4a64d@gmail.com> On 11/14/2019 8:46 AM, Abhishek Kekane wrote: > I have been working in glance since Icehouse and aware about stable > branch guidelines. Given the opportunity, I will try my best to justify > my selection. Sure, I added you to glance-stable-maint yesterday. Enjoy. 
-- Thanks, Matt From balazs.gibizer at est.tech Thu Nov 14 16:06:03 2019 From: balazs.gibizer at est.tech (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Thu, 14 Nov 2019 16:06:03 +0000 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> References: <1573373353.31166.0@est.tech> <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> Message-ID: <1573747559.19107.0@est.tech> On Thu, Nov 14, 2019 at 09:35, Matt Riedemann wrote: > On 11/10/2019 2:09 AM, Balázs Gibizer wrote: >> * Check ongoing migration and reject the delete if migration with >> this >> compute having the source node exists. Let operator confirm the >> migrations > > To be clear, the suggestion here is call [1] from the API like around > [2]? That's a behavior change but so was blocking the delete when the > compute was hosting instances [3] and we added a release note for > that. Anyway, that's a pretty simple change and not really something > I thought about in earlier threads on this problem. Regarding > evacuate migration records that should also work since the final > states for an evacuate migration are done, failed or error for which > [1] accounts. Yeah, [1] called at [2] sounds good to me. Regarding evacuation records. If the evacuation succeeded, i.e. the migration is in 'done' state then we are OK. But if it is finished with 'error' or 'failed' state then we still have an instance on the host so we should not allow deleting the compute service. As far as I see get_count_by_hosts will cover this case. > >> * Cascade delete providers and allocations in placement. >> * in case of evacuated instances this is the right thing to do > > OK this seems to confirm my TODO here [4]. > >> * in any other dangling allocation case nova has the final thrut >> so >> nova >> has the authority to delete them. > > So this would build on the first idea above about blocking the > service delete if there are in-progress migrations involving the node > (either incoming or outgoing) right? So if we get to the point of > deleting the provider we know (1) there are no in-progress migrations > and (2) there are no instances on the host (outside of evacuated > instances which we can cleanup automatically per [4]). Given that, > I'm not sure there is really anything else to do here. In theory cannot be any other allocation on the compute RP tree if there is no instance on the host, no ongoing migrations involving the host. But still I guess we need to cascade the delete to make sure that orphaned allocations (which is a bug itself but we no that it happens) are cleaned up when the service is deleted. cheers, gibi > >> * Document possible ways to reconcile Placement with Nova using >> heal_allocations and eventually the audit command once it's >> merged. > > Done (merged yesterday) [5]. 
> > [1] > https://github.com/openstack/nova/blob/20.0.0/nova/objects/migration.py#L240 > [2] > https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/services.py#L254 > [3] https://review.opendev.org/#/c/560674/ > [4] > https://review.opendev.org/#/c/678100/2/nova/scheduler/client/report.py at 2165 > [5] > https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html > > -- > > Thanks, > > Matt > From fsbiz at yahoo.com Thu Nov 14 16:09:01 2019 From: fsbiz at yahoo.com (fsbiz at yahoo.com) Date: Thu, 14 Nov 2019 16:09:01 +0000 (UTC) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> Message-ID: <1952364384.238482.1573747741880@mail.yahoo.com> Hi Chris, Thanks for the response. >Your request, is asking for CUSTOM_RRR430 will a value of 2, but it >is only available as 1. Have a look at your server create request, >there's something, probably your flavor, which is unexpected. The requests coming in are "forced host" requests.  The PaaS layer maintains an inventory of actual bare-metal available nodes and a user has to explicitly selecta baremetal node.  The PaaS layer then makes a nova api call for an instance to be createdon that specific baremetal node.    >Placement and nova scheduler are working correctly with the data they >have, the problem is with how inventory is being reported or requested. >This could either be with how your ironic nodes are being reported, >or with flavors.As far as I can recall, we've started seeing this particular error only recently after we added another 200 nodes to our flat infrastructure.   Thanks,Fred. On Thursday, November 14, 2019, 01:18:40 AM PST, Chris Dent wrote: On Thu, 14 Nov 2019, fsbiz at yahoo.com wrote: > Ultimately, nova-conductor is reported "NoValidHost: No valid host was found. There are not enough hosts available"This has been traced to nova-placement-api "Allocation for CUSTOM_RRR430 on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1" > Any pointers on what next steps I should be looking at ? Your request, is asking for CUSTOM_RRR430 will a value of 2, but it is only available as 1. Have a look at your server create request, there's something, probably your flavor, which is unexpected. Placement and nova scheduler are working correctly with the data they have, the problem is with how inventory is being reported or requested. This could either be with how your ironic nodes are being reported, or with flavors. > 2019-11-12 10:26:02.461 4161129 WARNING nova.objects.resource_provider [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] Allocation for CUSTOM_Z370_A on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1 This is the same issue, but with a different class of inventory -- Chris Dent                      ٩◔̯◔۶          https://anticdent.org/ freenode: cdent -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hberaud at redhat.com Thu Nov 14 16:19:45 2019 From: hberaud at redhat.com (Herve Beraud) Date: Thu, 14 Nov 2019 17:19:45 +0100 Subject: [oslo] Adding Michael Johnson as Taskflow core In-Reply-To: References: <0558ec80-5894-7961-f2ac-3de502f90fe4@nemebean.com> Message-ID: Welcome Michael! Le jeu. 14 nov. 2019 à 02:24, Michael Johnson a écrit : > Thank you Ben, happy to help! > > Michael > > On Wed, Nov 13, 2019 at 8:18 AM Ben Nemec wrote: > > > > Hi, > > > > After discussion with the Oslo team, we (and he) have agreed to add > > Michael as a Taskflow core. He's done more work on the project than > > anyone else still active in Oslo and also works on a project that > > consumes it so he likely understands it better than anyone else at this > > point. > > > > Welcome Michael and thanks for your contributions! > > > > -Ben > > > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From nate.johnston at redhat.com Thu Nov 14 16:28:07 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Thu, 14 Nov 2019 11:28:07 -0500 Subject: [neutron] Review priorities In-Reply-To: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> References: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> Message-ID: <20191114162807.m7e3itidrkofw5xa@firewall> On Thu, Nov 14, 2019 at 10:38:49AM +0100, Slawek Kaplonski wrote: > Hi neutrinos, > > According to our discussion during Train retrospective in Shanghai, I added > "review-priority" label for neutron projects. > It can be set by every core team member to values like: > > -1 - Branch Freeze > +1 - Important Change > +2 - Gate Blocker Fix / Urgent Change > > You can use dashboard like [1] to track such high priority patches and review > them. > I will also add some note about this to our docs this week to make it clear and > visible for everyone. > > [1] https://tinyurl.com/vezk6n6 Thanks, Slawek! I'll add this to my daily routine. Nate > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > From mriedemos at gmail.com Thu Nov 14 17:53:54 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 11:53:54 -0600 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <1573747559.19107.0@est.tech> References: <1573373353.31166.0@est.tech> <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> <1573747559.19107.0@est.tech> Message-ID: <49c66574-82c9-9343-47a2-81dde219380a@gmail.com> On 11/14/2019 10:06 AM, Balázs Gibizer wrote: > If the evacuation succeeded, i.e. the migration is in 'done' > state then we are OK. 
But if it is finished with 'error' or 'failed' > state then we still have an instance on the host so we should not allow > deleting the compute service. As far as I see get_count_by_hosts will > cover this case. If the evacuation succeeded then we need to detect it and cleanup the allocation from the evacuated-from-host because get_count_by_hosts won't catch that case (that's bug 1829479). That's the TODO in my patch. If the evacuation failed, I agree with you that get_count_by_hosts should detect and block the deletion of the service since the instance is still hosted there in the DB. -- Thanks, Matt From mriedemos at gmail.com Thu Nov 14 18:01:08 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 12:01:08 -0600 Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <1952364384.238482.1573747741880@mail.yahoo.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> <1952364384.238482.1573747741880@mail.yahoo.com> Message-ID: <7d53de2f-46de-edcf-63dc-fe7ba8b61f83@gmail.com> On 11/14/2019 10:09 AM, fsbiz at yahoo.com wrote: > The requests coming in are "forced host" requests.  The PaaS layer > maintains > an inventory of actual bare-metal available nodes and a user has to > explicitly select > a baremetal node.  The PaaS layer then makes a nova api call for an > instance to be created > on that specific baremetal node. To be clear, by forced host you mean creating the server with an availability zone in the format ZONE:HOST:NODE or ZONE:NODE where NODE is the ironic node UUID, correct? https://docs.openstack.org/nova/latest/admin/availability-zones.html#using-availability-zones-to-select-hosts Yeah that's a problem because then the scheduler filters aren't run. A potential alternative is to create the server using a hypervisor_hostname query hint that will run through the JsonFilter: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#jsonfilter Then at least you're not forcing the node and run the scheduler filters. I forget exactly how the scheduler code works in Queens with respect to forced hosts/nodes on server create but the scheduler still has to allocate resources in placement. It looks like we work around that in Queens by disabling the limit we place on getting allocation candidates from placement: https://review.opendev.org/#/c/584616/ My guess is your PaaS layer has bugs in it since it's allowing users to select hosts that are already consumed, or it's just racy. Anyway, this is why nova uses placement since Pike for atomic consumption of resources during scheduling. -- Thanks, Matt From miguel at mlavalle.com Thu Nov 14 18:23:03 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 14 Nov 2019 12:23:03 -0600 Subject: [neutron] Review priorities In-Reply-To: <20191114162807.m7e3itidrkofw5xa@firewall> References: <20191114093849.nqpjoomcfq6s2dfw@skaplons-mac> <20191114162807.m7e3itidrkofw5xa@firewall> Message-ID: Yeah, looking good. I just bookmarked it so it becomes part of my daily routine Thanks On Thu, Nov 14, 2019 at 10:28 AM Nate Johnston wrote: > On Thu, Nov 14, 2019 at 10:38:49AM +0100, Slawek Kaplonski wrote: > > Hi neutrinos, > > > > According to our discussion during Train retrospective in Shanghai, I > added > > "review-priority" label for neutron projects. 
> > It can be set by every core team member to values like: > > > > -1 - Branch Freeze > > +1 - Important Change > > +2 - Gate Blocker Fix / Urgent Change > > > > You can use dashboard like [1] to track such high priority patches and > review > > them. > > I will also add some note about this to our docs this week to make it > clear and > > visible for everyone. > > > > [1] https://tinyurl.com/vezk6n6 > > Thanks, Slawek! I'll add this to my daily routine. > > Nate > > > -- > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Thu Nov 14 18:26:54 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 14 Nov 2019 12:26:54 -0600 Subject: [ptg][neutron] Ussuri PTG summary In-Reply-To: References: <20191112135311.b33aq6ngyajdtvjq@skaplons-mac> <2ce84a47ac59bdd160a71b37eaf05f0eca9e1f85.camel@redhat.com> Message-ID: Hi Belmiro, The Neutron team is fully cognizant that we have operators large and small using Linuxbridge. No decision will be made without involving you Regards On Thu, Nov 14, 2019 at 3:59 AM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi, > Akihiro, thanks for you summary. > > We use the linuxbridge driver because its simplicity and the match with > the old nova-network schema (yes, are we still migrating). > > The functionality gap between ovs driver and linuxbridge is a good think > in my view. > It allows operators to chose the best solution considering their > deployment use case and scale. > > Slawek, Miguel please keep us in the discussions. > > Belmiro > CERN > > > On Wed, Nov 13, 2019 at 7:22 PM Sean Mooney wrote: > >> On Tue, 2019-11-12 at 14:53 +0100, Slawek Kaplonski wrote: >> > Stateless security groups >> > ========================= >> > >> > Old RFE [21] was approved for neutron-fwaas project but we all agreed >> that this >> > should be now implemented for security groups in core Neutron. >> > People from Nuage are interested in work on this in upstream. >> > We should probably also explore how easy/hard it will be to implement >> it in >> > networking-ovn backend. >> >> for what its worth we implemented this 4 years ago and it was breifly >> used in production trial deployment >> in a telco deployment but i dont think it ever went to full production as >> they went wtih sriov instead >> https://review.opendev.org/#/c/264131/ as part of this RFE >> https://bugs.launchpad.net/neutron/+bug/1531205 which was >> closed as wont fix >> https://bugs.launchpad.net/neutron/+bug/1531205/comments/14 >> as it was view that this was not the correct long term direction for the >> community. >> this is the summit presentation for austin for anyone that does not >> rememebr this effort >> >> >> https://www.openstack.org/videos/summits/austin-2016/tired-of-iptables-based-security-groups-heres-how-to-gain-tremendous-speed-with-open-vswitch-instead >> >> im not sure how the new proposal differeres form our previous proposal >> for the same >> feautre but the main pushback we got was that the securtiy group api is >> assumed to be stateful >> and that is why this was rejected. form our mesurments at the time we >> expected the stateless approch >> to scale better then contrack driver so it woudl be nice to see a >> stateless approch avialable. 
>> i never got around to deleteing our implemenation form >> networking-ovs-dpdk >> >> https://opendev.org/x/networking-ovs-dpdk/src/branch/master/networking_ovs_dpdk/agent/ovs_dpdk_firewall.py >> but i has not been tested our updated really for the last 2 years but it >> could be used as a basis of this effort >> if nuage does not have a poc already. >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu Nov 14 18:52:52 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 14 Nov 2019 12:52:52 -0600 Subject: [oslo] Shanghai Wrapup Message-ID: <22e64fec-f998-c2b0-aa66-f94a070727d2@nemebean.com> I wrote up a bunch of thoughts about Oslo stuff in Shanghai: http://blog.nemebean.com/content/oslo-shanghai Hopefully I covered everything (and accurately) but if I messed anything up I blame jet lag. :-P -Ben From openstack at fried.cc Thu Nov 14 19:16:32 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 14 Nov 2019 13:16:32 -0600 Subject: [nova][oot] Adding `context` to ComputeDriver.unrescue Message-ID: <6cf09602-9e08-27c8-6ea4-d5a7c9f07aa4@fried.cc> Though still very much WIP, emulated TPM [1] is looking like it will need to add the RequestContext to the ``unrescue`` ComputeDriver method [2]. This is a heads up to out-of-tree virt driver maintainers to keep an eye on this patch, as you will need to update your overrides accordingly once it merges. Thanks, efried [1] https://review.opendev.org/#/c/631363/ [2] https://review.opendev.org/#/c/631363/31/nova/virt/driver.py From whayutin at redhat.com Thu Nov 14 19:16:47 2019 From: whayutin at redhat.com (Wesley Hayutin) Date: Thu, 14 Nov 2019 12:16:47 -0700 Subject: [tripleo] Adding Alex Schultz as OVB core In-Reply-To: <7562aee5-1ea2-2d8f-ebb5-9fa02d9dc354@nemebean.com> References: <7562aee5-1ea2-2d8f-ebb5-9fa02d9dc354@nemebean.com> Message-ID: On Wed, Nov 13, 2019 at 9:23 AM Ben Nemec wrote: > Hi, > > After a discussion with Wes in Shanghai about how to make me less of a > SPOF for OVB, one of the outcomes was that we should try to grow the OVB > core team. Alex has been reviewing a lot of the patches to OVB lately > and obviously has a good handle on how all of this stuff fits together, > so I've added him to the OVB core team. > > Thanks and congratulations(?) Alex! :-) > > -Ben > > +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Thu Nov 14 19:20:07 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Thu, 14 Nov 2019 11:20:07 -0800 Subject: [nova][ironic] nova docs bug for ironic looking for an owner In-Reply-To: <453e2ccb-ef4f-0e5b-aa15-cacf0ca104e8@gmail.com> References: <453e2ccb-ef4f-0e5b-aa15-cacf0ca104e8@gmail.com> Message-ID: Hey Matt, I've gone ahead and added this to the ironic team's meeting agenda for next week. Thanks for bringing this up! -Julia On Wed, Nov 13, 2019 at 7:33 AM Matt Riedemann wrote: > > While discussing some tribal knowledge about how ironic is the black > sheep of nova compute drivers I realized that we (nova) have no docs > about the ironic driver like we do for other drivers, so we don't > mention anything about the weird cardinality rules around compute > service : node : instance and host vs nodename things, how to configure > the service for HA mode, how to configure baremetal flavors with custom > resource classes, how to partition for conductor groups, how to deal > with scaling issues, missing features (migrate), etc. 
I've opened a bug > in case someone wants to get started on some of that information: > > https://bugs.launchpad.net/nova/+bug/1852446 > > -- > > Thanks, > > Matt > From mriedemos at gmail.com Thu Nov 14 19:31:03 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 13:31:03 -0600 Subject: [nova][ptg] Resource provider delete at service delete In-Reply-To: <1573747559.19107.0@est.tech> References: <1573373353.31166.0@est.tech> <628bf2aa-56bc-47fc-81f7-37445a7c1f86@gmail.com> <1573747559.19107.0@est.tech> Message-ID: <5bf5e113-770d-5bb7-af62-c34e45c4f981@gmail.com> On 11/14/2019 10:06 AM, Balázs Gibizer wrote: > Yeah, [1] called at [2] sounds good to me. Done with functional recreate test patches underneath: https://review.opendev.org/#/c/694389/ -- Thanks, Matt From Albert.Braden at synopsys.com Thu Nov 14 21:44:11 2019 From: Albert.Braden at synopsys.com (Albert Braden) Date: Thu, 14 Nov 2019 21:44:11 +0000 Subject: Order filters by cost Message-ID: I'm working on a list of filters ordered by cost. This is what I have so far. Does this look reasonably correct for Rocky? Cheap Filters: AllHostsFilter - does no filtering. It passes all the available hosts. AvailabilityZoneFilter - filters hosts by availability zone. AggregateInstanceExtraSpecsFilter - checks aggregate metadata set with aggregate_instance_extra_specs. All hosts are passed if no extra_specs are specified. AggregateCoreFilter - filters hosts by CPU core number with per-aggregate cpu_allocation_ratio setting. AggregateRamFilter - filters hosts by RAM with per-aggregate ram_allocation_ratio setting. AggregateDiskFilter - filters hosts by disk allocation with per-aggregate disk_allocation_ratio setting. AggregateNumInstancesFilter - filters hosts by number of instances with per-aggregate max_instances_per_host setting. AggregateIoOpsFilter - filters hosts by I/O operations with per-aggregate max_io_ops_per_host setting. AggregateMultiTenancyIsolation - isolate tenants in specific aggregates. AggregateTypeAffinityFilter - limits instance_type by aggregate. AggregateImagePropertiesIsolation - isolates hosts based on image properties and aggregate metadata. DifferentHostFilter - allows the instance on a different host from a set of instances. SameHostFilter - puts the instance on the same host as another instance in a set of instances. ComputeFilter - passes all hosts that are operational and enabled. NumInstancesFilter - filters compute nodes by number of running instances. IoOpsFilter - filters hosts by concurrent I/O operations. More Expensive Filters: ServerGroupAntiAffinityFilter - This filter implements anti-affinity for a server group. ServerGroupAffinityFilter - This filter works the same way as ServerGroupAntiAffinityFilter. The difference is that when you create the server group, you should specify a policy of 'affinity'. ImagePropertiesFilter - filters hosts based on properties defined on the instance's image. Doc on setting image properties is here: https://docs.openstack.org/glance/rocky/admin/useful-image-properties.html IsolatedHostsFilter - filter based on isolated_images, isolated_hosts and restrict_isolated_hosts_to_isolated_images flags. SimpleCIDRAffinityFilter - allows a new instance on a host within the same IP block. MetricsFilter - filters hosts based on metrics weight_setting. Most Expensive Filters: PciPassthroughFilter - Filter that schedules instances on a host if the host has devices to meet the device requests in the 'extra_specs' for the flavor. 
ComputeCapabilitiesFilter - checks that the capabilities provided by the host compute service satisfy any extra specifications associated with the instance type.
NUMATopologyFilter - filters hosts based on the NUMA topology requested by the instance, if any.
JsonFilter - allows simple JSON-based grammar for selecting hosts.

Deprecated Filters:

Please don't use any of these; they are obsolete in Rocky:

RetryFilter - filters hosts that have already been attempted for scheduling. Obsolete since Queens.
RamFilter - filters hosts by their RAM. Obsolete since Pike.
CoreFilter - filters based on CPU core utilization. Obsolete since Pike.
DiskFilter - filters hosts by their disk allocation. Obsolete since Pike.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sean.mcginnis at gmx.com Thu Nov 14 22:10:51 2019
From: sean.mcginnis at gmx.com (Sean McGinnis)
Date: Thu, 14 Nov 2019 16:10:51 -0600
Subject: [all] requirements-check failures
In-Reply-To: <20191114154531.GA10859@sm-workstation>
References: <20191114154531.GA10859@sm-workstation>
Message-ID: <20191114221051.GA30152@sm-workstation>

On Thu, Nov 14, 2019 at 09:45:35AM -0600, Sean McGinnis wrote:
> Hey everyone,
> 
> You may have noticed some odd failures with the requirements-check job on your
> patches lately. The requirements team is aware of this issue and are working to
> get it resolved ASAP. I believe things should be good again once
> https://review.opendev.org/694248 lands.
> 

This requirements job fix has landed, and I've seen at least one rechecked
patch successfully pass already. Things should be all clear.

From smooney at redhat.com Thu Nov 14 22:44:46 2019
From: smooney at redhat.com (Sean Mooney)
Date: Thu, 14 Nov 2019 22:44:46 +0000
Subject: Order filters by cost
In-Reply-To: 
References: 
Message-ID: <96314f2dfb4198447b7cf7833ae08a0cbb2fa33c.camel@redhat.com>

On Thu, 2019-11-14 at 21:44 +0000, Albert Braden wrote:
> I'm working on a list of filters ordered by cost. This is what I have so far. Does this look reasonably correct for
> Rocky?
Some comments inline, but more or less yes. Also, the best thing you can do is disable any filters you don't need.
> 
> Cheap Filters:
> 
> AllHostsFilter - does no filtering. It passes all the available hosts.
> AvailabilityZoneFilter - filters hosts by availability zone.
> AggregateInstanceExtraSpecsFilter - checks aggregate metadata set with aggregate_instance_extra_specs. All hosts are
> passed if no extra_specs are specified.
The AggregateInstanceExtraSpecsFilter is actually expensive in some cases, as it has to get the aggregate metadata
for each aggregate the host is a member of and then compare the flavor extra specs to the metadata specified in
those aggregates, so this gets expensive as the number of aggregates a host is a member of grows.
The worst-case scaling for this is NxM, where N is the number of hosts and M is the maximum number of aggregates
a host is a part of; in other words it scales with quadratic complexity. Fortunately that is the worst case, and it is
generally linear. Thinking about it a little more, the aggregate filters below also have the same upper bound, but in
general most hosts are in one aggregate, so they remain linear.
> AggregateCoreFilter - filters hosts by CPU core number with per-aggregate cpu_allocation_ratio setting.
> AggregateRamFilter - filters hosts by RAM with per-aggregate ram_allocation_ratio setting.
> AggregateDiskFilter - filters hosts by disk allocation with per-aggregate disk_allocation_ratio setting.
Strictly speaking, in Rocky the three filters above are not deprecated yet, but they are in Train.
The reason they weren't deprecated sooner is that we forgot them when we deprecated the non-aggregate versions,
so you should ideally avoid using those.
> AggregateNumInstancesFilter - filters hosts by number of instances with per-aggregate max_instances_per_host setting.
This is pretty cheap; I hope to eventually replace this with placement.
> AggregateIoOpsFilter - filters hosts by I/O operations with per-aggregate max_io_ops_per_host setting.
> AggregateMultiTenancyIsolation - isolate tenants in specific aggregates.
> AggregateTypeAffinityFilter - limits instance_type by aggregate.
> AggregateImagePropertiesIsolation - isolates hosts based on image properties and aggregate metadata.
This is more or less the same as the AggregateInstanceExtraSpecsFilter in terms of cost.
> DifferentHostFilter - allows the instance on a different host from a set of instances.
> SameHostFilter - puts the instance on the same host as another instance in a set of instances.
> ComputeFilter - passes all hosts that are operational and enabled.
> NumInstancesFilter - filters compute nodes by number of running instances.
> IoOpsFilter - filters hosts by concurrent I/O operations.
> 
> More Expensive Filters:
> 
> ServerGroupAntiAffinityFilter - This filter implements anti-affinity for a server group.
> ServerGroupAffinityFilter - This filter works the same way as ServerGroupAntiAffinityFilter. The difference is that
> when you create the server group, you should specify a policy of 'affinity'.
> ImagePropertiesFilter - filters hosts based on properties defined on the instance's image. Doc on setting image
> properties is here: https://docs.openstack.org/glance/rocky/admin/useful-image-properties.html
> IsolatedHostsFilter - filter based on isolated_images, isolated_hosts and restrict_isolated_hosts_to_isolated_images
> flags.
> SimpleCIDRAffinityFilter - allows a new instance on a host within the same IP block.
> MetricsFilter - filters hosts based on metrics weight_setting.
> 
> Most Expensive Filters:
> 
My gut feeling is that both the AggregateInstanceExtraSpecsFilter and AggregateImagePropertiesIsolation would belong
here for non-simple cases, but if you keep the aggregate layout simple they stay in the group above, because they do a
reasonable number of comparisons but may become non-linear.
> PciPassthroughFilter - Filter that schedules instances on a host if the host has devices to meet the device requests
> in the 'extra_specs' for the flavor.
> ComputeCapabilitiesFilter - checks that the capabilities provided by the host compute service satisfy any extra
> specifications associated with the instance type.
> NUMATopologyFilter - filters hosts based on the NUMA topology requested by the instance, if any.
> JsonFilter - allows simple JSON-based grammar for selecting hosts.
> 
> Deprecated Filters:
> 
> Please don't use any of these; they are obsolete in Rocky:
> 
> RetryFilter - filters hosts that have already been attempted for scheduling. Obsolete since Queens.
> RamFilter - filters hosts by their RAM. Obsolete since Pike.
> CoreFilter - filters based on CPU core utilization. Obsolete since Pike.
> DiskFilter - filters hosts by their disk allocation. Obsolete since Pike.
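As a side note for anyone following this thread: the scheduler runs filters in the order they appear in the enabled_filters list, and each filter only sees the hosts that survived the previous one, so listing the cheap filters first is the main lever available. A rough Rocky-era nova.conf sketch follows; the filter list itself is only an illustration and should be trimmed to what the deployment actually uses:

[filter_scheduler]
# Cheap host/aggregate checks first, expensive per-host inspection last.
enabled_filters = AvailabilityZoneFilter,ComputeFilter,AggregateInstanceExtraSpecsFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,ComputeCapabilitiesFilter,NUMATopologyFilter,PciPassthroughFilter

This assumes the Rocky option name [filter_scheduler]/enabled_filters; check the configuration reference for the release actually deployed before copying it.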
From mriedemos at gmail.com Fri Nov 15 00:45:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 14 Nov 2019 18:45:20 -0600 Subject: [nova] Thoughts on exposing exception type to non-admins in instance action event In-Reply-To: References: <9a412f1c-1aed-58b6-9a05-b74910b39ea8@gmail.com> <9abb3dc9-4b01-90f6-c020-80d4b50d6356@fried.cc> Message-ID: <62fd76e1-a3ae-7b62-ebba-824f667b3095@gmail.com> On 11/14/2019 7:58 AM, Matt Riedemann wrote: > If I get the time this week I'll WIP something together that does what > I'm thinking as a proof of concept Here is a simple PoC: https://review.opendev.org/#/q/topic:bp/action-event-fault-details The API change with a new microversion (sans API samples) is actually smaller than the object code change to store the fault message. Anyway, this gives an idea and it was pretty simple to write up. -- Thanks, Matt From missile0407 at gmail.com Fri Nov 15 03:26:47 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Fri, 15 Nov 2019 11:26:47 +0800 Subject: [kolla] Shutdown ordering of MariaDB containers? Message-ID: Hi everyone, I want to ask about the order of shutdown MariaDB (or mean controller) node. For previous steps we found is usually shutdown slaves first, then master [1]. But we found that the MariaDB still get container restarting issue even I followed the step after booting up the cluster. Below is that I did when shutdown/boot up controller. 1. Shutdown the slaves first, then master 2. Boot master first, then slaves. For looking which one is master, we usually looking for the haproxy log and find which mariadb node that the last session access the DB. Or looking for which mariadb container has "--wsrep-new-cluster" in BOOTSTRAP_ARGS. Does anyone has experience about this? Many thanks, Eddie. [1] https://bugs.launchpad.net/kolla-ansible/+bug/1712087 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangbailin at inspur.com Fri Nov 15 06:58:03 2019 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Fri, 15 Nov 2019 06:58:03 +0000 Subject: [nova][ptg] Flavor explosion In-Reply-To: <1573720630.26082.5@est.tech> References: <8a1e435702fb7dfe572bd59d2d652320@sslemail.net> <1573720630.26082.5@est.tech> Message-ID: Hi all, The patch link is https://review.opendev.org/#/c/663563 Rename the bp name from "add-flavor-metadata-or-metadata-group" to "resources-metadata-of-instance", because it not only can compose the extra specs from the *flavor* (current status), and it can be compose the vcpu, ram and disk, I think call this is resource metadata is ok, if you have some suggestion please leave a comment. About the model design, there will be add two DB table in the nova api DB: a) Add "resources_metadata" to record the composable bits, as following fields: - id(int),create_at,updated_at,deleted_at, name, rules, description and deleted fields Saved format like this: { "cpu_pinning": { "hw:cpu_policy": "shared", "hw:cpu_thread_policy": "require" } } If there is one spec that you need, you can set it in the rules as {"key": value}, it means like this: { "mem_huge_page": { "hw:mem_page_size": "1GB" } } b) Add "resources_metadata_mapping" to record the composable bits used by which instance, as following fields: - created_at, updated_at,deleted_at, id(int), resources_md_id, instance_uuid and deleted fields. 
With b, we have another alternative way, it was wrote in the "Alternatives" in the SPEC, it means add a column to the ``instance_medata`` table, but, this way we should separate the rule in the "resources_metadata" to one by one to save. This way will change the existing data table structure, I am not sure if this will affect some of the features of the instance. (more details you can review this SPEC) We can get all the metadata used by an instance (instance_uuid) through the "resources_metadata_mapping" table easily. > Items: Re: [nova][ptg] Flavor explosion > > > > On Sun, Nov 10, 2019 at 16:09, Brin Zhang(张百林) > wrote: > > Hi all, > > Based on the discussion on the Train PTG, and reference to the > > records on the etherpad and ML, I was updated that SPEC, and I think > > there are some details need to be discussed, and I have listed some > > details, if there are any other things that I have not considered, or > > if some place that I thoughtless, please post a discussion. > > > > List some details as follows, and you can review that spec in > > https://review.opendev.org/#/c/663563. > > > > Listed details: > > - Don't change the model of the flavor in nova code and in the db. > > > > - No change for operators who choose not to request the flavor extra > > specs group. > > > > - Requested more than one flavor extra specs groups, if there are > > different values for the same spec will be raised a 409. > > > > - Flavor in request body of server create that has the same spec in > > the request ``flavor_extra_specs_group``, it will be raised a 409. > > > > - When resize an instance, you need to compare the > > ``flavor_extra_specs_group`` with the spec request spec, otherwise > > raise a 400. > > > > Thanks Brin for updating the spec, I did a review round on it and left comments. > > gibi > From hberaud at redhat.com Fri Nov 15 08:16:28 2019 From: hberaud at redhat.com (Herve Beraud) Date: Fri, 15 Nov 2019 09:16:28 +0100 Subject: [oslo] Shanghai Wrapup In-Reply-To: <22e64fec-f998-c2b0-aa66-f94a070727d2@nemebean.com> References: <22e64fec-f998-c2b0-aa66-f94a070727d2@nemebean.com> Message-ID: Thanks Ben Le jeu. 14 nov. 2019 à 19:55, Ben Nemec a écrit : > I wrote up a bunch of thoughts about Oslo stuff in Shanghai: > http://blog.nemebean.com/content/oslo-shanghai > > Hopefully I covered everything (and accurately) but if I messed anything > up I blame jet lag. :-P > > -Ben > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From doka.ua at gmx.com Fri Nov 15 08:54:48 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Fri, 15 Nov 2019 10:54:48 +0200 Subject: [Neutron] OVS forwarding issues In-Reply-To: References: Message-ID: <39de8fd7-a57f-5b70-4f7a-2934bbe6b7cc@gmx.com> Hi colleagues, thanks for the pointing on this. Can anybody _assume_ whether this bug affects also ML2/OVN implementation of networking? I was looking into OVN sometimes ago, but due to lack of resources skipped this research, now I think it makes sense to return back to this question. Thank you. On 11.11.2019 19:38, James Denton wrote: > > Hi, > > This is a known issue with the openvswitch firewall[1]. > > > firewall_driver = openvswitch > > I recommend running iptables_hybrid until that is resolved. > > [1] https://bugs.launchpad.net/neutron/+bug/1732067 > > > James Denton > > Network Engineer > > Rackspace Private Cloud > > james.denton at rackspace.com > > *From: *Volodymyr Litovka > *Date: *Monday, November 11, 2019 at 12:10 PM > *To: *"openstack-discuss at lists.openstack.org" > > *Cc: *"doka.ua at gmx.com" > *Subject: *[Neutron] OVS forwarding issues > > *CAUTION:*This message originated externally, please use caution when > clicking on links or opening attachments! > > Dear colleagues, > > just faced an issue with Openvswitch, which looks strange for me. The > problem is that any particular VM receives a lot of packets, which are > unicasted: > - from other VMs which reside on the same host (let's name them "local > VMs") > - to other VMs which reside on other hosts (let's name them "remote VMs") > > Long output from "ovs-ofctl dump-flows br-int" which, as far as I can > narrow, ends there: > > # ovs-ofctl dump-flows br-int |grep " table=94," |egrep > "n_packets=[123456789]" >  cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, > n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, > priority=1 actions=NORMAL > > coming to normal processing (classic MAC learning). Looking into > br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are > really no MAC addresses of remote VMs and br-int behaves in the right > way, flooding unknown unicast to all ports in this L2 segment. > > Of course, there is br-tun which connected over vxlan to all other > hosts and to br-int: > >     Bridge br-tun >         Controller "tcp:127.0.0.1:6633" >             is_connected: true >         fail_mode: secure >         Port "vxlan-0a960008" >             Interface "vxlan-0a960008" >                 type: vxlan >                 options: {df_default="true", in_key=flow, > local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} >         [ ... ] >         Port br-tun >             Interface br-tun >                 type: internal >         Port patch-int >             Interface patch-int >                 type: patch >                 options: {peer=patch-tun} > > but MAC table on br-tun is empty as well: > > # ovs-appctl fdb/show br-tun >  port  VLAN  MAC                Age > # > > Finally, packets get to destination, while being copied to all ports > on source host, which is serious security issue. > > I do not think so conceived by design, I rather think we missed > something in configuration. Can anybody point me where we're wrong and > help with this issue? > > We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. 
Network > configuration is: > > @controller: > # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [ml2] > type_drivers = flat,vxlan > tenant_network_types = vxlan > mechanism_drivers = l2population,openvswitch > extension_drivers = port_security,qos,dns_domain_ports > [ml2_type_flat] > flat_networks = provider > [ml2_type_geneve] > [ml2_type_gre] > [ml2_type_vlan] > [ml2_type_vxlan] > vni_ranges = 400:400000 > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > > @agent: > # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" > [DEFAULT] > verbose = true > [agent] > tunnel_types = vxlan > l2_population = true > arp_responder = true > extensions = qos > [ovs] > local_ip = 10.150.0.5 > bridge_mappings = provider:br-ex > [securitygroup] > firewall_driver = openvswitch > enable_security_group = true > enable_ipset = true > [xenapi] > > Thank you. > > > -- > Volodymyr Litovka >   "Vision without Execution is Hallucination." -- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Fri Nov 15 08:56:05 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 15 Nov 2019 09:56:05 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium Message-ID: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Hi neutrinos, As we discussed during the Shanghai PTG I just proposed changes to move neutron-interconnection project out of stadium to "x/" namespace. So it will not be official neutron project anymore after those changes. Patches for that are in [1] and [2] I also proposed to remove neutron-interconnection api-ref from neutron-lib. Patch is here [3]. Please review it :) [1] https://review.opendev.org/#/c/694478/ [2] https://review.opendev.org/#/c/694480/ [3] https://review.opendev.org/#/c/694466/ -- Slawek Kaplonski Senior software engineer Red Hat From merlin.blom at bertelsmann.de Fri Nov 15 09:13:20 2019 From: merlin.blom at bertelsmann.de (Blom, Merlin, NMU-OI) Date: Fri, 15 Nov 2019 09:13:20 +0000 Subject: [RabbitMQ][cinder] Listen to messages Message-ID: Hey there, it seems to me as if ask.openstack.org is down, so I ask my question here: I'd like to listen to oslo messages from cinder as I do for neutron and octavia to know what is going on. 
For me the following code worked for neutron:

import logging
import os

from kombu import BrokerConnection, Exchange, Queue
from kombu.mixins import ConsumerMixin

log = logging.getLogger(__name__)

EXCHANGE_NAME = os.getenv('EXCHANGE_NAME', 'neutron')
ROUTING_KEY = os.getenv('ROUTING_KEY', 'notifications.info')
QUEUE_NAME = os.getenv('QUEUE_NAME', 'messaging_queue')
BROKER_URI = os.getenv('BROKER_URI', 'UNDEFINED')
BROKER_PASSWORD = os.getenv('BROKER_PASSWORD', '')


class Messages(ConsumerMixin):
    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, consumer, channel):
        # Bind a throwaway queue to the service's notification exchange.
        exchange = Exchange(EXCHANGE_NAME, type="topic", durable=False)
        queue = Queue(QUEUE_NAME, exchange, routing_key=ROUTING_KEY,
                      durable=False, auto_delete=True, no_ack=True)
        return [consumer(queues=[queue], callbacks=[self.on_message])]

    def on_message(self, body, message):
        try:
            print(message)
        except Exception as e:
            log.info(repr(e))


if __name__ == "__main__":
    log.info("Connecting to broker {}".format(BROKER_URI))
    with BrokerConnection(hostname=BROKER_URI, userid='messaging',
                          password=BROKER_PASSWORD,
                          virtual_host='/' + EXCHANGE_NAME,
                          heartbeat=4,
                          failover_strategy='round-robin') as connection:
        Messages(connection).run()
    # The "with" block closes the connection on exit.

But on the cinder vhost (/cinder) I can’t find an exchange that the code is working on. (cinder, cinder-backup, …)

I tried using the rabbitmq tracer: https://www.rabbitmq.com/firehose.html
And got all the cinder messages but I don’t want to use it in production because of performance issues.

Does anyone have an idea how to find the correct exchange for the notification info queue in cinder?

Cheers,
Merlin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5195 bytes
Desc: not available
URL: 

From hberaud at redhat.com Fri Nov 15 10:13:42 2019
From: hberaud at redhat.com (Herve Beraud)
Date: Fri, 15 Nov 2019 11:13:42 +0100
Subject: [RabbitMQ][cinder] Listen to messages
In-Reply-To: 
References: 
Message-ID: 

Le ven. 15 nov. 2019 à 10:17, Blom, Merlin, NMU-OI <
merlin.blom at bertelsmann.de> a écrit :

> Hey there,
>
> it seems to me as if ask.openstack.org is down, so I ask my question here:
>
>
>
> I’d like to listen to oslo messages from cinder as I do for neutron and
> octavia to know what is going on.
> > For me the following code worked for neutron: > > > > EXCHANGE_NAME = os.getenv('EXCHANGE_NAME', 'neutron') > > ROUTING_KEY = os.getenv('ROUTING_KEY', 'notifications.info') > > QUEUE_NAME = os.getenv('QUEUE_NAME', 'messaging_queue') > > BROKER_URI = os.getenv('BROKER_URI', 'UNDEFINED') > > BROKER_PASSWORD = os.getenv('BROKER_PASSWORD', '') > > > > class Messages(ConsumerMixin): > > def __init__(self, connection): > > self.connection = connection > > return > > > > def get_consumers(self, consumer, channel): > > exchange = Exchange(EXCHANGE_NAME, type="topic", durable=False) > > queue = Queue(QUEUE_NAME, exchange, routing_key=ROUTING_KEY, > durable=False, auto_delete=True, no_ack=True) > > return [consumer(queues=[queue], callbacks=[self.on_message])] > > > > def on_message(self, body, message): > > try: > > print(message) > > except Exception as e: > > log.info(repr(e)) > > > > if __name__ == "__main__": > > log.info("Connecting to broker {}".format(BROKER_URI)) > > with BrokerConnection(hostname=BROKER_URI, userid='messaging', > password=BROKER_PASSWORD, > > virtual_host='/'+EXCHANGE_NAME, > > heartbeat=4, failover_strategy='round-robin') as > connection: > > Messaging(connection).run() > > BrokerConnection.connection.close() > > > > But on the cinder vhost (/cinder) > Are you sure cinder use a dedicated vhost? I'm notconviced, if I'm right they all use the default vhost '/'. I can’t find an exchange that the code is working on. (cinder, > cinder-backup, …) > > I tried using the rabbitmq tracer: https://www.rabbitmq.com/firehose.html > > And got all the cinder messages but I don’t want to use it in production > because of performance issues. > > > > Does anyone have an idea how to find the correct exchange for the > notification info queue in cinder? > > > > Cheers, > > Merlin > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Fri Nov 15 12:07:43 2019 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 15 Nov 2019 12:07:43 +0000 Subject: [kolla] Shutdown ordering of MariaDB containers? In-Reply-To: References: Message-ID: On Fri, 15 Nov 2019 at 03:28, Eddie Yen wrote: > > Hi everyone, > I want to ask about the order of shutdown MariaDB (or mean controller) node. > > For previous steps we found is usually shutdown slaves first, then master [1]. > But we found that the MariaDB still get container restarting issue even I followed the step after booting up the cluster. > > Below is that I did when shutdown/boot up controller. > 1. Shutdown the slaves first, then master > 2. Boot master first, then slaves. 
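A couple of hedged pointers that may help anyone hitting the same question: with oslo.messaging, a service only emits notifications at all if its configuration has a notification driver set (for example [oslo_messaging_notifications]/driver = messagingv2 in cinder.conf), and the notifications are published to the exchange named by the transport's control_exchange option, on the topic(s) from [oslo_messaging_notifications]/topics (default "notifications"). If cinder is left at the oslo.messaging default control_exchange, those messages may land on an exchange called "openstack" on the default vhost "/" rather than on anything named "cinder", which would explain not finding a cinder exchange. A minimal adjustment to the script above, assuming that default (the queue name here is made up):

    # Assumption: cinder notifications on the default control_exchange and vhost.
    # The durable flag must match whatever the services already declared.
    exchange = Exchange('openstack', type='topic', durable=False)
    queue = Queue('cinder_notification_listener', exchange,
                  routing_key='notifications.info',
                  durable=False, auto_delete=True, no_ack=True)

Worth verifying against the transport_url and the [oslo_messaging_notifications] section in the deployed cinder.conf before relying on this.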
> > For looking which one is master, we usually looking for the haproxy log and find which mariadb node that the last session access the DB. > Or looking for which mariadb container has "--wsrep-new-cluster" in BOOTSTRAP_ARGS. > > Does anyone has experience about this? Hi Eddie, You can use the kolla-ansible mariadb_recovery command to bootstrap a cluster where all nodes have gone down. Mark > > Many thanks, > Eddie. > > [1] https://bugs.launchpad.net/kolla-ansible/+bug/1712087 From aj at suse.com Fri Nov 15 12:28:17 2019 From: aj at suse.com (Andreas Jaeger) Date: Fri, 15 Nov 2019 13:28:17 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: <20191115085605.zj35uembs2gaql4v@skaplons-mac> References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Message-ID: On 15/11/2019 09.56, Slawek Kaplonski wrote: > Hi neutrinos, > > As we discussed during the Shanghai PTG I just proposed changes to move > neutron-interconnection project out of stadium to "x/" namespace. > So it will not be official neutron project anymore after those changes. > Patches for that are in [1] and [2] > > I also proposed to remove neutron-interconnection api-ref from neutron-lib. > Patch is here [3]. Please review it :) > > [1] https://review.opendev.org/#/c/694478/ > [2] https://review.opendev.org/#/c/694480/ > [3] https://review.opendev.org/#/c/694466/ > Looking at https://review.opendev.org/#/q/project:openstack/neutron-interconnection I suggest to retire only with those few changes to the repo - and nothing in the last few months. Or is there anybody committing to continue the work? We can also retire now - and create again in the "x/" namespace if interest suddenly arises. But let's not move repos that are de-facto dead around, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From missile0407 at gmail.com Fri Nov 15 13:19:40 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Fri, 15 Nov 2019 21:19:40 +0800 Subject: [kolla] Shutdown ordering of MariaDB containers? In-Reply-To: References: Message-ID: Yes, we're doing this when all MariaDB containers are down. But we still curious about this problem. Ordering to shutdown then boot up the MariaDB cluster still can caused this issue. My initial guess is that the docker is startup earlier than network, caused they can't connect each other. But it still a guess because the connection usually back if restart container manually. At least this is what we solve if the service can't connect to DB or AMQP but both them are fine. Perhaps I may try this. For now, it seems like using mariadb_recovery is the only way to let MariaDB back online if reboot the whole cluster right? Mark Goddard 於 2019年11月15日 週五 下午8:07寫道: > On Fri, 15 Nov 2019 at 03:28, Eddie Yen wrote: > > > > Hi everyone, > > I want to ask about the order of shutdown MariaDB (or mean controller) > node. > > > > For previous steps we found is usually shutdown slaves first, then > master [1]. > > But we found that the MariaDB still get container restarting issue even > I followed the step after booting up the cluster. > > > > Below is that I did when shutdown/boot up controller. > > 1. Shutdown the slaves first, then master > > 2. Boot master first, then slaves. 
> > > > For looking which one is master, we usually looking for the haproxy log > and find which mariadb node that the last session access the DB. > > Or looking for which mariadb container has "--wsrep-new-cluster" in > BOOTSTRAP_ARGS. > > > > Does anyone has experience about this? > > Hi Eddie, > You can use the kolla-ansible mariadb_recovery command to bootstrap a > cluster where all nodes have gone down. > Mark > > > > Many thanks, > > Eddie. > > > > [1] https://bugs.launchpad.net/kolla-ansible/+bug/1712087 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.denton at rackspace.com Fri Nov 15 14:03:42 2019 From: james.denton at rackspace.com (James Denton) Date: Fri, 15 Nov 2019 14:03:42 +0000 Subject: [Neutron] OVS forwarding issues In-Reply-To: <39de8fd7-a57f-5b70-4f7a-2934bbe6b7cc@gmx.com> References: <39de8fd7-a57f-5b70-4f7a-2934bbe6b7cc@gmx.com> Message-ID: <48F9BC4F-02B7-4E7A-AF06-EAED4053B63A@rackspace.com> I seem to recall checking this when the issue was first discovered, and OVN did not appear to implement the same flow rules that resulted in the issue. I don’t have a live environment to test with, though. James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com From: Volodymyr Litovka Date: Friday, November 15, 2019 at 3:55 AM To: James Denton , "openstack-discuss at lists.openstack.org" , Slawek Kaplonski Cc: "doka.ua at gmx.com" Subject: Re: [Neutron] OVS forwarding issues CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Hi colleagues, thanks for the pointing on this. Can anybody _assume_ whether this bug affects also ML2/OVN implementation of networking? I was looking into OVN sometimes ago, but due to lack of resources skipped this research, now I think it makes sense to return back to this question. Thank you. On 11.11.2019 19:38, James Denton wrote: Hi, This is a known issue with the openvswitch firewall[1]. > firewall_driver = openvswitch I recommend running iptables_hybrid until that is resolved. [1] https://bugs.launchpad.net/neutron/+bug/1732067 James Denton Network Engineer Rackspace Private Cloud james.denton at rackspace.com From: Volodymyr Litovka Date: Monday, November 11, 2019 at 12:10 PM To: "openstack-discuss at lists.openstack.org" Cc: "doka.ua at gmx.com" Subject: [Neutron] OVS forwarding issues CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! Dear colleagues, just faced an issue with Openvswitch, which looks strange for me. The problem is that any particular VM receives a lot of packets, which are unicasted: - from other VMs which reside on the same host (let's name them "local VMs") - to other VMs which reside on other hosts (let's name them "remote VMs") Long output from "ovs-ofctl dump-flows br-int" which, as far as I can narrow, ends there: # ovs-ofctl dump-flows br-int |grep " table=94," |egrep "n_packets=[123456789]" cookie=0xaf6b1435fe826bdf, duration=2952350.695s, table=94, n_packets=291494723, n_bytes=40582103074, idle_age=0, hard_age=65534, priority=1 actions=NORMAL coming to normal processing (classic MAC learning). Looking into br-int MAC-table (ovs-appctl fdb/show br-int) shows, that there are really no MAC addresses of remote VMs and br-int behaves in the right way, flooding unknown unicast to all ports in this L2 segment. 
Of course, there is br-tun which connected over vxlan to all other hosts and to br-int: Bridge br-tun Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "vxlan-0a960008" Interface "vxlan-0a960008" type: vxlan options: {df_default="true", in_key=flow, local_ip="10.150.0.5", out_key=flow, remote_ip="10.150.0.8"} [ ... ] Port br-tun Interface br-tun type: internal Port patch-int Interface patch-int type: patch options: {peer=patch-tun} but MAC table on br-tun is empty as well: # ovs-appctl fdb/show br-tun port VLAN MAC Age # Finally, packets get to destination, while being copied to all ports on source host, which is serious security issue. I do not think so conceived by design, I rather think we missed something in configuration. Can anybody point me where we're wrong and help with this issue? We're using Openstack Rocky and OVS 2.10.0 under Ubuntu 16.04. Network configuration is: @controller: # cat /etc/neutron/plugins/ml2/ml2_conf.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [ml2] type_drivers = flat,vxlan tenant_network_types = vxlan mechanism_drivers = l2population,openvswitch extension_drivers = port_security,qos,dns_domain_ports [ml2_type_flat] flat_networks = provider [ml2_type_geneve] [ml2_type_gre] [ml2_type_vlan] [ml2_type_vxlan] vni_ranges = 400:400000 [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true @agent: # cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |egrep -v "^$|^#" [DEFAULT] verbose = true [agent] tunnel_types = vxlan l2_population = true arp_responder = true extensions = qos [ovs] local_ip = 10.150.0.5 bridge_mappings = provider:br-ex [securitygroup] firewall_driver = openvswitch enable_security_group = true enable_ipset = true [xenapi] Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From deepa.kr at fingent.com Thu Nov 14 05:53:13 2019 From: deepa.kr at fingent.com (Deepa) Date: Thu, 14 Nov 2019 11:23:13 +0530 Subject: Freezer Project Update Message-ID: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> Hello Team Good Day I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see Freezer Project. But couldn't find any charms for it in juju charms. Also there isn't a clear documentation on how to install freezer . https://docs.openstack.org/releasenotes/freezer/train.html. No proper release notes in the latest version as well. Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. Can you also share a proper documentation on how to install Freezer in cluster setup. Thanks for your help. Regards, Deepa K R -------------- next part -------------- An HTML attachment was scrubbed... URL: From rabia.shaheen at xflowresearch.com Fri Nov 15 04:51:18 2019 From: rabia.shaheen at xflowresearch.com (rabia.shaheen at xflowresearch.com) Date: Fri, 15 Nov 2019 09:51:18 +0500 Subject: Trove Image issue Message-ID: <008d01d59b70$542067f0$fc6137d0$@xflowresearch.com> Hi Team, I have deployed kolla-ansible(Stein) with Trove enable and using Trove prebuild images to build the Database VM but VM is constantly stuck in build state. 
I am not sure how to use prebuild image key which is in https://opendev.org/openstack/trove/src/branch/master/integration/scripts/fi les/keys folder. Can you please guide me regarding the key usage for prebuild images (http://tarballs.openstack.org/trove/images/) To build my own trove image on kolla-ansible, is there any specific guide available for it? Warm Regards, Rabia Shaheen Lead Engineer, xFlow Research Inc. +923075462720 (GMT+5) rabia.shaheen at xflowresearch.com www.xflowresearch.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Fri Nov 15 15:29:03 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 15 Nov 2019 09:29:03 -0600 Subject: [all] Nominations for the "V" release name Message-ID: <20191115152903.GA29931@sm-workstation> Hey everyone, There is ongoing discussion about changing our release naming process, but for the time being we are going to stick with what we have been doing. That means it's time to start thinking about the "V" release name! The next developer event will take place in Vancouver, BC. The geographic location for this release will be things starting with "V" in the British Columbia province. The nomination period is now open. Please add suitable names to https://wiki.openstack.org/wiki/Release_Naming/V_Proposals. We will accept nominations until December 6, 2019 23:59:59 UTC. A recap of our current naming rules: * Each release name must start with the letter of the ISO basic Latin alphabet following the initial letter of the previous release, starting with the initial release of "Austin". After "Z", the next name should start with "A" again. * The name must be composed only of the 26 characters of the ISO basic Latin alphabet. Names which can be transliterated into this character set are also acceptable. * The name must refer to the physical or human geography of the region encompassing the location of the OpenStack design summit for the corresponding release. The exact boundaries of the geographic region under consideration must be declared before the opening of nominations, as part of the initiation of the selection process. * The name must be a single word with a maximum of 10 characters. Words that describe the feature should not be included, so "Foo City" or "Foo Peak" would both be eligible as "Foo". Names which do not meet these criteria but otherwise sound really cool should be added to a separate section of the wiki page and the TC may make an exception for one or more of them to be considered in the Condorcet poll. The naming official is responsible for presenting the list of exceptional names for consideration to the TC before the poll opens. Additional information about the release naming process can be found here: https://governance.openstack.org/tc/reference/release-naming.html Looking forward to having a name for our next release! Sean From fungi at yuggoth.org Fri Nov 15 15:55:57 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 15 Nov 2019 15:55:57 +0000 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191115152903.GA29931@sm-workstation> References: <20191115152903.GA29931@sm-workstation> Message-ID: <20191115155557.kt6yazzkzkr3mztl@yuggoth.org> On 2019-11-15 09:29:03 -0600 (-0600), Sean McGinnis wrote: [...] > The next developer event will take place in Vancouver, BC. [...] > The name must refer to the physical or human geography of the > region encompassing the location of the OpenStack design summit > for the corresponding release. 
[...] It's worth noting we haven't had an OpenStack Design Summit for years now (not since the PTG/Forum split), and the last few have been Open Infrastructure Summits. But the upcoming event in Vancouver isn't one of those either (event naming yet to be determined), so presumably the event name in this rule is being interpreted loosely. (With love, your friendly neighborhood pedant.) -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From skaplons at redhat.com Fri Nov 15 15:57:05 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 15 Nov 2019 16:57:05 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Message-ID: <20191115155705.rdyesxnpgvgbm5fw@skaplons-mac> Hi, Yes, after some discussions on IRC I think that it will be better to simple retire project as it has no any activity since it was created. Finally I have 5 patches to retire this project. See [1] for them. [1] https://review.opendev.org/#/q/topic:neutron-interconnection-retire+(status:open+OR+status:merged) On Fri, Nov 15, 2019 at 01:28:17PM +0100, Andreas Jaeger wrote: > On 15/11/2019 09.56, Slawek Kaplonski wrote: > > Hi neutrinos, > > > > As we discussed during the Shanghai PTG I just proposed changes to move > > neutron-interconnection project out of stadium to "x/" namespace. > > So it will not be official neutron project anymore after those changes. > > Patches for that are in [1] and [2] > > > > I also proposed to remove neutron-interconnection api-ref from neutron-lib. > > Patch is here [3]. Please review it :) > > > > [1] https://review.opendev.org/#/c/694478/ > > [2] https://review.opendev.org/#/c/694480/ > > [3] https://review.opendev.org/#/c/694466/ > > > > Looking at > https://review.opendev.org/#/q/project:openstack/neutron-interconnection > > I suggest to retire only with those few changes to the repo - and > nothing in the last few months. > > Or is there anybody committing to continue the work? > > We can also retire now - and create again in the "x/" namespace if > interest suddenly arises. But let's not move repos that are de-facto > dead around, > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg > (HRB 36809, AG Nürnberg) GF: Felix Imendörffer > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 > -- Slawek Kaplonski Senior software engineer Red Hat From sean.mcginnis at gmx.com Fri Nov 15 16:34:20 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 15 Nov 2019 10:34:20 -0600 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191115155557.kt6yazzkzkr3mztl@yuggoth.org> References: <20191115152903.GA29931@sm-workstation> <20191115155557.kt6yazzkzkr3mztl@yuggoth.org> Message-ID: <20191115163420.GA1678@sm-workstation> > [...] > > The next developer event will take place in Vancouver, BC. > [...] > > The name must refer to the physical or human geography of the > > region encompassing the location of the OpenStack design summit > > for the corresponding release. > [...] > > It's worth noting we haven't had an OpenStack Design Summit for > years now (not since the PTG/Forum split), and the last few have > been Open Infrastructure Summits. 
But the upcoming event in > Vancouver isn't one of those either (event naming yet to be > determined), so presumably the event name in this rule is being > interpreted loosely. > > (With love, your friendly neighborhood pedant.) Oops! I had actually updated that on the wiki page, but then forgot to do so in the announcement. :) From openstack at nemebean.com Fri Nov 15 17:00:39 2019 From: openstack at nemebean.com (Ben Nemec) Date: Fri, 15 Nov 2019 11:00:39 -0600 Subject: [oslo] Virtual PTG Planning In-Reply-To: References: Message-ID: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Okay, so far just three of us have responded to the poll. Since this was sort of short notice for next week and so far everyone seems to be available on any of the days, I'm going to propose that we do this on Nov. 25. As an added bonus that means it can double as a virtual birthday party for me. :-) If that ends up not working for anyone we can revisit this, but otherwise let's plan on doing it then. Thanks. -Ben On 11/13/19 12:08 PM, Ben Nemec wrote: > Hi Osloers, > > Given that a lot of the team was not in Shanghai and we had a few topics > proposed that didn't make sense to discuss as a result, I would like to > try doing a virtual PTG the way a number of the other teams are. I've > added a section to the PTG etherpad[0] with some proposed details, but > in general I'm thinking we meet on Jitsi (it's open source) around the > time of the Oslo meeting. It's possible we might be able to get through > everything in the regularly scheduled hour, but if possible I'd like to > keep the following hour (1600-1700 UTC) open as well. If everyone's > available we could do it next week (the 18th) or possibly the following > week (the 25th), although that runs into Thanksgiving week in the US so > people might be out. I've created a Doodle poll[1] with selections for > the next three weeks so please respond there if you can make it any of > those days. If none of them work well we can discuss alternative options. > > Thanks. > > -Ben > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > 1: https://doodle.com/poll/8bqiv865ucyt8499 > From mnaser at vexxhost.com Fri Nov 15 17:45:37 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 15 Nov 2019 12:45:37 -0500 Subject: [sig] Forming a Large scale SIG In-Reply-To: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> References: <49b91728-1109-a379-a089-ac9bb5f7f046@openstack.org> Message-ID: On Wed, Nov 13, 2019 at 6:22 AM Thierry Carrez wrote: > > Hi everyone, > > In Shanghai we held a forum session to gauge interest in a new SIG to > specifically address cluster scaling issues. In the past we had several > groups ("Large deployments", "Performance", LCOO...) but those efforts > were arguably a bit too wide and those groups are now abandoned. > > My main goal here is to get large users directly involved in a domain > where their expertise can best translate into improvements in the > software. It's easy for such a group to go nowhere while trying to boil > the ocean. To maximize its chances of success and make it sustainable, > the group should have a narrow focus, and reasonable objectives. > > My personal idea for the group focus was to specifically address scaling > issues within a single cluster: basically identify and address issues > that prevent scaling a single cluster (or cell) past a number of nodes. > By sharing analysis and experience, the group could identify common pain > points that, once solved, would help raising that number. 
> > There was a lot of interest in that session[1], and it predictably > exploded in lots of different directions, including some that are > definitely past a single cluster (like making Neutron better support > cells). I think it's fine: my initial proposal was more of a strawman. > Active members of the group should really define what they collectively > want to work on. And the SIG name should be picked to match that. > > I'd like to help getting that group off the ground and to a place where > it can fly by itself, without needing external coordination. The first > step would be to identify interested members and discuss group scope and > objectives. Given the nature of the group (with interested members in > Japan, Europe, Australia and the US) it will be hard to come up with a > synchronous meeting time that will work for everyone, so let's try to > hold that discussion over email. > > So to kick this off: if you are interested in that group, please reply > to this email, introduce yourself and tell us what you would like the > group scope and objectives to be, and what you can contribute to the group.
Count me in, I'll be watching from the sidelines and chiming in when I see things happen and come up.
> Thanks! > > [1] https://etherpad.openstack.org/p/PVG-large-scale-SIG > > -- > Thierry Carrez (ttx) >
-- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. https://vexxhost.com
From timothy.gresham at intel.com Fri Nov 15 19:12:22 2019 From: timothy.gresham at intel.com (Gresham, Timothy) Date: Fri, 15 Nov 2019 19:12:22 +0000 Subject: Intel 3rd Party CI - Offline until Monday due to upgrades. Message-ID: <5A3D1F5D71F58E4A9E7DAA38716F9FB9B92119E1@FMSMSX112.amr.corp.intel.com>
Infrastructure upgrades are occurring in the lab which hosts Intel's OpenStack 3rd party CI. These upgrades will require us to take our CI offline for the weekend. Jobs covering the following areas will be offline. * Persistent memory * Tap as a Service * NFV * PCI * SRIOV Jobs covering Cinder/RSD should not be impacted. Service is expected to be restored Monday afternoon Pacific time. We will send out another email once service has been restored. Tim Gresham Cloud Engineer - Intel Corporation Intel Architecture, Graphics, and Software
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From laurentfdumont at gmail.com Fri Nov 15 19:15:06 2019 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Fri, 15 Nov 2019 14:15:06 -0500 Subject: [nova] Displayed state of VM when nova-compute is down/unresponsive. Message-ID:
Hey everyone, We had a discussion with some colleagues at work. There was some confusion over the expected behavior of OpenStack/Nova regarding the state of VMs on a compute that is down (or one where Nova is in a "bad" state and unable to update properly). Right now, it seems that the VMs will stay in the last state they were seen in. We were wondering if there was a way to expose the fact that the underlying hypervisor is down? Something like a "Warning : no data from compute since xx:xx:xx". I did not see any documentation regarding a possible configuration option, but there are a lot of posts from people with similar questions. I understand that the state of the VM shouldn't be changed based on the status of a compute - but exposing the fact that the state itself is not current might be a good middle ground.
I do see a possible issue with the fact that the hypervisor itself is not known if the User/Project is not admin. Is anyone aware of anything similar in the past? Thanks!
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From kotobi at dkrz.de Fri Nov 15 19:42:06 2019 From: kotobi at dkrz.de (Amjad Kotobi) Date: Fri, 15 Nov 2019 20:42:06 +0100 Subject: Freezer Project Update In-Reply-To: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> Message-ID: <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de>
Hi, This project is pretty much in a production state; since the last summit it has become active again on the development side, and we are using it for our backup solution too. The documentation side isn't that great yet and will be updated very soon; in any case, you are able to install it as a standalone project in an instance. I did it manually and didn't use any provisioning tools. Let me know which specific part of the deployment is not clear. Amjad
> On 14. Nov 2019, at 06:53, Deepa wrote: > > Hello Team > > Good Day > > I am Deepa from Fingent Global Solutions and we are a big fan of OpenStack and we have 4+ OpenStack setups (including production). > We have deployed OpenStack using Juju and MAAS. So when we check for backup feasibility other than cinder-backup we were able to see the > Freezer project, but we couldn't find any charms for it among the Juju charms. Also there isn't clear documentation on how to install Freezer. > https://docs.openstack.org/releasenotes/freezer/train.html . There are no proper release notes in the latest version either. > Can you please tell me whether this project is still being developed, and whether charms will be added to Juju in the future? > Can you also share proper documentation on how to install Freezer in a cluster setup? > > Thanks for your help. > > Regards, > Deepa K R
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From dms at danplanet.com Fri Nov 15 20:02:15 2019 From: dms at danplanet.com (Dan Smith) Date: Fri, 15 Nov 2019 12:02:15 -0800 Subject: [nova] Displayed state of VM when nova-compute is down/unresponsive. In-Reply-To: (Laurent Dumont's message of "Fri, 15 Nov 2019 14:15:06 -0500") References: Message-ID:
> We were wondering if there was a way to expose the fact that the > underlying hypervisor is down? Something like a "Warning : no data > from compute since xx:xx:xx" I did not see any documentation regarding > a possible configuration option somewhere but a lot of posts with > people with similar questions.
You're looking for "host_status" in the detailed server output. It gives you an indication of what the state is of the host the instance is on without revealing too much and without altering the state of the instance itself, which as you note could be wrong if the problem is merely communication. https://docs.openstack.org/api-ref/compute/?expanded=show-server-details-detail#show-server-details This is controlled by policy and only visible past microversion 2.16, so make sure both of those details are handled for whatever users you want to be able to have that level of visibility.
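For anyone finding this thread later, a minimal sketch of reading that field with python-novaclient might look like the following. This is illustrative only: it assumes a keystoneauth1 session, compute microversion 2.16 or later, and a caller that the host_status policy rule allows to see the field (admin-only by default, if memory serves); all names, URLs and values below are placeholders.

    # Illustrative sketch, not an official example: requires
    # python-novaclient and keystoneauth1.
    from keystoneauth1 import loading, session
    from novaclient import client as nova_client

    # Build a session from password auth (placeholder credentials).
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3',
        username='admin', password='secret',
        project_name='admin',
        user_domain_id='default',
        project_domain_id='default')
    sess = session.Session(auth=auth)

    # Microversion 2.16 is what adds host_status to the server details;
    # with an older microversion the field is simply not returned.
    nova = nova_client.Client('2.16', session=sess)

    server = nova.servers.get('9f0a7580-0000-0000-0000-000000000000')
    # Expected values are UP, DOWN, MAINTENANCE, UNKNOWN, or an empty
    # string when policy does not allow the caller to see it.
    print(getattr(server, 'host_status', 'field not present'))

On the command line the equivalent should be something along the lines of "openstack --os-compute-api-version 2.16 server show <uuid>" and then checking the host_status field in the output.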
--Dan
From info at dantalion.nl Sat Nov 16 09:36:32 2019 From: info at dantalion.nl (info at dantalion.nl) Date: Sat, 16 Nov 2019 10:36:32 +0100 Subject: [oslo][i18n][pbr] get_available_languages() always only returns ['en_US'] Message-ID: <862457b9-719c-59b7-85ff-694370946d62@dantalion.nl>
Hello, Across several projects I have noticed that, both in unit tests and while the service is running, calling oslo_i18n.get_available_languages() only returns ['en_US'] (which is inserted as a default), even though the projects I have tested this on have several languages available in the locale directory. Calling any of the python setup.py extract_messages / compile_catalog / update_catalog / install commands does not solve this. However, when I manually copy the .mo files into /usr/share/locale/**/LC_MESSAGES it works as expected. My questions are: surely there must be a less manual method to properly install all the locale files, but what is it? And why aren't the locale files installed by pbr when python setup.py install is called? I hope someone knows the answers to these questions. Kind regards, Corne Lukken (Dantali0n)
From thomas.morin at orange.com Sat Nov 16 15:38:39 2019 From: thomas.morin at orange.com (thomas.morin at orange.com) Date: Sat, 16 Nov 2019 16:38:39 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> Message-ID: <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup>
Hi stackers & neutrinos, I understand the need to adapt the project status to the lack of activity on the project in the past year. During this time, my time as a reviewer has been taken by other activities (still OpenStack related!), which prevented the submitted code from being merged. Having been at the origin of the project, I have to apologize for the lack of communication about where we were with this project. My apologies for that. We would still like to have a place to let the proposal exist and the code be reviewed and tested. Hosting under "x/" would work for us. Hope that this can work like this... Thanks! -Thomas
Andreas Jaeger : > On 15/11/2019 09.56, Slawek Kaplonski wrote: >> Hi neutrinos, >> >> As we discussed during the Shanghai PTG I just proposed changes to move >> neutron-interconnection project out of stadium to "x/" namespace. >> So it will not be official neutron project anymore after those changes. >> Patches for that are in [1] and [2] >> >> I also proposed to remove neutron-interconnection api-ref from neutron-lib. >> Patch is here [3]. Please review it :) >> >> [1] https://review.opendev.org/#/c/694478/ >> [2] https://review.opendev.org/#/c/694480/ >> [3] https://review.opendev.org/#/c/694466/ >> > Looking at > https://review.opendev.org/#/q/project:openstack/neutron-interconnection > > I suggest to retire only with those few changes to the repo - and > nothing in the last few months. > > Or is there anybody committing to continue the work? > > We can also retire now - and create again in the "x/" namespace if > interest suddenly arises. But let's not move repos that are de-facto > dead around, > > Andreas _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation.
Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. From aj at suse.com Sat Nov 16 16:52:46 2019 From: aj at suse.com (Andreas Jaeger) Date: Sat, 16 Nov 2019 17:52:46 +0100 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup> References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup> Message-ID: On 16/11/2019 16.38, thomas.morin at orange.com wrote: > Hi stackers & neutrinos, > > I understand the need to adapt the project status to the lack of > activity on the project in the past year. > During this time, my time as a reviewer, preventing the code submitted > to be merged, has been taken by other activites (still OpenStack related!). > > Having been at the origin of the project, I have to apologize for the > lack of communication of where we were about this project. > My apologies for that. > > We still would like to have a place to let the proposal exit, code be > reviewed and tested. > Hosting under "x/" would work for us. Sure, no problem. Please read first https://governance.openstack.org/tc/resolutions/20190711-mandatory-repository-retirement.html - that's the process that applies here. So, we retire the repo completely (with the existing changes) - and you can anytime push up a new change to create the repo in the "x" namespace import the content (minus the retirement change) into it... I'll quickly review such an import if it shows up, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg (HRB 36809, AG Nürnberg) GF: Felix Imendörffer GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From fungi at yuggoth.org Sat Nov 16 19:34:18 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sat, 16 Nov 2019 19:34:18 +0000 Subject: [neutron] Removing neutron-interconnection out of stadium In-Reply-To: References: <20191115085605.zj35uembs2gaql4v@skaplons-mac> <14033_1573918723_5DD01803_14033_397_1_2a8a0e9f-0ad9-45f0-8519-7eefab4d9a30@OPEXCAUBM41.corporate.adroot.infra.ftgroup> Message-ID: <20191116193418.g7wthwefudfc6rv7@yuggoth.org> On 2019-11-16 17:52:46 +0100 (+0100), Andreas Jaeger wrote: [...] > So, we retire the repo completely (with the existing changes) - and you > can anytime push up a new change to create the repo in the "x" namespace > import the content (minus the retirement change) into it... [...] Even easier is probably to import it all (just set the upstream import to the opendev.org clone URL for the original project), and then as part of the first change to the new project revert the retirement commit. That doesn't require pushing a temporary copy of the repository anywhere for import. 
-- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From hberaud at redhat.com Mon Nov 18 09:07:00 2019 From: hberaud at redhat.com (Herve Beraud) Date: Mon, 18 Nov 2019 10:07:00 +0100 Subject: [oslo] Virtual PTG Planning In-Reply-To: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> References: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Message-ID: +1 Wise decision. Do we need to bring some party favors? Le ven. 15 nov. 2019 à 18:04, Ben Nemec a écrit : > Okay, so far just three of us have responded to the poll. Since this was > sort of short notice for next week and so far everyone seems to be > available on any of the days, I'm going to propose that we do this on > Nov. 25. As an added bonus that means it can double as a virtual > birthday party for me. :-) > > If that ends up not working for anyone we can revisit this, but > otherwise let's plan on doing it then. > > Thanks. > > -Ben > > On 11/13/19 12:08 PM, Ben Nemec wrote: > > Hi Osloers, > > > > Given that a lot of the team was not in Shanghai and we had a few topics > > proposed that didn't make sense to discuss as a result, I would like to > > try doing a virtual PTG the way a number of the other teams are. I've > > added a section to the PTG etherpad[0] with some proposed details, but > > in general I'm thinking we meet on Jitsi (it's open source) around the > > time of the Oslo meeting. It's possible we might be able to get through > > everything in the regularly scheduled hour, but if possible I'd like to > > keep the following hour (1600-1700 UTC) open as well. If everyone's > > available we could do it next week (the 18th) or possibly the following > > week (the 25th), although that runs into Thanksgiving week in the US so > > people might be out. I've created a Doodle poll[1] with selections for > > the next three weeks so please respond there if you can make it any of > > those days. If none of them work well we can discuss alternative options. > > > > Thanks. > > > > -Ben > > > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > > 1: https://doodle.com/poll/8bqiv865ucyt8499 > > > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaronzhu1121 at gmail.com Mon Nov 18 11:46:16 2019 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Mon, 18 Nov 2019 19:46:16 +0800 Subject: [release][stable][telemetry]Please add Rong Zhu to ceilometer-stable-maint group Message-ID: Hi Stable Maintenance Core team, I am the current Telemetry PTL, could you please add me to ceilometer-stable-maint group. And please also add Lingxian Kong to this group. -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Mon Nov 18 14:17:36 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Mon, 18 Nov 2019 06:17:36 -0800 Subject: [horizon] Changing the release model to cycle-with-intermediary Message-ID: Hi, I just proposed a patch to change the horizon release model to cycle-with-intermediary. https://review.opendev.org/#/c/694772/ It was discussed during the Shanghai PTG. Horizon provides GUI to users. On the other hand, it is a library for horizon plugins. When horizon plugins would like to use recent changes or to avoid bugs, they need to consume beta releases of horizon. More frequent releases of horizon would make more sense, so I am proposing the release model change. If there are concerns, reply to this thread or drop comments in the review mentioned above. Thanks, Akihiro Motoki (irc: amotoki) From kgiusti at gmail.com Mon Nov 18 14:42:41 2019 From: kgiusti at gmail.com (Ken Giusti) Date: Mon, 18 Nov 2019 09:42:41 -0500 Subject: [oslo] Virtual PTG Planning In-Reply-To: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> References: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Message-ID: +1 (year for Ben) +1 to Nov 25th On Fri, Nov 15, 2019 at 12:14 PM Ben Nemec wrote: > Okay, so far just three of us have responded to the poll. Since this was > sort of short notice for next week and so far everyone seems to be > available on any of the days, I'm going to propose that we do this on > Nov. 25. As an added bonus that means it can double as a virtual > birthday party for me. :-) > > If that ends up not working for anyone we can revisit this, but > otherwise let's plan on doing it then. > > Thanks. > > -Ben > > On 11/13/19 12:08 PM, Ben Nemec wrote: > > Hi Osloers, > > > > Given that a lot of the team was not in Shanghai and we had a few topics > > proposed that didn't make sense to discuss as a result, I would like to > > try doing a virtual PTG the way a number of the other teams are. I've > > added a section to the PTG etherpad[0] with some proposed details, but > > in general I'm thinking we meet on Jitsi (it's open source) around the > > time of the Oslo meeting. It's possible we might be able to get through > > everything in the regularly scheduled hour, but if possible I'd like to > > keep the following hour (1600-1700 UTC) open as well. If everyone's > > available we could do it next week (the 18th) or possibly the following > > week (the 25th), although that runs into Thanksgiving week in the US so > > people might be out. I've created a Doodle poll[1] with selections for > > the next three weeks so please respond there if you can make it any of > > those days. If none of them work well we can discuss alternative options. > > > > Thanks. > > > > -Ben > > > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > > 1: https://doodle.com/poll/8bqiv865ucyt8499 > > > > -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alifshit at redhat.com Mon Nov 18 15:02:28 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Mon, 18 Nov 2019 10:02:28 -0500 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191115152903.GA29931@sm-workstation> References: <20191115152903.GA29931@sm-workstation> Message-ID: Top posting for the lulz: Clearly the name must be V for Vendetta. On Fri, Nov 15, 2019 at 10:33 AM Sean McGinnis wrote: > > Hey everyone, > > There is ongoing discussion about changing our release naming process, but for > the time being we are going to stick with what we have been doing. That means > it's time to start thinking about the "V" release name! > > The next developer event will take place in Vancouver, BC. The geographic > location for this release will be things starting with "V" in the British > Columbia province. > > The nomination period is now open. Please add suitable names to > https://wiki.openstack.org/wiki/Release_Naming/V_Proposals. We will accept > nominations until December 6, 2019 23:59:59 UTC. > > A recap of our current naming rules: > > * Each release name must start with the letter of the ISO basic Latin > alphabet following the initial letter of the previous release, starting > with the initial release of "Austin". After "Z", the next name should > start with "A" again. > > * The name must be composed only of the 26 characters of the ISO basic > Latin alphabet. Names which can be transliterated into this character > set are also acceptable. > > * The name must refer to the physical or human geography of the region > encompassing the location of the OpenStack design summit for the > corresponding release. The exact boundaries of the geographic region > under consideration must be declared before the opening of nominations, > as part of the initiation of the selection process. > > * The name must be a single word with a maximum of 10 characters. Words > that describe the feature should not be included, so "Foo City" or "Foo > Peak" would both be eligible as "Foo". > > Names which do not meet these criteria but otherwise sound really cool > should be added to a separate section of the wiki page and the TC may > make an exception for one or more of them to be considered in the > Condorcet poll. The naming official is responsible for presenting the > list of exceptional names for consideration to the TC before the poll opens. > > > Additional information about the release naming process can be found here: > > https://governance.openstack.org/tc/reference/release-naming.html > > Looking forward to having a name for our next release! > > Sean From fungi at yuggoth.org Mon Nov 18 15:11:35 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 18 Nov 2019 15:11:35 +0000 Subject: [all] Nominations for the "V" release name In-Reply-To: References: <20191115152903.GA29931@sm-workstation> Message-ID: <20191118151135.wwztoxfyp2zo3iym@yuggoth.org> On 2019-11-18 10:02:28 -0500 (-0500), Artom Lifshitz wrote: > Top posting for the lulz: > > Clearly the name must be V for Vendetta. [...] If only you could have made that joke two weeks ago, it would have been far more timely. ;) -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Mon Nov 18 15:34:55 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 18 Nov 2019 15:34:55 +0000 Subject: [all] Nominations for the "V" release name In-Reply-To: <20191118151135.wwztoxfyp2zo3iym@yuggoth.org> References: <20191115152903.GA29931@sm-workstation> <20191118151135.wwztoxfyp2zo3iym@yuggoth.org> Message-ID: <20191118153455.zfw7r3e425dei3vo@yuggoth.org> On 2019-11-18 15:11:35 +0000 (+0000), Jeremy Stanley wrote: > On 2019-11-18 10:02:28 -0500 (-0500), Artom Lifshitz wrote: > > Top posting for the lulz: > > > > Clearly the name must be V for Vendetta. > [...] > > If only you could have made that joke two weeks ago, it would have > been far more timely. ;) Sorry, because this allusion confused some people, the plot line in "V for Vendetta" centers around Guy Fawkes Night which is observed annually on November 5. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From luka.peschke at objectif-libre.com Mon Nov 18 15:41:01 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Mon, 18 Nov 2019 16:41:01 +0100 Subject: [cloudkitty] 18/11 IRC meeting recap Message-ID: <1ad2de5b6d9e6afeaaa93d4b112f4357@objectif-libre.com> Hello, This is the recap for today's IRC meeting of the cloudkitty team. The agenda can be found at [1] and the logs can be found at [2]. CloudKitty 11.0.1 ================= CloudKitty 11.0.1 has been released. It includes a fix for security issue on the GET /v1/dataframes and GET /v2/dataframes endpoints which had been introduced during the train development cycle. Various updates =============== Two patches had been discussed during the previous meeting: one for developer documentation for scope fetchers, and one allowing to group results by timestamps on GET /v2/summary. Both have been merged. API improvements ================ Two patches improving the API are currently under review: - The first one updates the way oslo.context and oslo.policy are used. External reviews on this one would be very helpful [3] - The second one improves the way various drivers are loaded in the v2 API. It is available at [4] Standalone cloudkitty dashboard =============================== Julien Pinchelimouroux (julien-pinchelim) has been working on a standalone dashboard (which will also support keystone authentication)for cloudkitty. A 0.1.0 release should happen during in Q4 2019. Some screenshots are available at [5] Cheers, -- Luka Peschke (peschk_l) [1] https://etherpad.openstack.org/p/cloudkitty-meeting-topics [2] http://eavesdrop.openstack.org/meetings/cloudkitty/2019/cloudkitty.2019-11-18-14.02.log.html [3] https://review.opendev.org/#/c/692333/ [4] https://review.opendev.org/#/c/686393/ [5] https://kutt.it/khA6yF From donny at fortnebula.com Mon Nov 18 16:05:19 2019 From: donny at fortnebula.com (Donny Davis) Date: Mon, 18 Nov 2019 11:05:19 -0500 Subject: [all] Nominations for the "V" release name In-Reply-To: References: <20191115152903.GA29931@sm-workstation> Message-ID: I know I only get a +1, but I +1 this name a thousand times. On Mon, Nov 18, 2019 at 10:08 AM Artom Lifshitz wrote: > Top posting for the lulz: > > Clearly the name must be V for Vendetta. 
> > On Fri, Nov 15, 2019 at 10:33 AM Sean McGinnis > wrote: > > > > Hey everyone, > > > > There is ongoing discussion about changing our release naming process, > but for > > the time being we are going to stick with what we have been doing. That > means > > it's time to start thinking about the "V" release name! > > > > The next developer event will take place in Vancouver, BC. The geographic > > location for this release will be things starting with "V" in the British > > Columbia province. > > > > The nomination period is now open. Please add suitable names to > > https://wiki.openstack.org/wiki/Release_Naming/V_Proposals. We will > accept > > nominations until December 6, 2019 23:59:59 UTC. > > > > A recap of our current naming rules: > > > > * Each release name must start with the letter of the ISO basic Latin > > alphabet following the initial letter of the previous release, starting > > with the initial release of "Austin". After "Z", the next name should > > start with "A" again. > > > > * The name must be composed only of the 26 characters of the ISO basic > > Latin alphabet. Names which can be transliterated into this character > > set are also acceptable. > > > > * The name must refer to the physical or human geography of the region > > encompassing the location of the OpenStack design summit for the > > corresponding release. The exact boundaries of the geographic region > > under consideration must be declared before the opening of nominations, > > as part of the initiation of the selection process. > > > > * The name must be a single word with a maximum of 10 characters. Words > > that describe the feature should not be included, so "Foo City" or "Foo > > Peak" would both be eligible as "Foo". > > > > Names which do not meet these criteria but otherwise sound really cool > > should be added to a separate section of the wiki page and the TC may > > make an exception for one or more of them to be considered in the > > Condorcet poll. The naming official is responsible for presenting the > > list of exceptional names for consideration to the TC before the poll > opens. > > > > > > Additional information about the release naming process can be found > here: > > > > https://governance.openstack.org/tc/reference/release-naming.html > > > > Looking forward to having a name for our next release! > > > > Sean > > > -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Nov 18 16:08:15 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 18 Nov 2019 17:08:15 +0100 Subject: [neutron][drivers] Drivers meeting cancel Message-ID: <20191118160815.xrb7zslwnjgitzzz@skaplons-mac> Hi, I can't attend next drivers meeting on Friday, 22.11.2019. I know also that 2 other members of drivers team will not be able to be on the meeting so as we will not have quorum on this meeting, lets cancel it. See You on the meeting next week, on 29.11.2019 where (I hope) we should have quorum even if it's just after Thanksgiving. 
-- Slawek Kaplonski Senior software engineer Red Hat From fsbiz at yahoo.com Mon Nov 18 16:14:59 2019 From: fsbiz at yahoo.com (fsbiz at yahoo.com) Date: Mon, 18 Nov 2019 16:14:59 +0000 (UTC) Subject: Scheduler sends VM to HV that lacks resources In-Reply-To: <7d53de2f-46de-edcf-63dc-fe7ba8b61f83@gmail.com> References: <656b175f-f6a3-8cb7-8b55-d3e77a6972fb@gmail.com> <78766172.92122.1573718625984@mail.yahoo.com> <1952364384.238482.1573747741880@mail.yahoo.com> <7d53de2f-46de-edcf-63dc-fe7ba8b61f83@gmail.com> Message-ID: <1515926373.1791856.1574093699349@mail.yahoo.com> Thanks Matt for the excellent suggestions in this email and the prior one.I am currently trying to eliminate them one by one and will update. Yes, by  forced host I do mean creating the server with an availability zone in the ZONE:NODE format.   Yes, I understand the scheduler filters aren't run but why should that bean issue?  For now, I am tracing all the logs from the PaaS layer all the way to Openstack nova placement API tosee if there is anything unusual. Thanks,Fred. On Thursday, November 14, 2019, 10:07:15 AM PST, Matt Riedemann wrote: On 11/14/2019 10:09 AM, fsbiz at yahoo.com wrote: > The requests coming in are "forced host" requests.  The PaaS layer > maintains > an inventory of actual bare-metal available nodes and a user has to > explicitly select > a baremetal node.  The PaaS layer then makes a nova api call for an > instance to be created > on that specific baremetal node. To be clear, by forced host you mean creating the server with an availability zone in the format ZONE:HOST:NODE or ZONE:NODE where NODE is the ironic node UUID, correct? https://docs.openstack.org/nova/latest/admin/availability-zones.html#using-availability-zones-to-select-hosts Yeah that's a problem because then the scheduler filters aren't run. A potential alternative is to create the server using a hypervisor_hostname query hint that will run through the JsonFilter: https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#jsonfilter Then at least you're not forcing the node and run the scheduler filters. I forget exactly how the scheduler code works in Queens with respect to forced hosts/nodes on server create but the scheduler still has to allocate resources in placement. It looks like we work around that in Queens by disabling the limit we place on getting allocation candidates from placement: https://review.opendev.org/#/c/584616/ My guess is your PaaS layer has bugs in it since it's allowing users to select hosts that are already consumed, or it's just racy. Anyway, this is why nova uses placement since Pike for atomic consumption of resources during scheduling. -- Thanks, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Mon Nov 18 17:37:02 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 18 Nov 2019 11:37:02 -0600 Subject: [oslo] Virtual PTG Planning In-Reply-To: References: <990d6f1f-ac09-a2cb-2512-360195340202@nemebean.com> Message-ID: I was going to suggest party hats, but I can't wear one with my headset on. :-) On 11/18/19 3:07 AM, Herve Beraud wrote: > +1 Wise decision. > Do we need to bring some party favors? > > Le ven. 15 nov. 2019 à 18:04, Ben Nemec > a écrit : > > Okay, so far just three of us have responded to the poll. Since this > was > sort of short notice for next week and so far everyone seems to be > available on any of the days, I'm going to propose that we do this on > Nov. 25. 
As an added bonus that means it can double as a virtual > birthday party for me. :-) > > If that ends up not working for anyone we can revisit this, but > otherwise let's plan on doing it then. > > Thanks. > > -Ben > > On 11/13/19 12:08 PM, Ben Nemec wrote: > > Hi Osloers, > > > > Given that a lot of the team was not in Shanghai and we had a few > topics > > proposed that didn't make sense to discuss as a result, I would > like to > > try doing a virtual PTG the way a number of the other teams are. > I've > > added a section to the PTG etherpad[0] with some proposed > details, but > > in general I'm thinking we meet on Jitsi (it's open source) > around the > > time of the Oslo meeting. It's possible we might be able to get > through > > everything in the regularly scheduled hour, but if possible I'd > like to > > keep the following hour (1600-1700 UTC) open as well. If everyone's > > available we could do it next week (the 18th) or possibly the > following > > week (the 25th), although that runs into Thanksgiving week in the > US so > > people might be out. I've created a Doodle poll[1] with > selections for > > the next three weeks so please respond there if you can make it > any of > > those days. If none of them work well we can discuss alternative > options. > > > > Thanks. > > > > -Ben > > > > 0: https://etherpad.openstack.org/p/oslo-shanghai-topics > > 1: https://doodle.com/poll/8bqiv865ucyt8499 > > > > > > -- > Hervé Beraud > Senior Software Engineer > Red Hat - Openstack Oslo > irc: hberaud > -----BEGIN PGP SIGNATURE----- > > wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ > Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ > RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP > F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G > 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g > glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw > m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ > hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 > qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y > F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 > B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O > v6rDpkeNksZ9fFSyoY2o > =ECSj > -----END PGP SIGNATURE----- > From kendall at openstack.org Mon Nov 18 18:08:40 2019 From: kendall at openstack.org (Kendall Waters) Date: Mon, 18 Nov 2019 12:08:40 -0600 Subject: Shanghai PTG Team Photos Message-ID: <4C08DD59-9EFE-4670-ACFA-D13CC8626234@openstack.org> Hi everyone, Thank you for attending the Project Teams Gathering in Shanghai! If your team took a team picture, you can find a copy of the photo file in this Dropbox folder: https://www.dropbox.com/sh/1my6wdtuc1hf58o/AACU49pjWxzFNzcZJgjLG8n1a?dl=0 If you are unable to open Dropbox, please send me an email with which team photo you are looking for and I can send you the file directly. Cheers, Kendall Kendall Waters OpenStack Marketing & Events kendall at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From kchamart at redhat.com Mon Nov 18 18:11:06 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Mon, 18 Nov 2019 19:11:06 +0100 Subject: On next minimum libvirt / QEMU versions for "V" release Message-ID: <20191118181106.GD7032@paraplu> Heya, The last time we incremented versions for libvirt and QEMU was during the "Stein" release[1]. For "Train" we didn't do any. 
Although we advertized NEXT_MIN_{LIBVIRT,QEMU} versions for "Train" release to be libvirt 4.0.0 and QEMU 2.11.0, but we actually didn't bump; we'll do that for "Ussuri". But before we bump the versions for "Ussuri", we need to pick NEXT_MIN versions for the "V" release. Based on the updated the DistroSupportMatrix page[2], it looks we can pick the next libvirt and QEMU versions for "V" release to the following: libvirt: 5.0.0 [GAed on: 15-Jan-2019] QEMU: 4.0.0 [GAed on: 24-Apr-2019] I have the initial patch here[3] for comments. Debian, Fedora, Ubuntu[4], CentOS, RHEL currently already ship the above versions (actually, higher than those). And it is reasonable to assume -- but let's confirm below -- that openSUSE, SLES, and Oracle Linux would also have the above versions available by "V" release time. Action Items for Linux Distros ------------------------------ (a) Oracle Linux: Please update your libvirt/QEMU versions for Oracle Linux 8? I couldn't find anything related to libvirt/QEMU here: https://yum.oracle.com/oracle-linux-8.html. (My educated guess is: the versions roughly match what's in CentOS/RHEL.) (b) openSUSE and SLES: Same request as above. Andreas Jaegaer said on #openstack-infra that the proposed versions for 'V' release should be fine for SLES. (And by extension open SUSE, I assume.) - - - Assuming Oracle Linux and SLES confirm, please let us know if there are any objections if we pick NEXT_MIN_* versions for the OpenStack "V" release to be libvirt: 5.0.0 and QEMU: 4.0.0. Comments / alternative proposals welcome :-) [1] https://opendev.org/openstack/nova/commit/489b5f762e -- Pick next minimum libvirt / QEMU versions for "T" release, 2018-09-25) [2] https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix [3] https://review.opendev.org/694821 -- [RFC] Pick NEXT_MIN libvirt/QEMU versions for "V" release [4] For Ubuntu, I updated the versions based on what is in the Cloud Archive repo for "Bionic" (the LTS) release: http://reqorts.qa.ubuntu.com/reports/ubuntu-server/cloud-archive/train_versions.html -- /kashyap From gouthampravi at gmail.com Mon Nov 18 18:25:13 2019 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 18 Nov 2019 10:25:13 -0800 Subject: [manila] No IRC meeting on Nov 7 and Nov 21 2019 In-Reply-To: References: Message-ID: Folks, A reminder that the manila IRC meeting on 21st November 2019 has been canceled. The next meeting is on 28th November 2019 at 15:00 UTC - however, it's a holiday in the US and many contributors may not join. Carlos Eduardo has graciously agreed to chair this meeting. If we do not have a quorum, we'll defer any decisions and communicate on the mailing list instead. Thank you, Goutham On Thu, Oct 31, 2019 at 8:57 AM Goutham Pacha Ravi wrote: > > Hello Zorillas and interested stackers, > > Due to a part of our community attending the Open Infrastructure > Summit+PTG (Nov 4-8, 2019) and KubeCon+CloudNativeCon (Nov 18-21, > 2019), I propose that we cancel the weekly IRC meetings on Nov 7th and > Nov 21st. > > If you'd like to discuss anything during these weeks, please chime in > on freenode/#openstack-manila, or post to this mailing list. > > Thanks, > Goutham From pkliczew at redhat.com Mon Nov 18 17:59:35 2019 From: pkliczew at redhat.com (Piotr Kliczewski) Date: Mon, 18 Nov 2019 18:59:35 +0100 Subject: [Openstack] FOSDEM 2020 Virtualization & IaaS Devroom CfP Message-ID: Friendly reminder that there are 2 weeks before the submission deadline. Room day update: This year Virt and IaaS room will be on the 2nd of February. 
See you all at FOSDEM! -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Mon Nov 18 19:39:16 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 18 Nov 2019 11:39:16 -0800 Subject: [keystone] post-PTG virtual meeting reminder Message-ID: <23af6fa2-d554-480c-9e5e-fdf1762ed3f2@www.fastmail.com> Hi keystoners, As a reminder, we'll be holding our post-PTG meeting tomorrow at 14:00 UTC (with the daylight savings time change that makes it 6:00 PST (RIP me) / 9:00 EST / 19:30 IST). Last week I briefly floated the idea of rescheduling it but quickly decided there was not enough notice, so we're holding it at the day and time that we've had planned for the last few weeks. Our agenda and notes: https://etherpad.openstack.org/p/keystone-shanghai-ptg We'll try jitsi.org again, I think the technical issues we were having last time were on my end and not related to the platform: https://meet.jit.si/keystone-ptg Please please please review the roadmap board and assign yourself to items you have been working on and update their status: https://tree.taiga.io/project/keystone-ussuri-roadmap/kanban . We'll be going over the board together tomorrow, and updating it ahead of time will save us time as a group. Colleen From mnaser at vexxhost.com Mon Nov 18 21:40:04 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 18 Nov 2019 16:40:04 -0500 Subject: [tc][stable] Changing stable branch policy Message-ID: Hi everyone, At the PTG, the TC discussed what we can do about our stable branch policy and there was a few different ideas put on the table, however, something that I felt made a lot of sense was revisiting the way that we currently apply it. We all know that we're definitely a lot more resource limited as a community and it's important for us to start breaking down some of those ideas which made sense when the velocity of the project was very high. One of the things that used to make sense is maintaining a dedicated stable core team across all projects. At the current time: 1. Some projects seem to have some sort of power of their stable branches through historical reasons 2. Some projects don't have access to merging things into stable branches and need to rely on the stable maintenance team to do that 3. We are *really* thankful for our current stable team, but it seems that there is a lot of work that does bottleneck other teams (and really, stable reviews is a difficult task). The proposal that I had was that in mind would be for us to let teams self manage their own stable branches. I think we've reached a point where we can trust most of our community to be familiar with the stable branch policy (and let teams decide for themselves what they believe is best for the success of their own projects). I'd like to invite the community to comment on this change, the approach that we can take to do this (or other ideas) -- being mindful with the limited set of resources that we have inside the community. Thanks, Mohammed From feilong at catalyst.net.nz Mon Nov 18 21:46:30 2019 From: feilong at catalyst.net.nz (Feilong Wang) Date: Tue, 19 Nov 2019 10:46:30 +1300 Subject: [Magnum] Virtual PTG planning Message-ID: <2f35bc6c-b4bb-dbe7-c16d-ede34bc23914@catalyst.net.nz> Hi team, As we discussed on last weekly team meeting, we'd like to have a virtual PTG before the Xmas holiday to plan our work for the U release. The general idea is extending our current weekly meeting time from 1 hour to 2 hours and having 2 sessions with total 4 hours. 
My current proposal is as below, please reply if you have question or comments. Thanks. Pre discussion/Ideas collection:   20th Nov  9:00AM-10:00AM UTC 1st Session:  27th Nov 9:00AM-11:00AM UTC 2nd Session: 4th Dec 9:00AM-11:00AM UTC -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- From mriedemos at gmail.com Mon Nov 18 22:08:24 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 18 Nov 2019 16:08:24 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: References: Message-ID: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> On 11/18/2019 3:40 PM, Mohammed Naser wrote: > The proposal that I had was that in mind would be for us to let teams > self manage their own stable branches. I think we've reached a point > where we can trust most of our community to be familiar with the > stable branch policy (and let teams decide for themselves what they > believe is best for the success of their own projects). So for a project like nova that has a separate nova-core [1] and nova-stable-maint team [2] where some from [2] aren't in [1], what does this mean? Drop [2] and just rely on [1]? That won't work for those in nova-core that aren't familiar enough with the stable branch guidelines or simply don't care to review stable branch changes, and won't work for those that are in nova-stable-maint but not nova-core. [1] https://review.opendev.org/#/admin/groups/25,members [2] https://review.opendev.org/#/admin/groups/540,members -- Thanks, Matt From openstack at nemebean.com Mon Nov 18 22:35:31 2019 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 18 Nov 2019 16:35:31 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> Message-ID: <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> On 11/18/19 4:08 PM, Matt Riedemann wrote: > On 11/18/2019 3:40 PM, Mohammed Naser wrote: >> The proposal that I had was that in mind would be for us to let teams >> self manage their own stable branches.  I think we've reached a point >> where we can trust most of our community to be familiar with the >> stable branch policy (and let teams decide for themselves what they >> believe is best for the success of their own projects). > > So for a project like nova that has a separate nova-core [1] and > nova-stable-maint team [2] where some from [2] aren't in [1], what does > this mean? Drop [2] and just rely on [1]? That won't work for those in > nova-core that aren't familiar enough with the stable branch guidelines > or simply don't care to review stable branch changes, and won't work for > those that are in nova-stable-maint but not nova-core. I believe the proposal is to allow the Nova team to manage nova-stable-maint in the same way they do nova-core, not to force anyone to drop their stable-maint team entirely. 
> > [1] https://review.opendev.org/#/admin/groups/25,members > [2] https://review.opendev.org/#/admin/groups/540,members > From mnaser at vexxhost.com Mon Nov 18 22:35:50 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 18 Nov 2019 17:35:50 -0500 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> Message-ID: On Mon, Nov 18, 2019 at 5:13 PM Matt Riedemann wrote: > > On 11/18/2019 3:40 PM, Mohammed Naser wrote: > > The proposal that I had was that in mind would be for us to let teams > > self manage their own stable branches. I think we've reached a point > > where we can trust most of our community to be familiar with the > > stable branch policy (and let teams decide for themselves what they > > believe is best for the success of their own projects). > > So for a project like nova that has a separate nova-core [1] and > nova-stable-maint team [2] where some from [2] aren't in [1], what does > this mean? Drop [2] and just rely on [1]? That won't work for those in > nova-core that aren't familiar enough with the stable branch guidelines > or simply don't care to review stable branch changes, and won't work for > those that are in nova-stable-maint but not nova-core. Thanks for bringing this up, I think we'll slowly iron those out. I think this can be a team-specific decision, we should have $project-stable-maint for every single project anyways, and the team could decide to put all of $project-core inside of it, or a select group of people. > [1] https://review.opendev.org/#/admin/groups/25,members > [2] https://review.opendev.org/#/admin/groups/540,members > > -- > > Thanks, > > Matt > From nate.johnston at redhat.com Mon Nov 18 23:01:06 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Mon, 18 Nov 2019 18:01:06 -0500 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> Message-ID: <20191118230106.gl4ctpftmndyzpbn@firewall> On Mon, Nov 18, 2019 at 04:08:24PM -0600, Matt Riedemann wrote: > On 11/18/2019 3:40 PM, Mohammed Naser wrote: > > The proposal that I had was that in mind would be for us to let teams > > self manage their own stable branches. I think we've reached a point > > where we can trust most of our community to be familiar with the > > stable branch policy (and let teams decide for themselves what they > > believe is best for the success of their own projects). > > So for a project like nova that has a separate nova-core [1] and > nova-stable-maint team [2] where some from [2] aren't in [1], what does this > mean? Drop [2] and just rely on [1]? That won't work for those in nova-core > that aren't familiar enough with the stable branch guidelines or simply > don't care to review stable branch changes, and won't work for those that > are in nova-stable-maint but not nova-core. I wouldn't think that anything would need to change about how Nova does things. If the Nova team wants to manage Nova stable branches using nova-stable-maint then this proposal absolutely supports that. The main change is removing stable-maint-core [3] from nove-stable-maint as stable-maint-core would presumably be dissolving as part of this change. Many teams already have a stable team [4]. 
For the ones that don't seem to (for example packaging-rpm, telemetry, monasca, or kuryr) it would make sense to make a $PROJECT-stable-maint and then leave it up to that project to either add $PROJECT-core to it or designate specific members to manage the stable branches. So in the end all the teams have the option to work like Nova does. Nate > [1] https://review.opendev.org/#/admin/groups/25,members > [2] https://review.opendev.org/#/admin/groups/540,members [3] https://review.opendev.org/#/admin/groups/530,members [4] https://review.opendev.org/#/admin/groups/?filter=stable From xingyongji at gmail.com Tue Nov 19 01:06:50 2019 From: xingyongji at gmail.com (yj x) Date: Tue, 19 Nov 2019 09:06:50 +0800 Subject: how to update back-end storage online Message-ID: Hi, I want to implement a qemu block driver which like qeme/block/iscsi.c. Here are my questions: some targets have be attached to guest system, and then the targets info have changed. How can I update the connection to targets online? I mean the disk in guest system does not change, just update the connection to the back-end storage, and the operation can't affect guest system. I can't find a nova-api or libvirt-api to meet my needs now. Does anyone help me? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zbitter at redhat.com Tue Nov 19 01:17:04 2019 From: zbitter at redhat.com (Zane Bitter) Date: Mon, 18 Nov 2019 17:17:04 -0800 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> Message-ID: <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> On 18/11/19 5:35 pm, Ben Nemec wrote: > > > On 11/18/19 4:08 PM, Matt Riedemann wrote: >> On 11/18/2019 3:40 PM, Mohammed Naser wrote: >>> The proposal that I had was that in mind would be for us to let teams >>> self manage their own stable branches.  I think we've reached a point >>> where we can trust most of our community to be familiar with the >>> stable branch policy (and let teams decide for themselves what they >>> believe is best for the success of their own projects). >> >> So for a project like nova that has a separate nova-core [1] and >> nova-stable-maint team [2] where some from [2] aren't in [1], what >> does this mean? Drop [2] and just rely on [1]? That won't work for >> those in nova-core that aren't familiar enough with the stable branch >> guidelines or simply don't care to review stable branch changes, and >> won't work for those that are in nova-stable-maint but not nova-core. > > I believe the proposal is to allow the Nova team to manage > nova-stable-maint in the same way they do nova-core, not to force anyone > to drop their stable-maint team entirely. I think the proposal was actually for each *-stable-maint team to manage itself. This would avoid the situation where e.g. the TC appoints a brand-new PTL and suddenly they get to make themselves a stable core, as in that case the team would still have to be bootstrapped by the stable-maint team. But it would allow those who are both closest to the project and confirmed to be familiar with the stable guidelines to make decisions about who else is ready to join that group. 
- ZB >> >> [1] https://review.opendev.org/#/admin/groups/25,members >> [2] https://review.opendev.org/#/admin/groups/540,members >> > From naohiro.sameshima at global.ntt Tue Nov 19 02:02:56 2019 From: naohiro.sameshima at global.ntt (=?utf-8?B?TmFvaGlybyBTYW1lc2hpbWHvvIjprqvls7Yg55u05rSL77yJKEdyb3VwKQ==?=) Date: Tue, 19 Nov 2019 02:02:56 +0000 Subject: [glance] glance_store tests failed Message-ID: Hi, When I run a test with `tox -e py37` in glance_store, two tests failed. The command what I ran is below. 1. git clone https://opendev.org/openstack/glance_store.git 2. tox -e py37 Is there something wrong with how to run test? Thanks & Best Regards, ============================== Failed 2 tests - output below: ============================== glance_store.tests.unit.test_filesystem_store.TestStore.test_add_check_metadata_list_with_valid_mountpoint_locations -------------------------------------------------------------------------------------------------------------------- Captured traceback: ~~~~~~~~~~~~~~~~~~~ b'Traceback (most recent call last):' b' File "/Users/sameshima/glance_store/glance_store/tests/unit/test_filesystem_store.py", line 215, in test_add_check_metadata_list_with_valid_mountpoint_locations' b' self.assertEqual(in_metadata[0], metadata)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 411, in assertEqual' b' self.assertThat(observed, matcher, message)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 498, in assertThat' b' raise mismatch_error' b"testtools.matchers._impl.MismatchError: {'id': 'abcdefg', 'mountpoint': '/tmp'} != {}" b'' glance_store.tests.unit.test_multistore_filesystem.TestMultiStore.test_add_check_metadata_list_with_valid_mountpoint_locations ------------------------------------------------------------------------------------------------------------------------------ Captured traceback: ~~~~~~~~~~~~~~~~~~~ b'Traceback (most recent call last):' b' File "/Users/sameshima/glance_store/glance_store/tests/unit/test_multistore_filesystem.py", line 276, in test_add_check_metadata_list_with_valid_mountpoint_locations' b' self.assertEqual(in_metadata[0], metadata)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 411, in assertEqual' b' self.assertThat(observed, matcher, message)' b' File "/Users/sameshima/glance_store/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 498, in assertThat' b' raise mismatch_error' b"testtools.matchers._impl.MismatchError: {'id': 'abcdefg', 'mountpoint': '/tmp'} != {'store': 'file1'}" This email and all contents are subject to the following disclaimer: https://hello.global.ntt/en-us/email-disclaimer -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Tue Nov 19 02:18:09 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 18 Nov 2019 20:18:09 -0600 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> Message-ID: <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> ---- On Mon, 18 Nov 2019 19:17:04 -0600 Zane Bitter wrote ---- > On 18/11/19 5:35 pm, Ben Nemec wrote: > > > > > > On 11/18/19 4:08 PM, Matt Riedemann wrote: > >> On 11/18/2019 3:40 PM, Mohammed Naser wrote: > >>> The proposal that I had was that in mind would be for us to let teams > >>> self manage their own stable branches. I think we've reached a point > >>> where we can trust most of our community to be familiar with the > >>> stable branch policy (and let teams decide for themselves what they > >>> believe is best for the success of their own projects). > >> > >> So for a project like nova that has a separate nova-core [1] and > >> nova-stable-maint team [2] where some from [2] aren't in [1], what > >> does this mean? Drop [2] and just rely on [1]? That won't work for > >> those in nova-core that aren't familiar enough with the stable branch > >> guidelines or simply don't care to review stable branch changes, and > >> won't work for those that are in nova-stable-maint but not nova-core. > > > > I believe the proposal is to allow the Nova team to manage > > nova-stable-maint in the same way they do nova-core, not to force anyone > > to drop their stable-maint team entirely. > > I think the proposal was actually for each *-stable-maint team to manage > itself. This would avoid the situation where e.g. the TC appoints a > brand-new PTL and suddenly they get to make themselves a stable core, as > in that case the team would still have to be bootstrapped by the > stable-maint team. But it would allow those who are both closest to the > project and confirmed to be familiar with the stable guidelines to make > decisions about who else is ready to join that group. I am still finding difficult to understand the change and how it will solve the current problem. The current problem is: * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) which is nothing but we have fewer contributors who understand the stable policies. * The stable policies are not the problem so we will stick with current stable policies across all the projects. Stable policies have to be maintained at single place for consistency in backports across projects. If we are moving the stable maintenance team ownership from current stable-maintenance team to project side then, how it will solve the issue, does it enable more contributors to understand the stable policy and extend the team? if yes, then why it cannot happen with current model? If the project team or PTL making its core member get more familiar with the stable policy and add as a stable core team then why it cannot happen with the current model. For example, if I am PTL or core of any project and finding hard to get my backport merged then I or my project team core should review more stable branch patches and propose them in stable team core. If we move the stable team ownership to the projects side then I think PTL is going to do the same. 
Ask the team members to understand the stable policies and do more review and then add them in stable core team. If any member know the stable policies then directly add. I feel that the current problem cannot be solved by moving the ownership of the team, we need to encourage more and more developers to become stable core in existing model especially from projects who find difficulties in merging their backport. One more thing, do we have data that how much time as avg it take to merge the backport and what all projects facing the backport merge issue ? -gmann > > - ZB > > >> > >> [1] https://review.opendev.org/#/admin/groups/25,members > >> [2] https://review.opendev.org/#/admin/groups/540,members > >> > > > > > From anlin.kong at gmail.com Tue Nov 19 07:53:37 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Tue, 19 Nov 2019 20:53:37 +1300 Subject: Trove Image issue In-Reply-To: <008d01d59b70$542067f0$fc6137d0$@xflowresearch.com> References: <008d01d59b70$542067f0$fc6137d0$@xflowresearch.com> Message-ID: On Sat, Nov 16, 2019 at 4:19 AM wrote: > > > Hi Team, > > > > I have deployed kolla-ansible(Stein) with Trove enable and using Trove > prebuild images to build the Database VM but VM is constantly stuck in > build state. I am not sure how to use prebuild image key which is in > https://opendev.org/openstack/trove/src/branch/master/integration/scripts/files/keys folder. > > This key file is not used any more. The Nova keypair used for creating trove instance is configured in 'nova_keypair' config option. > Can you please guide me regarding the key usage for prebuild images ( > http://tarballs.openstack.org/trove/images/) > > > > To build my own trove image on kolla-ansible, is there any specific guide > available for it? > For how to build trove guest image, please refer to the official trove doc https://docs.openstack.org/trove/latest/admin/building_guest_images.html - Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Tue Nov 19 08:46:39 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 19 Nov 2019 09:46:39 +0100 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained Message-ID: Hello Folks, It looks like gnocchi is "officially" marked as unmaintained: https://github.com/gnocchixyz/gnocchi/issues/1049 Has there been any discussion regarding how it affects OpenStack projects? And/or are there any plans to amend this situation? -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Tue Nov 19 09:03:19 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Tue, 19 Nov 2019 22:03:19 +1300 Subject: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: Message-ID: We (ceilometer team) will probably add Ceilometer API and mongodb support back, considering the current Gnocchi project situation. However, Gnocchi will still be supported as a publisher in Ceilometer. - Best regards, Lingxian Kong Catalyst Cloud On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek < radoslaw.piliszek at gmail.com> wrote: > Hello Folks, > > It looks like gnocchi is "officially" marked as unmaintained: > https://github.com/gnocchixyz/gnocchi/issues/1049 > > Has there been any discussion regarding how it affects OpenStack projects? > And/or are there any plans to amend this situation? 
> > -yoctozepto > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From merlin.blom at bertelsmann.de Tue Nov 19 09:15:56 2019 From: merlin.blom at bertelsmann.de (Blom, Merlin, NMU-OI) Date: Tue, 19 Nov 2019 09:15:56 +0000 Subject: AW: [RabbitMQ][cinder] Listen to messages In-Reply-To: References: Message-ID: Thank you for your answer! “Are you sure cinder use a dedicated vhost? I'm notconviced, if I'm right they all use the default vhost '/'.” Indeed it does, when deployed with openstack-ansible. But I found a way to find the exchange with RMQ Tracing: https://www.rabbitmq.com/firehose.html Using the GUI plugin I’ve got all the messages flowing through the cinder vhost and found: The exchange for the notification.* queues is “openstack” not cinder. Sometimes I ask myself if there are any kind of standards for RMQ Communication. :P This may be interesting for someone else to use in their projects. Cheers, Merlin Von: Herve Beraud Gesendet: Freitag, 15. November 2019 11:14 An: Blom, Merlin, NMU-OI Cc: openstack-discuss at lists.openstack.org Betreff: Re: [RabbitMQ][cinder] Listen to messages Le ven. 15 nov. 2019 à 10:17, Blom, Merlin, NMU-OI > a écrit : Hey there, it seems to me as if ask.openstack.org is down, so I ask my question here: I’d like to listen to oslo messages from cinder as I do for neutron and octavia to know what is going on. For me the following code worked for neutron: EXCHANGE_NAME = os.getenv('EXCHANGE_NAME', 'neutron') ROUTING_KEY = os.getenv('ROUTING_KEY', 'notifications.info ') QUEUE_NAME = os.getenv('QUEUE_NAME', 'messaging_queue') BROKER_URI = os.getenv('BROKER_URI', 'UNDEFINED') BROKER_PASSWORD = os.getenv('BROKER_PASSWORD', '') class Messages(ConsumerMixin): def __init__(self, connection): self.connection = connection return def get_consumers(self, consumer, channel): exchange = Exchange(EXCHANGE_NAME, type="topic", durable=False) queue = Queue(QUEUE_NAME, exchange, routing_key=ROUTING_KEY, durable=False, auto_delete=True, no_ack=True) return [consumer(queues=[queue], callbacks=[self.on_message])] def on_message(self, body, message): try: print(message) except Exception as e: log.info (repr(e)) if __name__ == "__main__": log.info ("Connecting to broker {}".format(BROKER_URI)) with BrokerConnection(hostname=BROKER_URI, userid='messaging', password=BROKER_PASSWORD, virtual_host='/'+EXCHANGE_NAME, heartbeat=4, failover_strategy='round-robin') as connection: Messaging(connection).run() BrokerConnection.connection.close() But on the cinder vhost (/cinder) Are you sure cinder use a dedicated vhost? I'm notconviced, if I'm right they all use the default vhost '/'. I can’t find an exchange that the code is working on. (cinder, cinder-backup, …) I tried using the rabbitmq tracer: https://www.rabbitmq.com/firehose.html And got all the cinder messages but I don’t want to use it in production because of performance issues. Does anyone have an idea how to find the correct exchange for the notification info queue in cinder? 
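For reference, below is a minimal, runnable sketch of the same kombu consumer pointed at the cinder notifications located above. The 'openstack' exchange name and the 'notifications.info' routing key follow from the tracing result described earlier; the broker host, credentials, vhost ('/cinder' under openstack-ansible) and queue name are deployment-specific assumptions, and the small bugs in the original snippet (the class is defined as Messages but instantiated as Messaging, and the extra close() call after the with-block is unnecessary) are avoided here:

import os

from kombu import Connection, Exchange, Queue
from kombu.mixins import ConsumerMixin

# Assumed values: adjust broker host, credentials and vhost for your deployment.
BROKER_URI = os.getenv('BROKER_URI', 'rabbitmq.example.net')
BROKER_PASSWORD = os.getenv('BROKER_PASSWORD', '')
# Per the tracing result above, the notification.* queues on the cinder vhost
# hang off the 'openstack' topic exchange, not a 'cinder' one.
EXCHANGE_NAME = os.getenv('EXCHANGE_NAME', 'openstack')
ROUTING_KEY = os.getenv('ROUTING_KEY', 'notifications.info')
QUEUE_NAME = os.getenv('QUEUE_NAME', 'messaging_queue')


class NotificationListener(ConsumerMixin):
    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, consumer, channel):
        # Exchange/queue flags must match how they are already declared on the broker.
        exchange = Exchange(EXCHANGE_NAME, type='topic', durable=False)
        queue = Queue(QUEUE_NAME, exchange, routing_key=ROUTING_KEY,
                      durable=False, auto_delete=True, no_ack=True)
        return [consumer(queues=[queue], callbacks=[self.on_message])]

    def on_message(self, body, message):
        # oslo.messaging notifications carry the event type and payload in the body.
        print(body)


if __name__ == '__main__':
    with Connection(hostname=BROKER_URI, userid='messaging',
                    password=BROKER_PASSWORD, virtual_host='/cinder',
                    heartbeat=4) as connection:
        NotificationListener(connection).run()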
Cheers, Merlin -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5195 bytes Desc: not available URL: From merlin.blom at bertelsmann.de Tue Nov 19 09:25:11 2019 From: merlin.blom at bertelsmann.de (Blom, Merlin, NMU-OI) Date: Tue, 19 Nov 2019 09:25:11 +0000 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: References: Message-ID: Thanks for your work on ceilometer! The gnocchi situation is really sad. We implemented solutions on both Gnocchi and ceilometer. As far as I remember, you abandoned the mongodb support for performance reasons, and now you are going back to it? Has mongodb made any significant performance improvements for time series data? Best regards, Merlin From: Lingxian Kong Sent: Tuesday, 19 November 2019 10:03 To: Radosław Piliszek Cc: openstack-discuss Subject: Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained We (ceilometer team) will probably add Ceilometer API and mongodb support back, considering the current Gnocchi project situation. However, Gnocchi will still be supported as a publisher in Ceilometer. - Best regards, Lingxian Kong Catalyst Cloud On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek > wrote: Hello Folks, It looks like gnocchi is "officially" marked as unmaintained: https://github.com/gnocchixyz/gnocchi/issues/1049 Has there been any discussion regarding how it affects OpenStack projects? And/or are there any plans to amend this situation? -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5195 bytes Desc: not available URL: From aj at suse.com Tue Nov 19 09:40:48 2019 From: aj at suse.com (Andreas Jaeger) Date: Tue, 19 Nov 2019 10:40:48 +0100 Subject: On next minimum libvirt / QEMU versions for "V" release In-Reply-To: <20191118181106.GD7032@paraplu> References: <20191118181106.GD7032@paraplu> Message-ID: <789413eb-3fde-0283-9ddb-c356879c749d@suse.com> On 18/11/2019 19.11, Kashyap Chamarthy wrote: > Heya, > > The last time we incremented versions for libvirt and QEMU was during > the "Stein" release[1]. For "Train" we didn't do any. Although we advertised the > NEXT_MIN_{LIBVIRT,QEMU} versions for the "Train" release to be > libvirt 4.0.0 and QEMU 2.11.0, we didn't actually bump them; we'll do > that for "Ussuri".
> > But before we bump the versions for "Ussuri", we need to pick NEXT_MIN > versions for the "V" release. Based on the updated > DistroSupportMatrix page[2], it looks like we can pick the next libvirt and > QEMU versions for the "V" release to be the following: > > libvirt: 5.0.0 [GAed on: 15-Jan-2019] > QEMU: 4.0.0 [GAed on: 24-Apr-2019] > > I have the initial patch here[3] for comments. > > Debian, Fedora, Ubuntu[4], CentOS, RHEL currently already ship the above > versions (actually, higher than those). And it is reasonable to assume > -- but let's confirm below -- that openSUSE, SLES, and Oracle Linux > would also have the above versions available by "V" release time. > > Action Items for Linux Distros > ------------------------------ > > (a) Oracle Linux: Please update your libvirt/QEMU versions for Oracle > Linux 8? > > I couldn't find anything related to libvirt/QEMU here: > https://yum.oracle.com/oracle-linux-8.html. (My educated guess is: > the versions roughly match what's in CentOS/RHEL.) > > (b) openSUSE and SLES: Same request as above. > > Andreas Jaeger said on #openstack-infra that the proposed versions > for the 'V' release should be fine for SLES. (And by extension > openSUSE, I assume.) Yes, those look fine for SLES and openSUSE, Andreas
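For context, the proposal above boils down to bumping the advertised version constants in nova roughly as follows. This is only a sketch: the constant names and their home in nova/virt/libvirt/driver.py are assumptions based on how earlier bumps were done, not a copy of the actual patch in [3].

# Minimums enforced from "Ussuri" (the values previously advertised as
# NEXT_MIN for "Train" but never actually bumped):
MIN_LIBVIRT_VERSION = (4, 0, 0)
MIN_QEMU_VERSION = (2, 11, 0)
# Advertised next minimums, to become the enforced minimums in the "V" release:
NEXT_MIN_LIBVIRT_VERSION = (5, 0, 0)
NEXT_MIN_QEMU_VERSION = (4, 0, 0)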
[1] https://etherpad.openstack.org/p/telemetry-train-roadmap On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: > > Thanks for your work on ceilometer! > > The gnocchi situation is realy sad. > > We implemented solutions on Gnocchi and ceilometer. > > In my opinion you abandoned the mongodb support for performance > reasons and now you are going back to it? > > Has mongodb made any significant performance improvements for time > series data? > > Best regards, > > Merlin > > *Von:*Lingxian Kong > *Gesendet:* Dienstag, 19. November 2019 10:03 > *An:* Radosław Piliszek > *Cc:* openstack-discuss > *Betreff:* Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi > unmaintained > > We (ceilometer team) will probably add Ceilometer API and mongodb > support back, considering the current Gnocchi project situation. > However, Gnocchi will still be supported as a publisher in Ceilometer. > > - > > Best regards, > Lingxian Kong > > Catalyst Cloud > > On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek > > wrote: > > Hello Folks, > > It looks like gnocchi is "officially" marked as unmaintained: > https://github.com/gnocchixyz/gnocchi/issues/1049 > > > Has there been any discussion regarding how it affects OpenStack > projects? And/or are there any plans to amend this situation? > > -yoctozepto > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luka.peschke at objectif-libre.com Tue Nov 19 10:20:45 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Tue, 19 Nov 2019 11:20:45 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> Message-ID: <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> My two cents from my experience on cloudkitty: We had to implement several storage drivers, and faced more or less the same issues as the telemetry team did before us. We had a gnocchi driver at some point, which worked pretty well, but ended up being very hacky because gnocchi lacked flexibility for non-openstack metrics (ie. data models which aren't resource-based). We ended up implementing a driver for InfluxDB which has relatively good perfs. But given that the open-source version of InfluxDB does not support HA/clustering, we also implemented an experimental Elasticsearch driver (which requires ES>=6.5). The recent ES releases have really improved the support for timeseries, and it is the storage backend for elastic beats. Given that many openstack deployments already have an Elasticsearch deployment for logs, and the large adoption of ES, it'd be my choice for a new Ceilometer storage driver. However, Gnocchi is pretty stable in 4.3, and well integrated with Ceilometer. Wouldn't it be less effort to keep it functional for now (ie only bug/security fixes, no new features), instead of re-integrating deleted features to ceilometer ? Cheers, -- Luka Peschke (peschk_l) Le 2019-11-19 10:51, Tobias Urdin a écrit : > It sure is, we as well abandoned the MongoDB backend for Gnocchi > which works pretty well. > > Would be a shame if a migration back would be required, maybe we can > get a discussion going on a more > long-term solution as was discussed when talking about the future of > Ceilometer. > > Supporting Gnocchi or moving to another open source project as a > storage backend that is stable and maintained. > There were (and still is? 
Though unofficial out-of-tree) storage > backends for Ceilometer that publishes to InfluxDB. > > I were never able to follow-up on the meetings (I probably missed a > lot of it) regarding the Ceilometer roadmap [1]. > > [1] https://etherpad.openstack.org/p/telemetry-train-roadmap > > On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: > >> Thanks for your work on ceilometer! >> >> The gnocchi situation is realy sad. >> >> We implemented solutions on Gnocchi and ceilometer. >> >> In my opinion you abandoned the mongodb support for performance >> reasons and now you are going back to it? >> >> Has mongodb made any significant performance improvements for time >> series data? >> >> Best regards, >> >> Merlin >> >> VON: Lingxian Kong >> GESENDET: Dienstag, 19. November 2019 10:03 >> AN: Radosław Piliszek >> CC: openstack-discuss >> BETREFF: Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi >> unmaintained >> >> We (ceilometer team) will probably add Ceilometer API and mongodb >> support back, considering the current Gnocchi project situation. >> However, Gnocchi will still be supported as a publisher in Ceilometer. >> >> - >> >> Best regards, >> Lingxian Kong >> >> Catalyst Cloud >> >> On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek >> wrote: >> >>> Hello Folks, >>> >>> It looks like gnocchi is "officially" marked as unmaintained: >>> https://github.com/gnocchixyz/gnocchi/issues/1049 [1] >>> >>> Has there been any discussion regarding how it affects OpenStack >>> projects? And/or are there any plans to amend this situation? >>> >>> -yoctozepto > > > > Links: > ------ > [1] > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_gnocchixyz_gnocchi_issues_1049&d=DwMFaQ&c=vo2ie5TPcLdcgWuLVH4y8lsbGPqIayH3XbK3gK82Oco&r=hTUN4-Trlb-8Fh11dR6m5VD1uYA15z7v9WL8kYigkr8&m=czRC3qwwRqT3qKzfXMSVl78G4Sk8QVwT93okCgkBe34&s=Ob7yLjlWUAz9-8oMikC_QU9ivZBvtBKkqqFEvceGGM0&e= From skaplons at redhat.com Tue Nov 19 10:26:15 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 19 Nov 2019 11:26:15 +0100 Subject: [all][neutron][neutron-fwaas] Maintainers needed Message-ID: <20191119102615.oq46xojyhoybulna@skaplons-mac> Hi, Over the past couple of cycles we have noticed that new contributions and maintenance efforts for neutron-fwaas project were almost non existent. This impacts patches for bug fixes, new features and reviews. The Neutron core team is trying to at least keep the CI of this project healthy, but we don’t have enough knowledge about the details of the neutron-fwaas code base to review more complex patches. During the PTG in Shanghai we discussed that with operators and TC members during the forum session [1] and later within the Neutron team during the PTG session [2]. During these discussions, with the help of operators and TC members, we reached the conclusion that we need to have someone responsible for maintaining project. This doesn’t mean that the maintainer needs to spend full time working on this project. Rather, we need someone to be the contact person for the project, who takes care of the project’s CI and review patches. Of course that’s only a minimal requirement. If the new maintainer works on new features for the project, it’s even better :) If we don’t have any new maintainer(s) before milestone Ussuri-2, which is Feb 10 - Feb 14 according to [3], we will need to mark neutron-fwaas as deprecated and in “V” cycle we will propose to move the project from the Neutron stadium, hosted in the “openstack/“ namespace, to the unofficial projects hosted in the “x/“ namespace. 
So if You are using this project now, or if You have customers who are using it, please consider the possibility of maintaining it. Otherwise, please be aware that it is highly possible that the project will be deprecated and moved out from the official OpenStack projects. [1] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - Lines 379-421 [3] https://releases.openstack.org/ussuri/schedule.html -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Tue Nov 19 10:29:18 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 19 Nov 2019 11:29:18 +0100 Subject: [all][neutron][networking-bagpipe][networking-bgpvpn] Maintainers needed Message-ID: <20191119102918.b5cmfecqjf746bqi@skaplons-mac> Hi, Over the past couple of cycles we have noticed that new contributions and maintenance efforts for networking-bagpipe and networking-bgpvpn were almost non existent. This impacts patches for bug fixes, new features and reviews. The Neutron core team is trying to at least keep the CI of this project healthy, but we don’t have enough knowledge about the details of the code base to review more complex patches. During the PTG in Shanghai we discussed that with operators and TC members during the forum session [1] and later within the Neutron team during the PTG session [2]. During these discussions, with the help of operators and TC members, we reached the conclusion that we need to have someone responsible for maintaining those projects. This doesn’t mean that the maintainer needs to spend full time working on those projects. Rather, we need someone to be the contact person for the project, who takes care of the project’s CI and review patches. Of course that’s only a minimal requirement. If the new maintainer works on new features for the project, it’s even better :) If we don’t have any new maintainer(s) before milestone Ussuri-2, which is Feb 10 - Feb 14 according to [3], we will need to mark networking-bgpvpn and networking-bagpipe as deprecated and in “V” cycle we will propose to move the projects from the Neutron stadium, hosted in the “openstack/“ namespace, to the unofficial projects hosted in the “x/“ namespace. So if You are using this project now, or if You have customers who are using it, please consider the possibility of maintaining it. Otherwise, please be aware that it is highly possible that the project will be deprecated and moved out from the official OpenStack projects. [1] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - Lines 379-421 [3] https://releases.openstack.org/ussuri/schedule.html -- Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Tue Nov 19 10:41:37 2019 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 19 Nov 2019 11:41:37 +0100 Subject: [all][neutron][neutron-vpnaas] Maintainers needed Message-ID: <20191119104137.pkra6hehfhdjjhh3@skaplons-mac> Hi, Over the past couple of cycles we have noticed that new contributions and maintenance efforts for neutron-vpnaas were almost non existent. This impacts patches for bug fixes, new features and reviews. The Neutron core team is trying to at least keep the CI of this project healthy, but we don’t have enough knowledge about the details of the neutron-vpnaas code base to review more complex patches. 
During the PTG in Shanghai we discussed that with operators and TC members during the forum session [1] and later within the Neutron team during the PTG session [2]. During these discussions, with the help of operators and TC members, we reached the conclusion that we need to have someone responsible for maintaining project. This doesn’t mean that the maintainer needs to spend full time working on this project. Rather, we need someone to be the contact person for the project, who takes care of the project’s CI and review patches. Of course that’s only a minimal requirement. If the new maintainer works on new features for the project, it’s even better :) If we don’t have any new maintainer(s) before milestone Ussuri-2, which is Feb 10 - Feb 14 according to [3], we will need to mark neutron-vpnaas as deprecated and in “V” cycle we will propose to move the project from the Neutron stadium, hosted in the “openstack/“ namespace, to the unofficial projects hosted in the “x/“ namespace. So if You are using this project now, or if You have customers who are using it, please consider the possibility of maintaining it. Otherwise, please be aware that it is highly possible that the project will be deprecated and moved out from the official OpenStack projects. [1] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - Lines 379-421 [3] https://releases.openstack.org/ussuri/schedule.html -- Slawek Kaplonski Senior software engineer Red Hat From aaronzhu1121 at gmail.com Tue Nov 19 10:56:24 2019 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Tue, 19 Nov 2019 18:56:24 +0800 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> <64cdeb4aa46955c22711d7880a1848ba@objectif-libre.com> Message-ID: I am sorry to the telemetry project happened before, but now current telemetry core team had decided to add ceilometer api and mongodb support and cpu_utils support back. Gnoochi will still support as the backed. All the mentioned database (influxdb, ES....), we would happy everyone to submit patches to support as the database backed in ceilometer. I created a storyboard to track ceilometer Ussuri release todo things in [0]. Free free to add things you want to do in Ussuri release. Due to I will have a vacation this week, I can't hold this week's meeting, we can discuss more in the next irc meeting at 5 Dec 2:00 UTC. [0] https://storyboard.openstack.org/#!/board/205 Luka Peschke 于2019年11月19日 周二18:24写道: > My two cents from my experience on cloudkitty: We had to implement > several storage drivers, and faced more or less the same issues as the > telemetry team did before us. We had a gnocchi driver at some point, > which worked pretty well, but ended up being very hacky because gnocchi > lacked flexibility for non-openstack metrics (ie. data models which > aren't resource-based). > > We ended up implementing a driver for InfluxDB which has relatively > good perfs. But given that the open-source version of InfluxDB does not > support HA/clustering, we also implemented an experimental Elasticsearch > driver (which requires ES>=6.5). > > The recent ES releases have really improved the support for timeseries, > and it is the storage backend for elastic beats. 
> > Given that many openstack deployments already have an Elasticsearch > deployment for logs, and the large adoption of ES, it'd be my choice for > a new Ceilometer storage driver. > > However, Gnocchi is pretty stable in 4.3, and well integrated with > Ceilometer. Wouldn't it be less effort to keep it functional for now (ie > only bug/security fixes, no new features), instead of re-integrating > deleted features to ceilometer ? > > Cheers, > > -- > Luka Peschke (peschk_l) > > Le 2019-11-19 10:51, Tobias Urdin a écrit : > > It sure is, we as well abandoned the MongoDB backend for Gnocchi > > which works pretty well. > > > > Would be a shame if a migration back would be required, maybe we can > > get a discussion going on a more > > long-term solution as was discussed when talking about the future of > > Ceilometer. > > > > Supporting Gnocchi or moving to another open source project as a > > storage backend that is stable and maintained. > > There were (and still is? Though unofficial out-of-tree) storage > > backends for Ceilometer that publishes to InfluxDB. > > > > I were never able to follow-up on the meetings (I probably missed a > > lot of it) regarding the Ceilometer roadmap [1]. > > > > [1] https://etherpad.openstack.org/p/telemetry-train-roadmap > > > > On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: > > > >> Thanks for your work on ceilometer! > >> > >> The gnocchi situation is realy sad. > >> > >> We implemented solutions on Gnocchi and ceilometer. > >> > >> In my opinion you abandoned the mongodb support for performance > >> reasons and now you are going back to it? > >> > >> Has mongodb made any significant performance improvements for time > >> series data? > >> > >> Best regards, > >> > >> Merlin > >> > >> VON: Lingxian Kong > >> GESENDET: Dienstag, 19. November 2019 10:03 > >> AN: Radosław Piliszek > >> CC: openstack-discuss > >> BETREFF: Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi > >> unmaintained > >> > >> We (ceilometer team) will probably add Ceilometer API and mongodb > >> support back, considering the current Gnocchi project situation. > >> However, Gnocchi will still be supported as a publisher in Ceilometer. > >> > >> - > >> > >> Best regards, > >> Lingxian Kong > >> > >> Catalyst Cloud > >> > >> On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek > >> wrote: > >> > >>> Hello Folks, > >>> > >>> It looks like gnocchi is "officially" marked as unmaintained: > >>> https://github.com/gnocchixyz/gnocchi/issues/1049 [1] > >>> > >>> Has there been any discussion regarding how it affects OpenStack > >>> projects? And/or are there any plans to amend this situation? > >>> > >>> -yoctozepto > > > > > > > > Links: > > ------ > > [1] > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_gnocchixyz_gnocchi_issues_1049&d=DwMFaQ&c=vo2ie5TPcLdcgWuLVH4y8lsbGPqIayH3XbK3gK82Oco&r=hTUN4-Trlb-8Fh11dR6m5VD1uYA15z7v9WL8kYigkr8&m=czRC3qwwRqT3qKzfXMSVl78G4Sk8QVwT93okCgkBe34&s=Ob7yLjlWUAz9-8oMikC_QU9ivZBvtBKkqqFEvceGGM0&e= > > -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thierry at openstack.org Tue Nov 19 11:05:28 2019 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 19 Nov 2019 12:05:28 +0100 Subject: [tc][stable] Changing stable branch policy In-Reply-To: <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> References: <8a4ad3b7-45ef-86f8-5903-d125dda2ffde@gmail.com> <999afd98-a032-b6ec-d143-6b89a7b99945@nemebean.com> <1a8cfdda-3441-afc4-7512-2a16503c7db4@redhat.com> <16e8173ef54.bfc4124819624.5055242714327612860@ghanshyammann.com> Message-ID: <80b9c92c-be69-7c96-291a-702a7a8c6498@openstack.org> Ghanshyam Mann wrote: > [...] > I am still finding difficult to understand the change and how it will solve the current problem. > > The current problem is: > * Fewer contributors in the stable-maintenance team (core stable team and project side stable team) > which is nothing but we have fewer contributors who understand the stable policies. > > * The stable policies are not the problem so we will stick with current stable policies across all the projects. > Stable policies have to be maintained at single place for consistency in backports across projects. > [...] I don't think that this the problem this change wants to solve. Currently the stable-core team is perceived as a bottleneck to getting more people into project-specific stable teams, or keeping those teams membership up to date. As a result stable maintenance is still seen in some teams as an alien thing, rather than an integral team duty. I suspect that by getting out of the badge-granting game, stable-core could focus more on stable policy definition and education, and review how well or bad each team does on the stable front. Because reviewing backports for stable branch suitability is just one part of doing stable branch right -- the other is to actively backport relevant patches. Personally, the main reason I support this change is that we have too much "ask for permission" things in OpenStack today, something that was driven by a code-review-for-everything culture. So the more we can remove the need to ask for permission to do some work, the better. -- Thierry Carrez (ttx) From witold.bedyk at suse.com Tue Nov 19 11:36:06 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Tue, 19 Nov 2019 12:36:06 +0100 Subject: AW: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi unmaintained In-Reply-To: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> References: <54efd463-6af5-3f4e-23db-90a6208d253c@binero.se> Message-ID: <45ed8eb5-1c90-b3f8-5c29-1cb319fd5f5b@suse.com> Another approach could be to use Monasca as the back end. The publisher has been recently added upstream [1]. It uses InfluxDB as the time series DB. The message queue between the API and TSDB adds resiliency and allows setting up InfluxDB in HA. What you get on top is a generic, multi-tenant monitoring solution. Cloud users can install their own agents, push own application metrics and set up own alerting per project. Support for auto-scaling with Heat templates is included. Greetings Witek [1] https://docs.openstack.org/ceilometer/latest/admin/telemetry-system-architecture.html#supported-databases On 11/19/19 10:51 AM, Tobias Urdin wrote: > It sure is, we as well abandoned the MongoDB backend for Gnocchi which > works pretty well. > > Would be a shame if a migration back would be required, maybe we can get > a discussion going on a more > long-term solution as was discussed when talking about the future of > Ceilometer. 
> > Supporting Gnocchi or moving to another open source project as a storage > backend that is stable and maintained. > There were (and still is? Though unofficial out-of-tree) storage > backends for Ceilometer that publishes to InfluxDB. > > I were never able to follow-up on the meetings (I probably missed a lot > of it) regarding the Ceilometer roadmap [1]. > > [1] https://etherpad.openstack.org/p/telemetry-train-roadmap > > On 11/19/19 10:29 AM, Blom, Merlin, NMU-OI wrote: >> >> Thanks for your work on ceilometer! >> >> The gnocchi situation is realy sad. >> >> We implemented solutions on Gnocchi and ceilometer. >> >> In my opinion you abandoned the mongodb support for performance >> reasons and now you are going back to it? >> >> Has mongodb made any significant performance improvements for time >> series data? >> >> Best regards, >> >> Merlin >> >> *Von:*Lingxian Kong >> *Gesendet:* Dienstag, 19. November 2019 10:03 >> *An:* Radosław Piliszek >> *Cc:* openstack-discuss >> *Betreff:* Re: [gnocchi][telemetry][ceilometer][cloudkitty] Gnocchi >> unmaintained >> >> We (ceilometer team) will probably add Ceilometer API and mongodb >> support back, considering the current Gnocchi project situation. >> However, Gnocchi will still be supported as a publisher in Ceilometer. >> >> - >> >> Best regards, >> Lingxian Kong >> >> Catalyst Cloud >> >> On Tue, Nov 19, 2019 at 9:54 PM Radosław Piliszek >> > wrote: >> >> Hello Folks, >> >> It looks like gnocchi is "officially" marked as unmaintained: >> https://github.com/gnocchixyz/gnocchi/issues/1049 >> >> >> Has there been any discussion regarding how it affects OpenStack >> projects? And/or are there any plans to amend this situation? >> >> -yoctozepto >> > From romain at ledisez.net Tue Nov 19 13:18:24 2019 From: romain at ledisez.net (Romain LE DISEZ) Date: Tue, 19 Nov 2019 14:18:24 +0100 Subject: =?utf-8?q?Re=3A?==?utf-8?q?_AW=3A?= =?utf-8?q?_=5Bgnocchi=5D=5Btelemetry=5D=5Bceilometer=5D=5Bcloudkitty=5D?= Gnocchi unmaintained In-Reply-To: Message-ID: Hi, at OVH, we kept the mongodb backend (understand: we are currently running an old version of ceilometer-collector). But we modified it to implement real-time aggregation so that we can then get the interresting values immediatly instead of running long calculations when we need them (we use Ceilometer for billing). To do that, mongodb provides some operators such as $inc and $max: https://docs.mongodb.com/manual/reference/operator/update-field/ This implementation scales well, we currently handle more than 20 000 mongodb updates per seconds without problems. (The issue is actually ceilometer-collector consuming too many CPU, forcing us to scale the number of servers to handle the load) -- Romain LE DISEZ From deepa.kr at fingent.com Tue Nov 19 05:39:40 2019 From: deepa.kr at fingent.com (Deepa) Date: Tue, 19 Nov 2019 11:09:40 +0530 Subject: Freezer Project Update In-Reply-To: <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> Message-ID: <001d01d59e9b$bf07b310$3d171930$@fingent.com> Hello Amjad Thanks a lot for the reply. It will great if you can share me link of the document you followed to install it ,also were you able to incorporate Freezer in dashboard. 
Regards, Deepa K R From: Amjad Kotobi Sent: Saturday, November 16, 2019 1:12 AM To: Deepa Cc: openstack-dev at lists.openstack.org Subject: Re: Freezer Project Update Hi, This project is pretty much in production state, from last summit it got active again from developer ends, we are using it for backup solution too. Documentation side isn’t that bright, very soon gonna get updated, anyhow you are able to install as standalone project in instance, I did it manually, didn’t use any provision tools. Let me know for specific part of deployment that is not clear. Amjad On 14. Nov 2019, at 06:53, Deepa > wrote: Hello Team Good Day I am Deepa from Fingent Global Solutions and we are a big fan of Openstack and we do have 4 + openstack setup (including production) We have deployed Openstack using juju and Maas .So when we check for backup feasibility other than cinder-backup we were able to see Freezer Project. But couldn’t find any charms for it in juju charms. Also there isn’t a clear documentation on how to install freezer . https://docs.openstack.org/releasenotes/freezer/train.html. No proper release notes in the latest version as well. Can you please tell me whether this project is in developing state? Whether charms will be added to juju in future. Can you also share a proper documentation on how to install Freezer in cluster setup. Thanks for your help. Regards, Deepa K R -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsitlani03 at gmail.com Tue Nov 19 16:06:50 2019 From: nsitlani03 at gmail.com (Namrata Sitlani) Date: Tue, 19 Nov 2019 21:36:50 +0530 Subject: [magnum] Kubernetes cluster issue Message-ID: Hello Folks, >From Thursday last week (Nov 14), Magnum is unable to spin up working Kubernetes clusters. We run on Rocky Openstack release. All our Kubernetes pods show CrashLoopBackOff status. We use the following commands to create the Kubernetes clusters : http://paste.openstack.org/show/786348/ . The deployment fails with the following output : http://paste.openstack.org/show/786287/. We tried deploying Kubernetes v1.13.12, v1.14.8, v1.15.5 and v1.16.2 without success. However, if we use version v1.14.6 we can successfully deploy our clusters. Unfortunately, we cannot use v1.14.6 in production because it is not patched for the CVE-2019-11253 vulnerability . Since this stopped working for us on Thursday, we think that the image update https://hub.docker.com/r/openstackmagnum/kubernetes-apiserver/tags done 6 days ago is the culprit. (We previously deployed clusters with versions v1.15.5 and v1.14.8 successfully.) Can you please confirm our findings and help us find a way forward? We can provide more logs if needed. Thank you very much, Namrata Sitlani -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Tue Nov 19 16:31:51 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 19 Nov 2019 10:31:51 -0600 Subject: [all][tc] Planning for dropping the Python2 support in OpenStack In-Reply-To: <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> References: <16dd0a42b8d.e847dd3e124645.6364180516762707559@ghanshyammann.com> <16dfe4467a4.db6f72ec168733.7542022367023887408@ghanshyammann.com> <16dff41292e.11b7e81b1177136.7669214833037569841@ghanshyammann.com> <16e19144cf0.f6b07849311271.7773306777497055114@ghanshyammann.com> <20191030004035.rsuegdsij2eezps3@mthode.org> <16e1d9f5df9.e9dfc74911451.4806654031763681992@ghanshyammann.com> Message-ID: <16e848187b9.10540b0f433809.8696342144139236610@ghanshyammann.com> Hello Everyone, I would like to notify all projects about important discussions and agreement happened today to move forward for cross projects dependencies and devstack default installation or py2 drop work. * It is now an official community goal for ussuri[1] and except Swift, all projects agreed to drop py2 as per schedule in goal. * If any project (openstack services which are planned to drop from now till m-1) drop the py2 with removing the py2 requirements and min python version in setup.cfg which makes that project uninstallable in cross projects jobs then: Options 1 (suggested): broken projects has to drop the py2 support/testing immediately. Options 2: if it breaks most of the projects, for example, nova or any other default projects become uninstallable on py2 then we can half-revert[2] the changes from the project caused the break and wait till m1 to merge them back. * Making Devstack to py3 by default TODAY (otherwise it can break gate everyday). **Devstack default is py2 currently and it was planned to make py3 by default after m-1. But after seeing today gate break, it is hard to maintain devstack-py2-by-default. because projects are dropping the py2 support and devstack py2 by default cause the problem[3]. Today it is from nova side and It can happen due to any projects dropping py2 or I should say it can happen every day as py2 drop patches get merged. ** I am ok to make Devstack py3 by default today which is this patch - https://review.opendev.org/#/c/649097/ ** Action for projects who want to keep testing py2 job till m-1 or whenever they plan to drop py2: Explicitly disable the py3 in their py2 jobs (USE_PYTHON3: False). * I have pushed the py2 drop patches on almost all the OpenStack services[4] which migrate py2 jobs to py3, remove the py2 requirement but do not mention the min python version in setup.cfg (which can ben done by followup if projects want to do that). I will suggest to merge them asap to avoid any gate break due to cross projects dependency. 
[1] https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html [2] half-revert means only change in requirements.txt and setup.cfg - https://review.opendev.org/#/c/695007/ [3] https://bugs.launchpad.net/nova/+bug/1853166 [4] https://review.opendev.org/#/q/topic:drop-py27-support+(status:open+OR+status:merged) -gmann ---- On Wed, 30 Oct 2019 12:03:33 -0500 Ghanshyam Mann wrote ---- > ---- On Wed, 30 Oct 2019 06:59:19 -0500 Sean Mooney wrote ---- > > On Tue, 2019-10-29 at 19:40 -0500, Matthew Thode wrote: > > > On 19-10-29 14:53:11, Ghanshyam Mann wrote: > > > > ---- On Thu, 24 Oct 2019 14:32:03 -0500 Ghanshyam Mann wrote ---- > > > > > Hello Everyone, > > > > > > > > > > We had good amount of discussion on the final plan and schedule in today's TC office hour[1]. > > > > > > > > > > I captured the agreement on each point in etherpad (you can see the AGREE:). Also summarizing > > > > > the discussions here. Imp point is if your projects are planning to keep the py2.7 support then do not delay > > > > > to tell us. Reply on this ML thread or add your project in etherpad. > > > > > > > > > > - Projects can start dropping the py2.7 support. Common lib and testing tools need to wait until milestone-2. > > > > > ** pepe8 job to be included in openstack-python3-ussuri-jobs-* templates - > > > > https://review.opendev.org/#/c/688997/ > > > > > ** You can drop openstack-python-jobs template and start using ussuri template once 688997 patch is merged. > > > > > ** Cross projects dependency (if any ) can be sync up among dependent projects. > > > > > > > > > > - I will add this plan and schedule as a community goal. The goal is more about what all things to do and when. > > > > > ** If any project keeping the support then it has to be notified explicitly for its consumer. > > > > > > > > > > - Schedule: > > > > > The schedule is aligned with the Ussuri cycle milestone[2]. I will add the plan in the release schedule also. > > > > > Phase-1: Dec 09 - Dec 13 R-22 Ussuri-1 milestone > > > > > ** Project to start dropping the py2 support along with all the py2 CI jobs. > > > > > Phase-2: Feb 10 - Feb 14 R-13 Ussuri-2 milestone > > > > > ** This includes Oslo, QA tools (or any other testing tools), common lib (os-brick), Client library. > > > > > ** This will give enough time to projects to drop the py2 support. > > > > > Phase-3: Apr 06 - Apr 10 R-5 Ussuri-3 milestone > > > > > ** Final audit on Phase-1 and Phase-2 plan and make sure everything is done without breaking anything. > > > > > This is enough time to measure such break or anything extra to do before ussuri final release. > > > > > > > > > > Other discussions points and agreement: > > > > > - Projects want to keep python 2 support and need oslo, QA or any other dependent projects/lib support: > > > > > ** swift. AI: gmann to reach out to swift team about the plan and exact required things from its dependency > > > > > (the common lib/testing tool). > > > > > > > > I chated with timburke on IRC about things required by swift to keep the py2.7 support[1]. Below are > > > > client lib/middleware swift required for py2 testing. > > > > @timburke, feel free to update if any missing point. > > > > > > > > - devstack. 
able to keep running swift on py2 and rest all services can be on py3 > > > > - keystonemiddleware and its dependency > > > > - keystoneclient and openstackclient (dep of keystonemiddleware) > > > > - castellan and barbicanclient > > > > > > > > > > > > As those lib/middleware going to drop the py2.7 support in phase-2, we need to cap them for swift. > > > > I think capping them for python2.7 in upper constraint file would not affect any other users but Matthew Thode can > > > > explain better how that will work from the requirement constraint perspective. > > > > > > > > [1] > > > > http://eavesdrop.openstack.org/irclogs/%23openstack-swift/%23openstack-swift.2019-10-28.log.html#t2019-10-28T16:37:33 > > > > > > > > -gmann > > > > > > > > > > ya, there are examples already for libs that have dropped py2 support. > > > What you need to do is update global requirements to be something like > > > the following. > > > > > > sphinx!=1.6.6,!=1.6.7,<2.0.0;python_version=='2.7' # BSD > > > sphinx!=1.6.6,!=1.6.7,!=2.1.0;python_version>='3.4' # BSD > > > > > > or > > > > > > keyring<19.0.0;python_version=='2.7' # MIT/PSF > > > keyring;python_version>='3.4' # MIT/PSF > > on a related note os-vif is blocked form running tempest jobs under python 3 > > until https://review.opendev.org/#/c/681029/ is merged due to > > https://zuul.opendev.org/t/openstack/build/4ff60d6bd2f24782abeb12cc7bdb8013/log/controller/logs/screen-q-agt.txt.gz#308-318 > > > > i think this issue will affect any job that install proejcts that use privsep using the required-proejcts section of the > > zuul job definition. adding a project to required-proejcts sechtion adds it to the LIBS_FROM_GIT varible in devstack. > > this inturn istalls it twice due to https://review.opendev.org/#/c/418135/ . the side effect of this is that the > > privsep helper script gets installed under python2 and the neutron ageint in this case gets install under python 3 so > > when it trys to spawn the privsep deamon and invoke commands it typically expodes due to dependcy issues or in this case > > because it failed to drop privileges correctly. > > > > so as part of phase 1 we need to merge https://review.opendev.org/#/c/681029/ so that lib project that use required- > > projects to run with master of project that comsume it and support depends-on can move to python 3 tempest jobs. > > Thanks for raising this. I agree on not falling back to py2 in Ussuri, I approved 681029. > > -gmann > > > > > > > > > > > > > > > > > From bharat at stackhpc.com Tue Nov 19 16:37:34 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Tue, 19 Nov 2019 16:37:34 +0000 Subject: [magnum] Kubernetes cluster issue In-Reply-To: References: Message-ID: <3D89EC73-47ED-4ED8-AF08-4BA549E90989@stackhpc.com> Hi Namrata, This is a known issue being tracked under this story: https://storyboard.openstack.org/#!/story/2006846 As I said before on IRC, the only known fix at the moment is to run Magnum Train with `use_podman=true` label. The solution for atomic is actively being investigated. Best Bharat > On 19 Nov 2019, at 16:06, Namrata Sitlani wrote: > > Hello Folks, > > From Thursday last week (Nov 14), Magnum is unable to spin up working Kubernetes clusters. We run on Rocky Openstack release. > > All our Kubernetes pods show CrashLoopBackOff status. > > We use the following commands to create the Kubernetes clusters : http://paste.openstack.org/show/786348/ . > > The deployment fails with the following output : http://paste.openstack.org/show/786287/ . 
> > We tried deploying Kubernetes v1.13.12, v1.14.8, v1.15.5 and v1.16.2 without success. However, if we use version v1.14.6 we can successfully deploy our clusters. > Unfortunately, we cannot use v1.14.6 in production because it is not patched for the CVE-2019-11253 vulnerability . > > Since this stopped working for us on Thursday, we think that the image update https://hub.docker.com/r/openstackmagnum/kubernetes-apiserver/tags done 6 days ago is the culprit. > (We previously deployed clusters with versions v1.15.5 and v1.14.8 successfully.) > > Can you please confirm our findings and help us find a way forward? We can provide more logs if needed. > > Thank you very much, > Namrata Sitlani -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.page at canonical.com Tue Nov 19 16:48:50 2019 From: james.page at canonical.com (James Page) Date: Tue, 19 Nov 2019 16:48:50 +0000 Subject: Freezer Project Update In-Reply-To: <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> References: <003201d59aaf$cf415fa0$6dc41ee0$@fingent.com> <14B06106-F3AB-47DA-BD35-99B18383C147@dkrz.de> Message-ID: Hello On Fri, Nov 15, 2019 at 7:43 PM Amjad Kotobi wrote: > Hi, > > This project is pretty much in production state, from last summit it got > active again from developer ends, we are using it for backup solution too. > Great to hear that Freezer is getting some increased developer focus! > Documentation side isn’t that bright, very soon gonna get updated, anyhow > you are able to install as standalone project in instance, I did it > manually, didn’t use any provision tools. > Let me know for specific part of deployment that is not clear. > > Amjad > > On 14. Nov 2019, at 06:53, Deepa wrote: > > Hello Team > > Good Day > > I am Deepa from Fingent Global Solutions and we are a big fan of Openstack > and we do have 4 + openstack setup (including